This may already be implemented in other packages, so it may just be a matter of writing a wrapper or documentation. The idea is to create validation rules for a given data.frame "a la" testthat.
Specific use cases (examples):
column xxx is a Date and should be greater or less than a given date
the difference xxx - yyy must be less than a given number (e.g. delays from yyy to xxx must be less than 30 days)
column sex should be either male, female, or unknown
column age should be strictly positive, less than 150
column xxx should be of specific class
Really it seems to all boil down to:
entries in column xxx must fulfill a logical condition, e.g. xxx < whatever, xxx %in% something
entries in column xxx and yyy must fulfill a logical condition, e.g. xxx > yyy or xxx - yyy > something
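To make the "logical condition" idea concrete, here is a minimal sketch in base R (not testthat). The toy data, the column names and the rule names are all made up for illustration. Each rule is stored as an unevaluated expression and evaluated inside the data.frame.

```r
# Toy data; column names and values are illustrative only
dat <- data.frame(
  onset  = as.Date(c("2020-01-01", "2020-01-10", "2020-02-01")),
  report = as.Date(c("2020-01-05", "2020-03-01", "2020-02-03")),
  sex    = c("male", "female", "other"),
  age    = c(34, 151, 27),
  stringsAsFactors = FALSE
)

# Each rule is an unevaluated logical expression over column names
rules <- list(
  sex_known   = quote(sex %in% c("male", "female", "unknown")),
  age_range   = quote(age > 0 & age < 150),
  delay_short = quote(as.numeric(report - onset) < 30),
  onset_class = quote(inherits(onset, "Date"))
)

# Evaluate every rule with the data.frame columns in scope;
# the result is one logical vector per rule
results <- lapply(rules, function(rule) eval(rule, envir = dat))

# Number of entries failing each rule
failures <- vapply(results, function(ok) sum(!ok), integer(1))
print(failures)
```

Single-column and multi-column rules are handled the same way here, since each expression can refer to any column of the data.frame.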
I suspect we can use testthat as a backend, with an interface similar to that of clean_spelling, e.g.
validate_variable(x, rule): validates a single variable
validate_data(x, rules = list(variable_xxx = rule_xxx, variable_yyy = rule_yyy)): applies validate_variable to a bunch of variables
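A rough sketch of what these two functions could look like, assuming a rule is simply a function that takes a vector and returns a logical vector. This uses base R only rather than testthat, and the return format (a list with `valid` and `failing`) is an assumption, not a settled design.

```r
# Hypothetical sketch: `rule` is a function taking a vector and returning
# a logical vector (TRUE = entry is valid)
validate_variable <- function(x, rule) {
  ok <- rule(x)
  stopifnot(is.logical(ok))
  ok <- rep_len(ok, length(x))  # recycle scalar results, e.g. class checks
  list(valid = all(ok, na.rm = TRUE), failing = which(!ok))
}

# Applies validate_variable to each named variable of a data.frame
validate_data <- function(x, rules) {
  stopifnot(is.data.frame(x), all(names(rules) %in% names(x)))
  Map(function(variable, rule) validate_variable(x[[variable]], rule),
      names(rules), rules)
}

# Toy data with one misspelt sex and one implausible age
dat <- data.frame(
  sex = c("male", "femal", "unknown"),
  age = c(34, 151, 27),
  stringsAsFactors = FALSE
)

rules <- list(
  sex = function(v) v %in% c("male", "female", "unknown"),
  age = function(v) v > 0 & v < 150
)

report <- validate_data(dat, rules)
str(report)
```

Rules involving two columns (e.g. xxx > yyy) would not fit this per-variable signature as is, so they would need either a rule that receives the whole data.frame or a separate entry point.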
Ideally, validation rules could be provided in a table outside R, e.g. in an Excel spreadsheet, like we did for the cleaning rules in clean_spelling.
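One possible shape for such a table, sketched under the assumption that it has one row per rule, with a `variable` column and a `condition` column holding the rule as R code in text form. The table is built inline here to keep the example self-contained; in practice it would be read from a file. The column names and the `check_rule` helper are hypothetical.

```r
# Stand-in for a rules table that would normally be read from a file
rules_table <- data.frame(
  variable  = c("age", "sex"),
  condition = c("age > 0 & age < 150",
                "sex %in% c('male', 'female', 'unknown')"),
  stringsAsFactors = FALSE
)

# Toy data; the second row breaks both rules
dat <- data.frame(
  sex = c("male", "femal"),
  age = c(34, -1),
  stringsAsFactors = FALSE
)

# Parse one condition from text and evaluate it with the columns in scope;
# returns the indices of the rows that fail
check_rule <- function(condition, data) {
  ok <- eval(parse(text = condition)[[1]], envir = data)
  which(!ok)
}

failing_rows <- lapply(rules_table$condition, check_rule, data = dat)
names(failing_rows) <- rules_table$variable
print(failing_rows)
```

Evaluating text from a file runs arbitrary R code, so this only makes sense for rule files from a trusted source.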
Looks great indeed. Maybe still useful to build a wrapper around it? Being able to specify rules as a separate file would be cool - proved tremendously useful for dictionary-based data cleaning. Thoughts?