FAQ
A sample CleanPlan that uses a CSV file as the source looks like this:
{
"source" : {
"type" : "csv",
"file" : ["test/src/qa/qcri/nadeef/test/input/dumptest.csv"]
},
"rule" : [
{
"name" : "myFd",
"type" : "fd",
"value" : ["B|A, C"]
}
]
}
Currently, working with two tables is supported only with two CSV files, not with two database sources. An example of such a CleanPlan is as follows:
{
"source" : {
"type" : "csv",
"file" : [
"test/src/qa/qcri/nadeef/test/input/notused1.csv",
"test/src/qa/qcri/nadeef/test/input/bank1.csv",
"test/src/qa/qcri/nadeef/test/input/tran1.csv",
"test/src/qa/qcri/nadeef/test/input/notused2.csv"
]
},
"rule" : [
{
"type" : "udf",
"table" : ["bank1", "tran1"],
"value" : ["qa.qcri.nadeef.test.udf.MyRule5"]
}
]
}
In the source's file attribute, the user can list the CSV files to work on. In the rule's table attribute, the user must specify exactly which of those files the rule uses. The names must be exactly the same as the file names, without the extension (for example, bank1 for bank1.csv).
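As a minimal sketch of the naming rule above, the hypothetical helper `tableNameOf` below derives the table name a rule should reference from a CSV path in the file list, by dropping the directory and the extension. It is illustrative only; NADEEF's own CSV loader may treat edge cases (for example, file names containing several dots) differently.

```java
import java.nio.file.Paths;

public class CsvTableName {
    /**
     * Hypothetical helper (not part of NADEEF): returns the table name for a
     * CSV file path, i.e. the file name without its directory and extension.
     */
    static String tableNameOf(String csvPath) {
        String fileName = Paths.get(csvPath).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // No extension (or a name starting with a dot): keep the whole file name.
        return dot <= 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) {
        String[] files = {
            "test/src/qa/qcri/nadeef/test/input/bank1.csv",
            "test/src/qa/qcri/nadeef/test/input/tran1.csv"
        };
        for (String f : files) {
            // Prints bank1, then tran1: the names used in the rule's "table" attribute.
            System.out.println(tableNameOf(f));
        }
    }
}
```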
The NADEEF API documentation is included in the source code. The user can generate and view the JavaDoc files with the command
ant doc
The API documents will be generated in the out/doc directory.
NADEEF has a module called FixDecisionMaker, which provides the algorithm that decides the right fixes for violations given a group of candidate fixes. The default algorithm NADEEF provides is the Equivalence Classes algorithm. Users can replace this default with their own algorithm as follows:
- Create a class that extends FixDecisionMaker and implements the decide method. For example:
import java.util.Collection;
// Also import FixDecisionMaker and Fix from the NADEEF packages on your classpath.

public class MyDecisionMaker extends FixDecisionMaker {
    /**
     * Decides which fixes are right given a collection of candidate fixes.
     *
     * @param fixes candidate fixes.
     * @return a collection of right {@link Fix} objects.
     */
    @Override
    public Collection<Fix> decide(Collection<Fix> fixes) {
        // My decision-making logic goes here.
        // Placeholder so the class compiles: accept every candidate fix unchanged.
        return fixes;
    }
}
- Make sure your class is in the CLASSPATH and register it in the general section of your NADEEF configuration file, as below:
{
"database" : {
"url" : "localhost/unittest",
"username" : "tester",
"password" : "tester",
"type" : "postgres"
},
"general" : {
"maxIterationNumber" : 1,
"fixdecisionmaker" : "MyDecisionMaker"
},
"ruleext" : {
"fd" : "qa.qcri.nadeef.ruleext.FDRuleBuilder",
"cfd" : "qa.qcri.nadeef.ruleext.CFDRuleBuilder"
}
}
Just as NADEEF supports FD and CFD rules by default, adding support for more abstract rules is straightforward: the implementation translates the abstract rule's text into a NADEEF Rule class. The steps to support a new abstract rule are as follows:
- Create a new rule builder that extends the RuleBuilder class in NADEEF. For example:
import java.io.File;
import java.io.IOException;
import java.util.Collection;
// Also import RuleBuilder from the NADEEF package on your classpath.

public class MyRuleBuilder extends RuleBuilder {
    /**
     * Generates and compiles the rule .class file without loading it.
     *
     * @return the output class files.
     */
    @Override
    public Collection<File> compile() throws IOException {
        // Here it describes how to compile the rule into a collection of .class files.
        // Placeholder so the class compiles until the logic is written.
        throw new UnsupportedOperationException("compile() is not implemented yet");
    }

    /**
     * Parses the string value into rule properties.
     */
    @Override
    protected void parse() {
        // Here it translates the string values into rule properties. For example, for an FD rule
        // the parse function translates the FD text into left-hand-side and right-hand-side strings,
        // and based on both sides it generates code for the Rule interfaces.
    }
}
- Make sure the code is in the CLASSPATH and add an entry for it under "ruleext" in the NADEEF configuration file. For example:
{
"database" : {
"url" : "localhost/unittest",
"username" : "tester",
"password" : "tester",
"type" : "postgres"
},
"general" : {
"maxIterationNumber" : 1
},
"ruleext" : {
"fd" : "qa.qcri.nadeef.ruleext.FDRuleBuilder",
"cfd" : "qa.qcri.nadeef.ruleext.CFDRuleBuilder",
"myRule" : "MyRuleBuilder"
}
}
- Finally, the user can use the new rule type in a CleanPlan. For example:
{
"source" : {
"type" : "csv",
"file" : ["test/src/qa/qcri/nadeef/test/input/dumptest.csv"]
},
"rule" : [
{
"name" : "myFd",
"type" : "myRule",
"value" : ["My Rule text"]
}
]
}
By default, NADEEF generates the Java source files for FD/CFD rules and compiles them the first time they are used. In later runs, NADEEF checks whether the FD rule has stayed the same; if so, it reuses the class file from the previous run (if that file still exists). This design avoids redundant compilation every time the user runs the same FD.
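The reuse check described above can be sketched in Java as below. This is a minimal illustration under stated assumptions, not NADEEF's actual code: the class `CompileCache` and its methods are hypothetical, the last compiled rule text is kept in an in-memory map (a real implementation would need to persist it between runs), and the caller supplies the path of the compiled class file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class CompileCache {
    // Hypothetical store: rule name -> rule text used for the last compilation.
    private final Map<String, String> lastCompiledText = new HashMap<>();

    /**
     * Returns true if the rule must be regenerated and compiled: either its
     * text changed since the last compilation or the class file is missing.
     */
    boolean needsCompile(String ruleName, String ruleText, Path classFile) {
        boolean sameText = ruleText.equals(lastCompiledText.get(ruleName));
        return !(sameText && Files.exists(classFile));
    }

    /** Records the rule text after a successful compilation. */
    void markCompiled(String ruleName, String ruleText) {
        lastCompiledText.put(ruleName, ruleText);
    }

    public static void main(String[] args) throws Exception {
        CompileCache cache = new CompileCache();
        Path classFile = Files.createTempFile("myFd", ".class");

        System.out.println(cache.needsCompile("myFd", "B|A, C", classFile)); // true: first run
        cache.markCompiled("myFd", "B|A, C");
        System.out.println(cache.needsCompile("myFd", "B|A, C", classFile)); // false: reuse
        System.out.println(cache.needsCompile("myFd", "B|A", classFile));    // true: text changed

        Files.delete(classFile);
        System.out.println(cache.needsCompile("myFd", "B|A, C", classFile)); // true: class file missing
    }
}
```

Comparing the full rule text keeps the sketch simple; storing a hash of the text instead would behave the same way with less storage.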
However, in some situations, such as when the data source keeps changing, the user may want to regenerate and recompile the rule on every run. This can be done by setting the alwaysCompile flag in the general section of the NADEEF configuration file. For example:
{
"database" : {
"url" : "localhost/unittest",
"username" : "tester",
"password" : "tester",
"type" : "postgres"
},
"general" : {
"alwaysCompile" : true
},
"ruleext" : {
"fd" : "qa.qcri.nadeef.ruleext.FDRuleBuilder",
"cfd" : "qa.qcri.nadeef.ruleext.CFDRuleBuilder"
}
}