# Evaluations

The CarML platform enables easy evaluation of both the performance and accuracy of models across frameworks.
The evaluations run using the CarML library, without the website components, and are available as subcommands to each agent.
Utility functions are available to help run the experiments, summarize and analyze the data, and visualize the results.


?> Evaluations currently run only on datasets known by [DLDataset](https://github.com/rai-project/dldataset)
## Running Evaluations

One can run evaluations either across different frameworks and models or on a single framework and model.
Both commands are similar and are shown below.

### Running Evaluations on all Frameworks / Models

[evaluate.go](https://github.com/rai-project/dlframework/blob/master/framework/cmd/server/evaluate.go) is a wrapper tool that makes it easier to run evaluations across frameworks and models.
One can specify the [frameworks, models, and batch sizes](https://github.com/rai-project/dlframework/blob/master/framework/cmd/server/evaluate.go#L31-L72) to use within the file.
The program can then be run to evaluate all combinations of the specified frameworks, models, and batch sizes.

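The evaluation matrix is declared as Go slices inside evaluate.go. A minimal sketch of that pattern is shown below; the variable names and values here are illustrative, not the actual declarations in the file:

```go
package main

import "fmt"

// Hypothetical evaluation matrix; the real slices live in
// framework/cmd/server/evaluate.go and may differ in names and contents.
var (
	frameworks = []string{"tensorflow", "caffe", "mxnet"}
	models     = []string{"BVLC-AlexNet", "BVLC-Reference-CaffeNet"}
	batchSizes = []int{1, 16, 64, 320}
)

func main() {
	// The wrapper runs the cross product of frameworks, models, and batch sizes.
	for _, f := range frameworks {
		for _, m := range models {
			for _, b := range batchSizes {
				fmt.Printf("%s %s batch=%d\n", f, m, b)
			}
		}
	}
}
```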

#### Example Usage

```sh
./tensorflow_agent dataset --debug --verbose --publish=true --fail_on_error=true --gpu=true --batch_size=320 --model_name=BVLC-Reference-CaffeNet --model_version=1.0 --database_name=tx2_carml_step_trace --database_address=minsky1-1.csl.illinois.edu --publish_predictions=false --num_file_parts=8 --trace_level=STEP_TRACE
```

- [ ] TODO: allow one to specify the frameworks, models, and batch sizes from the command line


### Running Evaluations on a single Framework / Model

#### Example Usage

```sh
./tensorflow_agent dataset --debug --verbose --publish=true --fail_on_error=true --gpu=true --batch_size=320 --model_name=BVLC-Reference-CaffeNet --model_version=1.0 --database_name=tx2_carml_step_trace --database_address=minsky1-1.csl.illinois.edu --publish_predictions=false --num_file_parts=8 --trace_level=STEP_TRACE
```


### Command line options


```
  -b, --batch_size int                 the batch size to use while performing inference (default 64)
      --database_address string        the address of the mongo database to store the results. By default the address in the config database.endpoints is used
      --database_name string           the name of the database to publish the evaluation results to
      --dataset_category string        the dataset category to use for prediction (default "vision")
      --dataset_name string            the name of the dataset to perform the evaluations on. When using ilsvrc2012_validation, optimized versions of the dataset are used when the input network takes 224 or 22 (default "ilsvrc2012_validation")
      --fail_on_error                  turning on causes the process to terminate/exit upon first inference error. This is useful since some inferences will result in an error because they run out of memory
      --gpu                            whether to enable the gpu. An error is returned if the gpu is not available
  -h, --help                           help for dataset
      --model_name string              the name of the model to use for prediction (default "BVLC-AlexNet")
      --model_version string           the version of the model to use for prediction (default "1.0")
      --num_file_parts int             the number of file parts to process. Setting file parts to a value other than -1 means that only the first num_file_parts * batch_size images are inferred from the dataset. This is useful while performing performance evaluations, where only a few hundred evaluation samples are useful (default -1)
  -p, --partition_dataset_size int     the chunk size to partition the input dataset. By default this is the same as the batch size
      --publish                        whether to publish the evaluation to database. Turning this off will not publish anything to the database. This is ideal for using carml within profiling tools or performing experiments where the terminal output is sufficient. (default true)
      --publish_predictions            whether to publish prediction results to database. This will store all the probability outputs for the evaluation in the database which would be a few gigabytes of data for one dataset
      --trace_level string             the trace level to use while performing evaluations (default "STEP_TRACE")
```
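The interaction between `--num_file_parts` and `--batch_size` determines how many images are actually inferred. A small sketch of that arithmetic, using the flag values from the examples above (the function name and dataset size are illustrative):

```go
package main

import "fmt"

// numImages returns how many images the agent will infer given the
// --num_file_parts and --batch_size flags. A value of -1 for
// numFileParts means the whole dataset is processed.
func numImages(numFileParts, batchSize, datasetSize int) int {
	if numFileParts == -1 {
		return datasetSize
	}
	n := numFileParts * batchSize
	if n > datasetSize {
		return datasetSize
	}
	return n
}

func main() {
	// With --num_file_parts=8 --batch_size=320, only the first
	// 8 * 320 images of the 50,000-image ILSVRC 2012 validation set are used.
	fmt.Println(numImages(8, 320, 50000)) // prints 2560
}
```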

## Model Names

The valid model names can be queried from an agent:

```
agent info models
```

## Checking Divergence



- [ ] TODO

To compare the divergence of a single prediction, use

```
agent database divergence --database_address=minsky1-1.csl.illinois.edu --database_name=carml --source=5a01fc48ca60cc797e63603c --target=5a0203f8ca60ccd42aa2a706
```


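The command above compares the stored probability outputs of a source and a target evaluation. One common measure for such a comparison is the Kullback-Leibler divergence; the sketch below is illustrative only — the actual metric and implementation used by the agent live in the dlframework codebase:

```go
package main

import (
	"fmt"
	"math"
)

// klDivergence computes the Kullback-Leibler divergence D(p || q)
// between two discrete probability distributions, skipping terms
// where either probability is zero.
func klDivergence(p, q []float64) float64 {
	sum := 0.0
	for i := range p {
		if p[i] > 0 && q[i] > 0 {
			sum += p[i] * math.Log(p[i]/q[i])
		}
	}
	return sum
}

func main() {
	// Two hypothetical softmax outputs for the same input image.
	source := []float64{0.7, 0.2, 0.1}
	target := []float64{0.6, 0.3, 0.1}
	fmt.Printf("%.4f\n", klDivergence(source, target)) // prints 0.0268
}
```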
## Analysing / Summarizing Results


- [ ] TODO

```
agent info evaluation --help
```