
Commit f144487

Abdul Dakkak committed: evaluations

1 parent d16e237 commit f144487

File tree

2 files changed: +90 -0 lines changed


docs/_sidebar.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -13,6 +13,7 @@
 - [Prediction](prediction.md)
 - Development
 - [Installation](installation.md)
+- [Evaluations](evaluations.md)
 - [Server Configuration](configuration.md)
 - [REST API](api.md)
 - Other
```

docs/evaluations.md

Lines changed: 89 additions & 0 deletions
# Evaluations

The CarML platform enables easy evaluation of both the performance and accuracy of models across frameworks.
The evaluations run using the CarML library, without the website components, and are available as subcommands to each agent.
Utility functions are available to help run the experiments, summarize and analyze the data, and visualize the results.
?> Evaluations currently only run on datasets known by [DLDataset](https://github.com/rai-project/dldataset)
## Running Evaluations

One can run evaluations either across different frameworks and models or on a single framework and model.
Both commands are similar and are shown below.
### Running Evaluations on all Frameworks / Models

[evaluate.go](https://github.com/rai-project/dlframework/blob/master/framework/cmd/server/evaluate.go) is a wrapper tool that makes it easier to run evaluations across frameworks and models.
One can specify the [frameworks, models, and batch sizes](https://github.com/rai-project/dlframework/blob/master/framework/cmd/server/evaluate.go#L31-L72) to use within the file.
The program can then be run as shown below.
#### Example Usage

```{sh}
./tensorflow_agent dataset --debug --verbose --publish=true --fail_on_error=true --gpu=true --batch_size=320 --model_name=BVLC-Reference-CaffeNet --model_version=1.0 --database_name=tx2_carml_step_trace --database_address=minsky1-1.csl.illinois.edu --publish_predictions=false --num_file_parts=8 --trace_level=STEP_TRACE
```
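Conceptually, the wrapper enumerates the cross product of the configured frameworks, models, and batch sizes and launches one agent run per combination. The sketch below illustrates that loop with hypothetical lists and a `combinations` helper of our own; the actual variable names and values live in evaluate.go.

```go
package main

import "fmt"

// combinations returns one agent command line per
// (framework, model, batch size) tuple.
func combinations(frameworks, models []string, batchSizes []int) []string {
	var cmds []string
	for _, fw := range frameworks {
		for _, m := range models {
			for _, bs := range batchSizes {
				cmds = append(cmds,
					fmt.Sprintf("./%s_agent dataset --model_name=%s --batch_size=%d", fw, m, bs))
			}
		}
	}
	return cmds
}

func main() {
	// Hypothetical configuration lists; the real ones are defined
	// in framework/cmd/server/evaluate.go.
	frameworks := []string{"tensorflow", "mxnet"}
	models := []string{"BVLC-AlexNet", "BVLC-Reference-CaffeNet"}
	batchSizes := []int{64, 320}
	for _, c := range combinations(frameworks, models, batchSizes) {
		fmt.Println(c)
	}
}
```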
- [ ] TODO: allow one to specify the frameworks, models, and batch sizes from the command line
### Running Evaluations on a single Framework / Model

#### Example Usage

```{sh}
./tensorflow_agent dataset --debug --verbose --publish=true --fail_on_error=true --gpu=true --batch_size=320 --model_name=BVLC-Reference-CaffeNet --model_version=1.0 --database_name=tx2_carml_step_trace --database_address=minsky1-1.csl.illinois.edu --publish_predictions=false --num_file_parts=8 --trace_level=STEP_TRACE
```
### Command line options

```
  -b, --batch_size int               the batch size to use while performing inference (default 64)
      --database_address string      the address of the mongo database to store the results. By default the address in the config `database.endpoints` is used
      --database_name string         the name of the database to publish the evaluation results to
      --dataset_category string      the dataset category to use for prediction (default "vision")
      --dataset_name string          the name of the dataset to perform the evaluations on. When using `ilsvrc2012_validation`, optimized versions of the dataset are used when the input network takes 224 or 22 (default "ilsvrc2012_validation")
      --fail_on_error                turning on causes the process to terminate/exit upon first inference error. This is useful since some inferences will result in an error because they run out of memory
      --gpu                          whether to enable the gpu. An error is returned if the gpu is not available
  -h, --help                         help for dataset
      --model_name string            the name of the model to use for prediction (default "BVLC-AlexNet")
      --model_version string         the version of the model to use for prediction (default "1.0")
      --num_file_parts int           the number of file parts to process. Setting file parts to a value other than -1 means that only the first num_file_parts * batch_size images are inferred from the dataset. This is useful while performing performance evaluations, where only a few hundred evaluation samples are useful (default -1)
  -p, --partition_dataset_size int   the chunk size to partition the input dataset. By default this is the same as the batch size
      --publish                      whether to publish the evaluation to database. Turning this off will not publish anything to the database. This is ideal for using carml within profiling tools or performing experiments where the terminal output is sufficient. (default true)
      --publish_predictions          whether to publish prediction results to database. This will store all the probability outputs for the evaluation in the database which would be a few gigabytes of data for one dataset
      --trace_level string           the trace level to use while performing evaluations (default "STEP_TRACE")
```
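As the `--num_file_parts` description notes, any value other than -1 limits the run to the first `num_file_parts * batch_size` images. The sketch below illustrates that arithmetic; the `imagesProcessed` helper is ours, not part of the agent, and the dataset size of 50000 is the standard ILSVRC 2012 validation set size.

```go
package main

import "fmt"

// imagesProcessed returns how many images an evaluation touches given
// the --num_file_parts and --batch_size flags; -1 means the full dataset.
func imagesProcessed(numFileParts, batchSize, datasetSize int) int {
	if numFileParts == -1 {
		return datasetSize
	}
	n := numFileParts * batchSize
	if n > datasetSize {
		return datasetSize
	}
	return n
}

func main() {
	// With --num_file_parts=8 --batch_size=320 (as in the examples above),
	// only the first 8 * 320 = 2560 of the 50000 validation images are used.
	fmt.Println(imagesProcessed(8, 320, 50000))
}
```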
## Model Names

To list the model names known to an agent, run

```
agent info models
```
## Checking Divergence

- [ ] TODO

To compare a single prediction's divergence you use

```
agent database divergence --database_address=minsky1-1.csl.illinois.edu --database_name=carml --source=5a01fc48ca60cc797e63603c --target=5a0203f8ca60ccd42aa2a706
```
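The divergence subcommand compares the stored probability outputs of two evaluations (identified by their database ids). The exact metric the agent uses is not documented here; purely as an illustration, a KL divergence between two probability vectors could be computed as follows.

```go
package main

import (
	"fmt"
	"math"
)

// klDivergence computes sum over i of p[i] * log(p[i] / q[i]),
// skipping entries where p[i] == 0. This is an illustrative metric,
// not necessarily the one the agent implements.
func klDivergence(p, q []float64) float64 {
	var d float64
	for i := range p {
		if p[i] > 0 {
			d += p[i] * math.Log(p[i]/q[i])
		}
	}
	return d
}

func main() {
	// Two hypothetical softmax outputs for the same input.
	p := []float64{0.7, 0.2, 0.1}
	q := []float64{0.6, 0.3, 0.1}
	fmt.Printf("%.4f\n", klDivergence(p, q))
}
```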
## Analysing / Summarizing Results

- [ ] TODO

```
agent info evaluation --help
```
