Incorrect example command in README, MNIST generate data script is broken, and documentation in key parts of the code needs improving #7

The README says to run dsne with the command
```python main_sda.py --method dsnet --cfg cfg/digits-a.json --bb lenetplus --bs 256 --src MT --tgt MM --nc 10 --size 32 --log-itv 100 --dropout --hybridize```
This needs a minor correction: "dsnet" should be replaced with "dsne", since "dsnet" is not currently an option for main_sda.py.

A larger problem is that the MNIST gen_dataset.py script writes the dataset in the wrong format in several ways, and consequently loading the dataset with the example commands errors out immediately (I used dsne with train_sda.py).

- It creates images shaped (batch, height, width) with `.reshape(-1, rows, cols)` (on line 16 of gen_dataset.py) instead of (batch, height, width, channel) with `.reshape(-1, rows, cols, 1)`, which causes the resize transforms to fail.
- It stores a pkl of a dictionary with keys TR and TE, each of which holds its data as a list [img, lbl]. While this is the correct layout for code deeper in the pipeline (e.g. training_sda.py), load_pkl in utils/io.py requires that the data be unpacked with `tr_x, tr_y, te_x, te_y = pkl.load(f)`.
- By default it also names the pickle file with `self.__class__.__name__` instead of `self.__class__.__name__.lower()`, although the cfg file expects a lowercase pickle file name and the class name is all caps.
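
To make the three points above concrete, here is a minimal sketch of the output format that the loader seems to expect. It is not the actual gen_dataset.py: the helper names (`to_nhwc`, `dump_dataset`, `pickle_stem`) and the stand-in `MNIST` class are hypothetical, and the data is fake. Only the reshape, the 4-tuple unpacking, and the `.lower()` call come from the points listed above.

```python
import io
import pickle as pkl

import numpy as np


def to_nhwc(images, rows, cols):
    """Reshape grayscale images to (batch, height, width, channel)."""
    return np.asarray(images).reshape(-1, rows, cols, 1)


def dump_dataset(fileobj, tr_x, tr_y, te_x, te_y):
    """Write a flat 4-tuple so `tr_x, tr_y, te_x, te_y = pkl.load(f)` works."""
    pkl.dump((tr_x, tr_y, te_x, te_y), fileobj)


def pickle_stem(dataset):
    """Lowercase class name, to match a lowercase file name in the cfg."""
    return dataset.__class__.__name__.lower()


class MNIST:  # hypothetical stand-in for the generator class
    pass


# Fake data: 6 training and 2 test images of 4x4 pixels.
rows, cols = 4, 4
tr_x = to_nhwc(np.zeros(6 * rows * cols, dtype=np.uint8), rows, cols)
te_x = to_nhwc(np.zeros(2 * rows * cols, dtype=np.uint8), rows, cols)
tr_y = np.arange(6)
te_y = np.arange(2)

buf = io.BytesIO()  # in-memory file, so nothing is written to disk
dump_dataset(buf, tr_x, tr_y, te_x, te_y)
buf.seek(0)

# This is the unpacking that load_pkl is described as performing.
tr_x2, tr_y2, te_x2, te_y2 = pkl.load(buf)
print(tr_x2.shape, te_x2.shape, pickle_stem(MNIST()))
```

If the dictionary layout with TR and TE keys is still needed deeper in the code, the alternative would be to change load_pkl instead; either way the writer and the reader have to agree on one layout.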

As a more minor point, it would also be nice if gen_dataset.py were located in the datasets folder, as that is where the datasets should be located. It would also be good to provide a fully documented README example that goes through every step of the process (including moving and converting the data), so users don't have to dig through the code and example input files to find out where to put the datasets and which directory names to use.

Also, for future maintainability and extensibility, it would be good if the comments and variable names in the important pieces of the code were expanded and clarified.

For example, to someone new to mxnet, the loop inside dsne's train_batch (ln 1018 of training_sda.py) appears at first glance to use the zip function to loop over the samples in the minibatch, but it actually loops over the list of ctx that the batch is split across. That may be obvious to someone very familiar with mxnet, but it will not necessarily be obvious or readable to new users coming from e.g. pytorch, where the GPU context split is handled seamlessly in the background. The confusion is not helped by the naming scheme in e.g. dSNELoss (in custom layers), which refers to sizes as "bs_src" and "bs_tgt" or "N, K" and "M, K". These names are only obvious if (A) the user has not assumed the loop in train_batch is over the batch index and (B) the user already has a good idea of which variable refers to what.
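
To illustrate the pattern being described, here is a small framework-free sketch. It is not the code in training_sda.py: `split_batch` is a hypothetical helper and the device names are placeholder strings. I am assuming the repository produces its per-context shards with something like mxnet's `gluon.utils.split_and_load`, which I have not checked.

```python
def split_batch(batch, num_devices):
    """Split a batch into contiguous, equally sized shards, one per device."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by the device count")
    shard = len(batch) // num_devices
    return [batch[i * shard:(i + 1) * shard] for i in range(num_devices)]


devices = ["gpu(0)", "gpu(1)"]            # placeholder names, not real contexts
images = ["img%d" % i for i in range(8)]  # a minibatch of 8 samples
labels = list(range(8))

image_shards = split_batch(images, len(devices))
label_shards = split_batch(labels, len(devices))

# Each iteration handles one device's shard of the batch, not one sample.
iterations = 0
for device, x_shard, y_shard in zip(devices, image_shards, label_shards):
    iterations += 1
    print(device, len(x_shard), y_shard)

print(iterations)  # number of devices (2), not the batch size (8)
```

A comment like the one above the loop, or names such as `x_shard` instead of a bare `x`, would already make the intent much clearer to readers who expect a per-sample loop.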


