Incorrect example command in README, MNIST generate data script is broken and improving documentation in key parts of the code #7

@rmcavoy

The README says to run dsne with the command
python main_sda.py --method dsnet --cfg cfg/digits-a.json --bb lenetplus --bs 256 --src MT --tgt MM --nc 10 --size 32 --log-itv 100 --dropout --hybridize
This needs a minor correction as "dsnet" should be replaced with "dsne" since dsnet is not currently an option for main_sda.py.

A larger problem is that the MNIST gen_dataset.py script writes the dataset in the wrong format in several ways, so loading the dataset with the example commands errors out immediately (I used dsne with train_sda.py):

  • It creates images shaped (batch, height, width) with .reshape(-1, rows, cols) (line 16 of gen_dataset.py) instead of (batch, height, width, channel) with .reshape(-1, rows, cols, 1), which causes the resize transforms to fail.
  • It stores a pkl of a dictionary with keys TR and TE, each holding its data as a list [img, lbl]. While this is the correct layout for deeper in the code (e.g. training_sda.py), load_pkl in utils/io.py requires that the data unpack as tr_x, tr_y, te_x, te_y = pkl.load(f).
  • It names the pickle file with self.__class__.__name__ instead of self.__class__.__name__.lower(), while the cfg file expects a lowercase pickle file name and the class name is all caps.
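The three fixes above could look roughly like the following. This is a minimal sketch, not the repository's code: the helper names (pack_split, dump_dataset, pickle_name), the stand-in MNIST class, the ".pkl" extension, and the dummy arrays are all assumptions made for illustration. Only the reshape, the four-element unpacking, and the lowercased class name come from the points above.

```python
import io
import pickle

import numpy as np


def pack_split(images, labels, rows, cols):
    """Reshape flat images to (batch, height, width, channel)."""
    x = np.asarray(images, dtype=np.uint8).reshape(-1, rows, cols, 1)
    y = np.asarray(labels, dtype=np.int64)
    return x, y


def dump_dataset(fileobj, train, test):
    """Write a flat 4-tuple so the loader can unpack it directly."""
    tr_x, tr_y = train
    te_x, te_y = test
    pickle.dump((tr_x, tr_y, te_x, te_y), fileobj)


def pickle_name(cls):
    """Lowercase the class name; the '.pkl' suffix is an assumption."""
    return cls.__name__.lower() + ".pkl"


class MNIST:
    """Hypothetical stand-in for the all-caps dataset class."""


# Dummy data standing in for the real MNIST arrays.
rows, cols = 28, 28
train = pack_split(np.zeros((6, rows * cols)), np.arange(6), rows, cols)
test = pack_split(np.zeros((2, rows * cols)), np.arange(2), rows, cols)

# Round-trip through an in-memory buffer instead of a real file.
buf = io.BytesIO()
dump_dataset(buf, train, test)
buf.seek(0)
tr_x, tr_y, te_x, te_y = pickle.load(buf)
```

The round trip at the end mirrors the unpacking that load_pkl is described as doing, so a mismatch in the stored layout would surface as an unpacking error here rather than deep inside training.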

As a more minor correction, it would also be nice if gen_dataset.py were located in the datasets folder, since that is where the datasets should live. It would also be good to provide a fully documented README example that walks through every step of the calculation (including moving and converting the data), so users don't have to dig through the code and example input files to find out where to put the datasets and what directory names to use.

Also, for future maintainability and extensibility, it would be good if the comments and variable names in the important pieces of the code were expanded and clarified.

For example, at first glance, for someone new to mxnet, the loop inside dsne's train_batch (ln 1018 of training_sda.py) appears to use zip to loop over the samples in the minibatch, but it is actually looping over the list of ctx that the batch is split across. That may be obvious to someone very familiar with mxnet, but it will not necessarily be obvious or readable for new users coming from e.g. pytorch, where the gpu context split is handled seamlessly in the background. The confusion is not helped by the naming scheme in e.g. dSNELoss (in custom layers), which refers to sizes as "bs_src" and "bs_tgt" or "N, K" and "M, K". These are only obvious if (a) the user has not assumed the loop in train_batch is over the batch index and (b) the user already has a good idea of which variable refers to what.
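To illustrate the distinction, here is a framework-free sketch of a loop over per-context slices. It assumes the batch is split along the batch axis in the way mxnet's gluon.utils.split_and_load does, which may not match what training_sda.py actually does. The split_batch helper, the context strings, and the sample values are hypothetical.

```python
def split_batch(batch, num_ctx):
    """Split a batch (a list of samples) into num_ctx contiguous slices."""
    size, rem = divmod(len(batch), num_ctx)
    slices, start = [], 0
    for i in range(num_ctx):
        # Spread any remainder over the first few slices.
        stop = start + size + (1 if i < rem else 0)
        slices.append(batch[start:stop])
        start = stop
    return slices


# Hypothetical stand-ins for device contexts and one minibatch of 8 samples.
contexts = ["gpu(0)", "gpu(1)"]
src_x = list(range(8))
src_y = [i % 10 for i in range(8)]

per_device = []
# Each iteration handles one device's slice of the batch, not one sample.
for ctx, xs_on_ctx, ys_on_ctx in zip(
    contexts, split_batch(src_x, len(contexts)), split_batch(src_y, len(contexts))
):
    per_device.append((ctx, len(xs_on_ctx)))
```

Names such as xs_on_ctx make it visible at the loop header that the iteration variable is a per-device slice, which is the kind of naming or comment the request above is asking for.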
