Required file structure... #9

@htjb

Currently, several file paths are hard-coded and could be handled better. For example, preprocess.py requires the training and testing data and labels to be saved in a very rigid file structure.

Some thoughts on better handling:

  • Users could pass the training and testing data and labels to preprocess.py as numpy arrays or pandas tables
  • train_data, train_labels, AFB, data_mins, etc. should be attributes of the process() object
  • process() can then be pickled/saved and passed by the user to network.nn()
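
As a rough illustration of the three points above, here is a minimal sketch. The class name `Preprocessed`, its constructor arguments, and the `data_maxs` attribute are hypothetical and are not the current process() interface; AFB is left out because computing it depends on the existing code. It pickles the attribute dictionary rather than the instance, which is one possible choice and not necessarily the best one.

```python
import os
import pickle
import tempfile

import numpy as np


class Preprocessed:
    """Hypothetical stand-in for process(); not the project's actual class."""

    def __init__(self, train_data, train_labels, test_data, test_labels):
        # np.asarray accepts numpy arrays and pandas tables alike
        self.train_data = np.asarray(train_data, dtype=float)
        self.train_labels = np.asarray(train_labels, dtype=float)
        self.test_data = np.asarray(test_data, dtype=float)
        self.test_labels = np.asarray(test_labels, dtype=float)
        # Column-wise extrema of the training data, kept for later evaluation
        self.data_mins = self.train_data.min(axis=0)
        self.data_maxs = self.train_data.max(axis=0)

    def save(self, path):
        # Pickle only the attributes, so the file does not depend on
        # where this class happens to be defined
        with open(path, "wb") as f:
            pickle.dump(vars(self), f)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            attributes = pickle.load(f)
        obj = cls.__new__(cls)  # skip __init__; attributes are restored directly
        obj.__dict__.update(attributes)
        return obj


# Usage with made-up data: no fixed file structure is required
rng = np.random.default_rng(0)
prep = Preprocessed(rng.random((8, 3)), rng.random((8, 1)),
                    rng.random((2, 3)), rng.random((2, 1)))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "preprocessed.pkl")
    prep.save(path)
    restored = Preprocessed.load(path)
```

The restored object could then be handed to the training step in place of a directory path.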

The only potential issue is that much of what process() generates is needed later, when the network is evaluated. We want to keep preprocessing and training separate, because the splitting and shuffling of the training and testing data has a random element that we don't want to repeat when we resume training. In principle, the required components of process() could be made attributes of network.nn(), and network.nn() could then be pickled and saved instead of process(). In theory, pickling network.nn() could also store the tensorflow model as an attribute rather than saving it separately, but I will need to investigate this.
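
To make the concern about the random split concrete, here is a minimal sketch, assuming the shuffle is done once and its result is stored with whatever object gets pickled. The function `shuffle_and_split` and its `test_fraction` and `seed` parameters are hypothetical, and the tensorflow model is deliberately omitted since whether it can be pickled this way is still an open question.

```python
import pickle

import numpy as np


def shuffle_and_split(data, labels, test_fraction=0.2, seed=None):
    """Hypothetical helper: shuffle once and record which rows went where."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(data))
    n_test = int(round(test_fraction * len(data)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return {
        "train_idx": train_idx,
        "test_idx": test_idx,
        "train_data": data[train_idx],
        "train_labels": labels[train_idx],
        "test_data": data[test_idx],
        "test_labels": labels[test_idx],
    }


# Made-up data for illustration
data = np.arange(20, dtype=float).reshape(10, 2)
labels = np.arange(10, dtype=float)

# Done once, at preprocessing time
split = shuffle_and_split(data, labels)

# Stored with the saved state (written to disk in practice)
blob = pickle.dumps(split)

# When training resumes, the stored split is reloaded instead of reshuffling
resumed = pickle.loads(blob)
```

Because the indices travel with the saved state, resuming training sees exactly the same training and testing rows as the first run.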

Needs some thought... This is closely related to #7, and I think resolving the two issues would make the code base more accessible, but it would also be a breaking change.
