- Start an Ubuntu 20.04 Large Instance of type t3.large or bigger with EBSVolumeSize set to 100 GB.
- SSH into the instance.
- Run the following commands:
sudo apt update
sudo apt -y install python3-pip
git clone https://github.com/ben-dom393/teamedward.git
cd teamedward
pip install -r requirements.txt
# Make sure you are in teamedward/ directory
python3 predict_script.py sample_dataset.json predictions.csv
Output: predictions.csv with the columns transcript_id, transcript_position and score (i.e. the probability of m6A modification), stored in the current directory.
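As a rough illustration of how the output might be consumed, the sketch below reads predictions.csv and keeps the rows whose score meets a cutoff. It assumes the file has a header row with exactly the three column names listed above and that transcript_position holds integers. The function name `load_high_confidence_sites` and the 0.5 threshold are hypothetical choices for this example, not part of the repository.

```python
import csv


def load_high_confidence_sites(csv_path, threshold=0.5):
    """Return (transcript_id, transcript_position, score) tuples with score >= threshold.

    Assumes a header row naming the columns transcript_id,
    transcript_position and score. The 0.5 default is an arbitrary
    illustrative cutoff, not a recommended value.
    """
    sites = []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            score = float(row["score"])
            if score >= threshold:
                sites.append(
                    (row["transcript_id"], int(row["transcript_position"]), score)
                )
    return sites


# Example call, once predict_script.py has written its output:
#   sites = load_high_confidence_sites("predictions.csv", threshold=0.5)
```

Using `csv.DictReader` keeps the lookup tied to the column names rather than their order, so the sketch does not depend on how the columns are arranged in the file.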
Description: Generate predictions for RNA-seq data
Usage: predict_script.py [-h] [-m MODEL] [-s SCALER] [-e ENCODER] json_data_dir output_dir

positional arguments:
  json_data_dir         File path for RNA-seq data (.json)
  output_dir            File path for predictions output (.csv)

optional arguments:
  -h, --help            show this help message and exit
  -m MODEL, --model MODEL
                        File path for fitted model object (.h5). Default: models/fitted_model.h5
  -s SCALER, --scaler SCALER
                        File path for fitted scaler object (.pkl). Default: models/fitted_scaler.pkl
  -e ENCODER, --encoder ENCODER
                        File path for fitted one-hot encoder object (.pkl). Default: models/fitted_encoder.pkl

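To make the interplay between the optional flags and their defaults concrete, here is a minimal `argparse` sketch that mirrors the interface documented above. It is reconstructed from the help text only and is not the source of predict_script.py; the helper name `build_predict_parser` is hypothetical, and model1_model.h5 is simply the example name used elsewhere in this document.

```python
import argparse


def build_predict_parser():
    """Illustrative parser mirroring the documented predict_script.py options."""
    parser = argparse.ArgumentParser(
        prog="predict_script.py",
        description="Generate predictions for RNA-seq data",
    )
    parser.add_argument("json_data_dir", help="File path for RNA-seq data (.json)")
    parser.add_argument("output_dir", help="File path for predictions output (.csv)")
    parser.add_argument("-m", "--model", default="models/fitted_model.h5",
                        help="File path for fitted model object (.h5)")
    parser.add_argument("-s", "--scaler", default="models/fitted_scaler.pkl",
                        help="File path for fitted scaler object (.pkl)")
    parser.add_argument("-e", "--encoder", default="models/fitted_encoder.pkl",
                        help="File path for fitted one-hot encoder object (.pkl)")
    return parser


# Override only the model path; the scaler and encoder keep their defaults.
args = build_predict_parser().parse_args(
    ["-m", "model1_model.h5", "sample_dataset.json", "predictions.csv"]
)
print(args.model, args.scaler, args.encoder)
```

Under these assumptions, any flag that is omitted resolves to the path under models/, so overriding only the model would pair it with the default scaler and encoder. When using a model you trained yourself, it is probably safer to pass all three flags together.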
# Make sure you are in teamedward/ directory
python3 train_script.py sample_dataset.json data.info model1
Output: Fitted Keras model model1_model.h5, scaler model1_scaler.pkl and one-hot encoder model1_encoder.pkl, stored in the current directory.
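The three output files can be fed back into predict_script.py through its -m, -s and -e options. The sketch below derives the file names from a model name and assembles the matching predict command. The `<model_name>_model.h5` style pattern is inferred from the model1 example above rather than read from the script, and both helper names are hypothetical.

```python
import shlex


def trained_artifact_paths(model_name):
    """File names the training step is assumed to write for a given model name.

    The pattern is inferred from the model1 example in this README.
    """
    return {
        "model": f"{model_name}_model.h5",
        "scaler": f"{model_name}_scaler.pkl",
        "encoder": f"{model_name}_encoder.pkl",
    }


def predict_command(model_name, json_path, output_csv):
    """Build the predict_script.py argument list that uses those three files."""
    paths = trained_artifact_paths(model_name)
    return [
        "python3", "predict_script.py",
        "-m", paths["model"],
        "-s", paths["scaler"],
        "-e", paths["encoder"],
        json_path, output_csv,
    ]


# Print the command as a copy-pasteable shell line (shlex.join needs Python 3.8+).
print(shlex.join(predict_command("model1", "sample_dataset.json", "predictions.csv")))
```

Returning the command as a list rather than a single string means it can also be passed directly to `subprocess.run` without shell quoting concerns.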
Description: Train an ML model to predict m6A modification
Usage: train_script.py [-h] json_data_dir data_info_dir model_name

positional arguments:
  json_data_dir   File path for RNA-seq data (.json)
  data_info_dir   File path for m6A labels (.info)
  model_name      Name of model. The scaler and encoder would be named after this as well.

optional arguments:
  -h, --help      show this help message and exit