Final Descriptive Notebook Report
Our project reproduces the original paper 'BEHRT: Transformer for Electronic Health Records'.
The main task is multilabel prediction of the disease diagnoses at a patient's next visit, given the patient's past visits in MIMIC-III.
We perform the following tasks, plus an ablation study to understand feature importance:
- pre-training a masked language model (MLM) to embed the input features
- downstream multilabel prediction of diagnosis codes
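To make the downstream target concrete, here is a minimal sketch of how a next-visit multilabel target could be built as a multi-hot vector. The function names, the toy patient, and the diagnosis codes are hypothetical illustrations, not taken from the repository; the actual encoding in dataLoader/NextXVisit.py may differ.

```python
from typing import Dict, List, Sequence, Tuple


def build_vocab(visits: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Map each distinct diagnosis code to a column index (sorted for determinism)."""
    codes = sorted({code for visit in visits for code in visit})
    return {code: idx for idx, code in enumerate(codes)}


def next_visit_example(
    visits: Sequence[Sequence[str]], vocab: Dict[str, int]
) -> Tuple[List[List[str]], List[int]]:
    """Split visits into (history, multi-hot target encoding the last visit)."""
    if len(visits) < 2:
        raise ValueError("need at least one past visit and one next visit")
    history = [list(visit) for visit in visits[:-1]]
    target = [0] * len(vocab)
    for code in visits[-1]:
        target[vocab[code]] = 1  # several codes can be active at once
    return history, target


# Hypothetical patient with three visits; the code strings are illustrative only.
patient = [["4019", "25000"], ["4280"], ["4019", "5849"]]
vocab = build_vocab(patient)
history, target = next_visit_example(patient, vocab)
print(vocab)
print(history)
print(target)
```

Because several diagnoses can occur in one visit, each position of the target is an independent 0/1 label rather than one class out of many.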
To pre-train the MLM, run the following scripts in order:
- preprocess/create_a_data_set.py
- task/MLM.py
The data output and model output are written to output/ and modeloutput/ respectively.
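For readers unfamiliar with masked language modelling, the following is a minimal sketch of BERT-style token masking applied to a sequence of diagnosis codes. It is an illustration under assumptions: the `[MASK]` token name, the 15% masking rate, and the 80/10/10 replacement split follow the usual BERT recipe and are not confirmed to match task/MLM.py or dataLoader/MLM.py.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"  # assumed mask token name


def mask_codes(
    codes: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,  # assumed rate, as in the usual BERT recipe
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, labels); a label is None where no prediction is required."""
    if rng is None:
        rng = random.Random()
    inputs: List[str] = []
    labels: List[Optional[str]] = []
    for code in codes:
        if rng.random() < mask_prob:
            labels.append(code)  # the model must recover the original code here
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)  # most selected positions become the mask token
            elif roll < 0.9:
                inputs.append(rng.choice(list(vocab)))  # some become a random code
            else:
                inputs.append(code)  # the rest are left unchanged
        else:
            inputs.append(code)
            labels.append(None)  # position excluded from the loss
    return inputs, labels


# Hypothetical code sequence; the strings are illustrative only.
sequence = ["4019", "25000", "4280", "5849"]
masked, labels = mask_codes(sequence, vocab=sequence, rng=random.Random(0))
print(masked)
print(labels)
```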
For the downstream multilabel prediction of diagnosis codes, run the following scripts in order:
- preprocess/create_a_data_set.py
- task/MLM.py
- task/NextXVisit.py
The data output and model output are written to output/ and modeloutput/ respectively.
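The run order above can be scripted. The sketch below is a hypothetical helper, not part of the repository: it assumes the scripts are launched from the repository root and take no command-line arguments, neither of which the report states. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List, Sequence

# Script order as listed in this report for the downstream task.
PIPELINE = [
    "preprocess/create_a_data_set.py",
    "task/MLM.py",
    "task/NextXVisit.py",
]


def build_commands(scripts: Sequence[str], python: str = sys.executable) -> List[List[str]]:
    """Turn script paths into argv lists that use the current Python interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(scripts: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Run the scripts in order; stop at the first one that exits with an error."""
    commands = build_commands(scripts)
    for cmd in commands:
        print("running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a later stage never runs on top of a failed earlier stage.
            subprocess.run(cmd, check=True)
    return commands


# Dry run: print the commands without executing anything.
planned = run_pipeline(PIPELINE, dry_run=True)
```

Calling `run_pipeline(PIPELINE, dry_run=False)` would execute the three stages for real; passing only the first two entries covers the MLM pre-training alone.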
For data visualization of the feature embeddings (diagnosis codes), run:
- visualization/t_SNE.py
For the ablation study, run the following scripts in order:
- preprocess/create_data_set_ablation_delete_age.py
- task/MLM_ablation_delete_age.py
- task/NextXVisit_ablation_delete_age.py
File locations for the main experiment:
Preprocess
- Code at preprocess/create_a_data_set.py
- Processed dataset output at output/*.pkl
MLM Task
- dataloader at dataLoader/MLM.py
- model at model/MLM.py
- model training log at modeloutput/MLM_LOG
- model checkpoint at modeloutput/MLM_MODEL
- task at task/MLM.py
NextXVisit
- dataloader at dataLoader/NextXVisit.py
- model at model/NextXVisit.py
- model training and evaluation log at modeloutput/NextXVisit_LOG
- model checkpoint at modeloutput/NextXVisit_MODEL
- task at task/NextXVisit.py
File locations for the ablation study:
Preprocess
- Code at preprocess/create_data_set_ablation_delete_age.py
- Processed dataset output at output_ablation_delete_age/*.pkl
MLM Task
- dataloader at dataLoader/MLM.py
- model at model/MLM.py
- model training log at modeloutput_ablation_delete_age/MLM_LOG
- model checkpoint at modeloutput_ablation_delete_age/MLM_MODEL
- task at task/MLM_ablation_delete_age.py
NextXVisit
- dataloader at dataLoader/NextXVisit.py
- model at model/NextXVisit.py
- model training and evaluation log at modeloutput_ablation_delete_age/NextXVisit_LOG
- model checkpoint at modeloutput_ablation_delete_age/NextXVisit_MODEL
- task at task/NextXVisit_ablation_delete_age.py
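After a run, the output locations listed in this report can be checked with a small script. This is a hypothetical helper, not part of the repository. It assumes it is run from the repository root and that every ablation output follows the same pattern as the main run with `_ablation_delete_age` appended to the directory name, including the NextXVisit checkpoint.

```python
from pathlib import Path
from typing import List


def expected_artifacts(suffix: str = "") -> List[str]:
    """Log and checkpoint paths named in the report, for a given directory suffix."""
    model_dir = f"modeloutput{suffix}"
    return [
        f"{model_dir}/MLM_LOG",
        f"{model_dir}/MLM_MODEL",
        f"{model_dir}/NextXVisit_LOG",
        f"{model_dir}/NextXVisit_MODEL",
    ]


def missing_artifacts(root: str, suffix: str = "") -> List[str]:
    """Return the expected paths that do not exist under root."""
    base = Path(root)
    missing = [p for p in expected_artifacts(suffix) if not (base / p).exists()]
    data_dir = base / f"output{suffix}"
    # The preprocessing step is expected to leave at least one .pkl file.
    if not data_dir.is_dir() or not any(data_dir.glob("*.pkl")):
        missing.append(f"output{suffix}/*.pkl")
    return missing


for label, suffix in [("main", ""), ("ablation", "_ablation_delete_age")]:
    print(label, "missing:", missing_artifacts(".", suffix))
```

An empty list for a run means every listed log, checkpoint, and at least one processed dataset file is present.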