Python modules to facilitate the reading of various databases from PhysioNet, CPSC, NSRR, etc.
Migrated and improved from DeepPSP/database_reader
After migration, all database readers should be tested again. Current progress:
Database | Source | Implemented | Fully Tested [1] | Has Dataset |
---|---|---|---|---|
AFDB | PhysioNet | ✔️ | ✔️ | ❌ |
ApneaECG | PhysioNet | ✔️ | ❌ | ❌ |
CinC2017 | PhysioNet | ✔️ | ❌ | ❌ |
CinC2018 | PhysioNet | ❌ | ❌ | ❌ |
CinC2020 | PhysioNet | ✔️ | ✔️ | ✔️ |
CinC2021 | PhysioNet | ✔️ | ✔️ | ✔️ |
LTAFDB | PhysioNet | ✔️ | ❌ | ❌ |
LUDB | PhysioNet | ✔️ | ✔️ | ✔️ |
MITDB | PhysioNet | ✔️ | ✔️ | ✔️ |
QTDB | PhysioNet | ✔️ | ✔️ | ❌ |
SHHS | NSRR | ✔️ | ❌ | ❌ |
CPSC2018 | CPSC | ✔️ | ✔️ | ❌ |
CPSC2019 | CPSC | ✔️ | ✔️ | ✔️ |
CPSC2020 | CPSC | ✔️ | ✔️ | ✔️ |
CPSC2021 | CPSC | ✔️ | ✔️ | ✔️ |
SPH | Figshare | ✔️ | ✔️ | ❌ |
CACHET-CADB | DTU | ✔️ | ❌ | ❌ |
>>> from torch_ecg.databases import CINC2021
>>> dr = CINC2021("/path/to/the/directory/of/CINC2021-data/")  # one should call `dr.download()` if not downloaded yet
converting dtypes of columns `diagnosis` and `diagnosis_scored`...
>>> len(dr)
88253
>>> dr.load_data(0, leads=["I", "II"], data_format="channel_last", units="uv")
array([[ 28., 7.],
[ 39., 11.],
[ 45., 15.],
...,
[258., 248.],
[259., 249.],
[259., 250.]], dtype=float32)
>>> dr.load_ann(0)
{'rec_name': 'A0001',
'nb_leads': 12,
'fs': 500,
'nb_samples': 7500,
'datetime': datetime.datetime(2020, 5, 12, 12, 33, 59),
'age': 74,
'sex': 'Male',
'medical_prescription': 'Unknown',
'history': 'Unknown',
'symptom_or_surgery': 'Unknown',
'diagnosis': {'diagnosis_code': ['59118001'],
'diagnosis_abbr': ['RBBB'],
'diagnosis_fullname': ['right bundle branch block']},
'diagnosis_scored': {'diagnosis_code': ['59118001'],
'diagnosis_abbr': ['RBBB'],
'diagnosis_fullname': ['right bundle branch block']},
'df_leads': filename fmt byte_offset ... checksum block_size lead_name
I A0001.mat 16 24 ... -1716 0 I
II A0001.mat 16 24 ... 2029 0 II
III A0001.mat 16 24 ... 3745 0 III
aVR A0001.mat 16 24 ... 3680 0 aVR
aVL A0001.mat 16 24 ... -2664 0 aVL
aVF A0001.mat 16 24 ... -1499 0 aVF
V1 A0001.mat 16 24 ... 390 0 V1
V2 A0001.mat 16 24 ... 157 0 V2
V3 A0001.mat 16 24 ... -2555 0 V3
V4 A0001.mat 16 24 ... 49 0 V4
V5 A0001.mat 16 24 ... -321 0 V5
V6 A0001.mat 16 24 ... -3112 0 V6
[12 rows x 12 columns]}
>>> dr.get_labels(30000, scored_only=True, fmt="f") # full names
['sinus arrhythmia',
'right axis deviation',
'incomplete right bundle branch block']
>>> dr.get_labels(30000, scored_only=True, fmt="a") # abbreviations
['SA', 'RAD', 'IRBBB']
>>> dr.get_labels(30000, scored_only=False, fmt="s") # SNOMED CT Code
['427393009', '445211001', '47665007', '713426002']
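
The `data_format` and `units` arguments above control the array layout and the scale of the returned signal. As a rough illustration in plain NumPy (this is not torch_ecg code), and assuming that `"channel_last"` means shape `(n_samples, n_leads)` as the printed output suggests, the same values can be rearranged and rescaled by hand. The array holds the first three rows of the output above; the variable names are illustrative only.

```python
import numpy as np

# First three samples of leads I and II from the output above,
# in "channel_last" layout (n_samples, n_leads), in microvolts.
sig_uv = np.array([[28.0, 7.0], [39.0, 11.0], [45.0, 15.0]], dtype=np.float32)

# A channel-first layout (n_leads, n_samples) is simply the transpose.
sig_channel_first = sig_uv.T

# 1 mV = 1000 uV, so dividing by 1000 converts microvolts to millivolts.
sig_mv = sig_uv / 1000.0

print(sig_channel_first.shape)  # (2, 3)
print(sig_mv[0])
```

In practice one would presumably pass the desired `data_format` and `units` to `load_data` directly rather than converting afterwards.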
Each `Database` has the following basic functionalities:
- Download from data archive (mainly PhysioNet) using the `download` method

      >>> from torch_ecg.databases import MITDB
      >>> dr = MITDB(db_dir="/any/path/even/if/does/not/exists/")
      >>> # download the compressed zip file of MITDB
      >>> # and extract to `dr.db_dir`
      >>> dr.download(compressed=True)
- Loading data and annotations using `load_data` and `load_ann` respectively (ref. Basic Usage).
- Visualization using `plot` functions.
- Get citations of corresponding databases

      >>> from torch_ecg.databases import CINC2021
      >>> dr = CINC2021(db_dir="/any/path/even/if/does/not/exists/")
      >>> dr.get_citation()  # default format is `bibtex`
      @inproceedings{Reyna_2021,
        title = {Will Two Do? Varying Dimensions in Electrocardiography: The {PhysioNet}/Computing in Cardiology Challenge 2021},
        author = {Matthew A Reyna and Nadi Sadr and Erick A Perez Alday and Annie Gu and Amit J Shah and Chad Robichaux and Ali Bahrami Rad and Andoni Elola and Salman Seyedi and Sardar Ansari and Hamid Ghanbari and Qiao Li and Ashish Sharma and Gari D Clifford},
        booktitle = {2021 Computing in Cardiology ({CinC})},
        doi = {10.23919/cinc53138.2021.9662687},
        year = {2021},
        month = {9},
        publisher = {{IEEE}}
      }
      @misc{https://doi.org/10.13026/jz9p-0m02,
        title = {Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021},
        author = {Reyna, Matthew and Sadr, Nadi and Gu, Annie and Perez Alday, Erick Andres and Liu, Chengyu and Seyedi, Salman and Shah, Amit and Clifford, Gari D.},
        doi = {10.13026/JZ9P-0M02},
        publisher = {PhysioNet},
        year = {2022}
      }
      >>> dr.get_citation(format="text")  # default style "apa"
      Reyna, M. A., Sadr, N., Alday, E. A. P., Gu, A., Shah, A. J., Robichaux, C., Rad, A. B., Elola, A., Seyedi, S., Ansari, S., Ghanbari, H., Li, Q., Sharma, A., & Clifford, G. D. (2021). Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021. 2021 Computing in Cardiology (CinC). https://doi.org/10.23919/cinc53138.2021.9662687
      Reyna, M., Sadr, N., Gu, A., Perez Alday, E. A., Liu, C., Seyedi, S., Shah, A., & Clifford, G. D. (2022). <i>Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021</i> (Version 1.0.2) [Data set]. PhysioNet. https://doi.org/10.13026/JZ9P-0M02
      >>> dr.get_citation(format="text", style="mla")
      Reyna, Matthew A., et al. “Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021.” 2021 Computing in Cardiology (CinC), Sept. 2021. Crossref, https://doi.org/10.23919/cinc53138.2021.9662687.
      Reyna, M., Sadr, N., Gu, A., Perez Alday, E. A., Liu, C., Seyedi, S., Shah, A., & Clifford, G. D. (2022). <i>Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021</i> (Version 1.0.2) [Data set]. PhysioNet. https://doi.org/10.13026/JZ9P-0M02

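
The loading functions listed above lend themselves to small dataset-wide summaries. The following is a minimal sketch and not part of torch_ecg: `record_durations` is a hypothetical helper, it assumes that a reader supports `len(reader)` and that `reader.load_ann(i)` returns a dict with `fs` and `nb_samples` keys as in the CINC2021 output in Basic Usage (other readers may return different structures), and `FakeReader` is a stand-in so that the snippet runs without any data on disk.

```python
from typing import Any, Dict, List


def record_durations(reader: Any, max_records: int = 10) -> List[float]:
    """Return durations (in seconds) of the first `max_records` records.

    Assumes `len(reader)` works and `reader.load_ann(i)` returns a dict
    containing `fs` (Hz) and `nb_samples`, as in the CINC2021 example.
    """
    n = min(len(reader), max_records)
    durations = []
    for idx in range(n):
        ann = reader.load_ann(idx)
        durations.append(ann["nb_samples"] / ann["fs"])
    return durations


class FakeReader:
    """Hypothetical stand-in for a database reader, for illustration only."""

    def __init__(self, anns: List[Dict[str, int]]) -> None:
        self._anns = anns

    def __len__(self) -> int:
        return len(self._anns)

    def load_ann(self, idx: int) -> Dict[str, int]:
        return self._anns[idx]


fake = FakeReader([{"fs": 500, "nb_samples": 7500}, {"fs": 500, "nb_samples": 5000}])
durations = record_durations(fake)
print(durations)  # [15.0, 10.0]
```

With real data, the same helper could presumably be called on a `CINC2021` instance in place of `fake`.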
For a `PhysioNetDataBase`, one has the `helper` function for looking up annotation meanings
>>> from torch_ecg.databases import MITDB
>>> dr = MITDB(db_dir="/any/path/even/if/does/not/exists/")
>>> dr.helper("beat")
MIT-BIH Arrhythmia Database
--- helpler - beat ---
{ '/': 'Paced beat',
'?': 'Beat not classified during learning',
'A': 'Atrial premature beat',
'B': 'Bundle branch block beat (unspecified)',
'E': 'Ventricular escape beat',
'F': 'Fusion of ventricular and normal beat',
'J': 'Nodal (junctional) premature beat',
'L': 'Left bundle branch block beat',
'N': 'Normal beat',
'Q': 'Unclassifiable beat',
'R': 'Right bundle branch block beat',
'S': 'Supraventricular premature or ectopic beat (atrial or nodal)',
'V': 'Premature ventricular contraction',
'a': 'Aberrated atrial premature beat',
'e': 'Atrial escape beat',
'f': 'Fusion of paced and normal beat',
'j': 'Nodal (junctional) escape beat',
'n': 'Supraventricular escape beat (atrial or nodal)',
'r': 'R-on-T premature ventricular contraction'}
>>> dr.helper("rhythm")
MIT-BIH Arrhythmia Database
--- helpler - rhythm ---
{ '(AB': 'Atrial bigeminy',
'(AFIB': 'Atrial fibrillation',
'(AFL': 'Atrial flutter',
'(B': 'Ventricular bigeminy',
'(BII': '2° heart block',
'(IVR': 'Idioventricular rhythm',
'(N': 'Normal sinus rhythm',
'(NOD': 'Nodal (A-V junctional) rhythm',
'(P': 'Paced rhythm',
'(PREX': 'Pre-excitation (WPW)',
'(SBR': 'Sinus bradycardia',
'(SVTA': 'Supraventricular tachyarrhythmia',
'(T': 'Ventricular trigeminy',
'(VFL': 'Ventricular flutter',
'(VT': 'Ventricular tachycardia'}
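
A lookup table like the one printed by `helper("beat")` is handy when summarising beat annotations. The sketch below copies a few entries of that table into a plain dict by hand; it does not call torch_ecg and makes no assumption about what `helper` returns beyond what it prints. `BEAT_MEANINGS`, `summarize_beats`, and the example symbol sequence are illustrative names and made-up data.

```python
from collections import Counter
from typing import Dict, Iterable

# A few entries copied by hand from the `helper("beat")` output above.
BEAT_MEANINGS = {
    "N": "Normal beat",
    "V": "Premature ventricular contraction",
    "A": "Atrial premature beat",
    "/": "Paced beat",
}


def summarize_beats(symbols: Iterable[str]) -> Dict[str, int]:
    """Count beat symbols and key the counts by their descriptions.

    Symbols missing from the table are kept under their raw symbol.
    """
    counts = Counter(symbols)
    return {BEAT_MEANINGS.get(sym, sym): n for sym, n in counts.items()}


# Made-up symbol sequence, for illustration only.
summary = summarize_beats(["N", "N", "V", "N", "A", "x"])
print(summary)
```

Keeping unknown symbols under their raw key, rather than dropping them, makes it easy to spot annotations that the hand-copied table does not cover.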
- use the attribute `_df_records` to maintain paths, etc. uniformly
Footnotes
1. GitHub Action. Since the classes are migrated from DeepPSP/database_reader, some are not tested for newly added features.