Disclaimer: The scripts and models presented here have only been tested on a very limited dataset. This repository is not production-ready and is not intended to be used for clinical diagnosis. Please use it responsibly and at your own risk.
This repository is being made available for free, in the hope that it will prove useful to someone during this ongoing global pandemic. Attribution is not required but will be appreciated if you find this repository useful.
If you have access to reliable PA/AP chest X-ray images that are not included in the training data listed in the Data section below, and that you would like to share to help improve this model, please respond here.
The inference pipeline uses two models:
- Segmentation model (U-Net, `resnet34` backbone)
- Classifier (`resnet34`)
Prediction is performed as follows:
- Lungs are identified in the input image by the segmentation model
- The bounding box is computed for the region containing the lungs
- The input image is cropped and some additional preprocessing is performed on the cropped image (CLAHE, thresholding)
- A prediction (COVID-19 / Normal / Pneumonia) is obtained from the classifier model, along with an optional heatmap
Here are a few examples that give a visual representation of the steps above:
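The crop step above can be sketched as follows. This is a minimal illustration, not code from the repository: `lung_bbox` and `pad` are hypothetical names, the toy mask stands in for the U-Net output, and the subsequent CLAHE/thresholding preprocessing is omitted.

```python
import numpy as np

def lung_bbox(mask, pad=0):
    """Bounding box (y0, y1, x0, x1) of the nonzero pixels in a binary
    lung mask, optionally padded and clamped to the image borders."""
    ys, xs = np.nonzero(mask)
    h, w = mask.shape
    y0 = max(int(ys.min()) - pad, 0)
    y1 = min(int(ys.max()) + 1 + pad, h)
    x0 = max(int(xs.min()) - pad, 0)
    x1 = min(int(xs.max()) + 1 + pad, w)
    return y0, y1, x0, x1

# Toy mask standing in for the segmentation model's output:
# "lungs" occupy rows 20..69 and columns 10..59 of a 100x100 image.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:70, 10:60] = 1

y0, y1, x0, x1 = lung_bbox(mask)
crop = mask[y0:y1, x0:x1]            # the classifier would see this region
print((y0, y1, x0, x1), crop.shape)  # (20, 70, 10, 60) (50, 50)
```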
Confusion matrices for the results on two test sets are given below (rows: actual class, columns: predicted class).
Covid-Net test set
| | COVID-19 | Normal | Pneumonia | Sensitivity |
|---|---|---|---|---|
| COVID-19 | 94 | 4 | 2 | 0.9400 |
| Normal | 4 | 863 | 18 | 0.9751 |
| Pneumonia | 5 | 46 | 543 | 0.9141 |
| P.P.V. | 0.9126 | 0.9452 | 0.9645 | |
Non-public test set + 20% of RICORD data
| | COVID-19 | Normal | Pneumonia | Sensitivity |
|---|---|---|---|---|
| COVID-19 | 117 | 1 | 0 | 0.9915 |
| Normal | 4 | 25 | 0 | 0.8621 |
| Pneumonia | 0 | 0 | 0 | - |
| P.P.V. | 0.9669 | 0.9615 | - | |
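As a sanity check, the per-class sensitivity and P.P.V. figures can be recomputed from the raw counts of the Covid-Net test-set matrix above (rows are actual classes, columns are predictions; class order: COVID-19, Normal, Pneumonia):

```python
# Confusion matrix from the Covid-Net test set table above.
cm = [
    [94,   4,   2],
    [4,  863,  18],
    [5,   46, 543],
]

# Sensitivity (recall) = diagonal / row sum; P.P.V. (precision) = diagonal / column sum.
sensitivity = [row[i] / sum(row) for i, row in enumerate(cm)]
ppv = [cm[j][j] / sum(row[j] for row in cm) for j in range(3)]

print([round(s, 4) for s in sensitivity])  # [0.94, 0.9751, 0.9141]
print([round(p, 4) for p in ppv])          # [0.9126, 0.9452, 0.9645]
```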
The `env` folder contains scripts to help set up an environment for using the code in this repository on an Ubuntu 18.04 host. These scripts may also work under other Debian-based distros, but have not been tested. In any case, it should be trivial to adapt them to most environments.
`setup.sh`
- This script is meant to be run as root.

`setup-user.sh`
- Run this script as the user that will use the repository. By default, the CPU version of `pytorch` is installed. To install the CUDA (v10.1) version, before running the script, comment out the line:

```
pip3 install torch==1.6.0+cpu torchvision==0.7.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
```

and uncomment the line that reads:

```
pip3 install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
```
`download-models.sh`
- Downloads the latest version of the trained models. This is invoked automatically when you run `setup-user.sh`, but is provided as a separate script to simplify acquisition of new models when they become available.
Once the environment is set up correctly, it should be possible to run `inference.py` from the `inference` folder to produce predictions on individual images or folders containing images.
```
python3 inference.py --help
usage: inference.py [-h] --config CONFIG --xraypath XRAYPATH
                    [--heatmappath HEATMAPPATH]

COVID-19_CXR_AI Inference

optional arguments:
  -h, --help            show this help message and exit
  --config CONFIG       Config file path
  --xraypath XRAYPATH   Full path to image (or dir containing images) to be
                        inferenced
  --heatmappath HEATMAPPATH
                        Directory in which generated heatmaps are to be stored
```
Provided that the models are placed in the default location, i.e. the `models/current` folder, it should be possible to use the included `model-config.json` file as-is.
Ideally, use full-sized X-Ray images in PNG format.
Should you wish to train the models further or retrain from scratch, the public data that was used for training is listed in the Data section below. The following notebooks can serve as guidelines for training:
- `segmentation/segmentation-train.ipynb`
- `classification/classifier-train.ipynb`
The notebooks used to create usable datasets for training are in the `datasets` folder. Please note that these notebooks create hard links to the original images to avoid duplication. Therefore, it is advisable to put the final datasets on the same logical disk partition as the original images.
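The hard-linking approach can be sketched with the standard library. The paths below are hypothetical placeholders; the point is that `os.link` creates a second directory entry for the same inode without copying data, and it raises `OSError` when source and destination are on different partitions, which is why the datasets should live on the same partition as the originals.

```python
import os
import tempfile

# Hypothetical layout standing in for the original images and the
# generated dataset; both live under one temp dir (same partition).
base = tempfile.mkdtemp()
src = os.path.join(base, "originals", "img001.png")
dst = os.path.join(base, "dataset", "train", "img001.png")

os.makedirs(os.path.dirname(src))
os.makedirs(os.path.dirname(dst))
with open(src, "wb") as f:
    f.write(b"fake image bytes")

# Hard link: two names for the same file, no duplicated bytes on disk.
os.link(src, dst)
print(os.path.samefile(src, dst))  # True
```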
The data used for training was acquired from the sources listed below.
- NLM Tuberculosis Chest X-ray Image Data Sets
- Shenzhen subset segmentation masks
- Additional non-public, manually segmented (using Fiji) images
Compiled by the Covid-Net team:
Additional
To create the training datasets:
- Download images from the above links
- Convert DICOM images to PNG using tools of your choice (e.g. `mogrify` or `convert` from `imagemagick`). Please note that `create_COVIDx_v2_RICORD.ipynb` expects the converted RICORD images to retain the original folder structure.
- Specify appropriate paths in `segmentation-prepare.ipynb` and run the notebook to create training data for segmentation
- Specify appropriate paths in `create_COVIDx_v2_RICORD.ipynb` and run the notebook to create a Covid-Net style dataset
- Specify appropriate paths in `segmentation-apply.ipynb` and run the notebook to:
  - apply segmentation to the classification training images
  - save lung bounds for all the training images
  - transform the dataset into the expected form for classifier training
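The "retain the original folder structure" requirement for the converted RICORD images can be sketched as a tree-mirroring walk. This is an illustrative sketch, not repository code: `convert_tree` and `convert_fn` are hypothetical names, and the actual DICOM-to-PNG step is stubbed out (it could be pydicom + Pillow, or ImageMagick, none of which is assumed here).

```python
import os
import tempfile

def convert_tree(src_root, dst_root, convert_fn):
    """Convert every .dcm under src_root, writing each .png under dst_root
    at the same relative path, so the original folder structure is kept."""
    for dirpath, _dirs, files in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        for name in files:
            if not name.lower().endswith(".dcm"):
                continue
            out_dir = os.path.join(dst_root, rel)
            os.makedirs(out_dir, exist_ok=True)
            convert_fn(os.path.join(dirpath, name),
                       os.path.join(out_dir, name[:-4] + ".png"))

# Toy demonstration with a stub converter that just touches the output file.
src_root = tempfile.mkdtemp()
dst_root = tempfile.mkdtemp()
os.makedirs(os.path.join(src_root, "patient01", "study1"))
open(os.path.join(src_root, "patient01", "study1", "slice.dcm"), "w").close()

convert_tree(src_root, dst_root, lambda s, d: open(d, "w").close())
print(os.path.exists(
    os.path.join(dst_root, "patient01", "study1", "slice.png")))  # True
```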
Acknowledgements:
- The Covid-Net project for their pioneering work in this field and for creating a comprehensive collection of training data