ICD Coding using ChatGPT

This repository contains the code to predict ICD-10 codes from clinical notes using ChatGPT.

Table of Contents

  • 🛠️ Setup
  • 💾 Dataset
  • 🤖 Inference
  • 🔬 Evaluation

🛠️ Setup

Cloning the codebase

git clone --recurse-submodules https://github.com/aryopg/chatgpt_icd_coding.git

Python packages

This codebase requires the following dependencies:
- pip
- numpy
- pydantic
- python-dotenv
- black
- isort
- tqdm
- pandas
- matplotlib
- scikit-learn

We opted to use conda as our package manager. The following will install the necessary dependencies:

conda env create -f environment.yaml
conda activate chatgpt_icd_coding

Environment variables

The following environment variables are required to run the inference:

  • AZURE_OPENAI_KEY: The key to access the OpenAI deployment on Azure.
  • AZURE_OPENAI_ENDPOINT: The endpoint URL to access the exact OpenAI deployment on Azure.

We use the python-dotenv package to load these environment variables. To set them:

mkdir env
nano env/.env

Write all of the above environment variables with their values inside that file. You do not have to use nano; any editor works, as long as the file path (env/.env) remains the same.
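
As a minimal sketch of how these variables can be read with python-dotenv (the actual loading logic lives in the repository's scripts and may differ):

# Minimal sketch: load env/.env with python-dotenv and read the two variables.
# The path and variable names follow the setup above; this is not the
# repository's own loading code.
import os

from dotenv import load_dotenv

load_dotenv("env/.env")  # reads AZURE_OPENAI_KEY and AZURE_OPENAI_ENDPOINT into the environment

azure_openai_key = os.environ["AZURE_OPENAI_KEY"]
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]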

💾 Dataset

The datasets are generated using https://github.com/joakimedin/medical-coding-reproducibility. We only use the test split for inference.
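
For a quick sanity check, the test split can be inspected with pandas (the file path matches the evaluation example below; the column schema is defined by the medical-coding-reproducibility pipeline, not by us):

# Peek at the test split; column names depend on the upstream pipeline.
import pandas as pd

test_split = pd.read_csv("data/disch_raw_test_split.csv")
print(test_split.shape)
print(test_split.columns.tolist())  # e.g. the discharge note text and the ICD-10 targets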

🤖 Inference

To run the prediction, we need a config file that contains the hyperparameters of ChatGPT. See the example below for a "deterministic" prediction run (temperature == 0, top_p == 0):

python scripts/inference.py --config_filepath configs/deterministic_chatgpt_mimic_iv_coding_system_user.yaml
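
For intuition, a "deterministic" call roughly corresponds to the sketch below. The openai package version, API version, deployment name, and prompt are illustrative assumptions; the actual prompting logic lives in scripts/inference.py and the config file.

# Illustrative deterministic ChatGPT call on Azure (assumes openai<1.0).
# Deployment name, API version, and prompt are placeholders, not the
# repository's actual setup.
import os

import openai
from dotenv import load_dotenv

load_dotenv("env/.env")

openai.api_type = "azure"
openai.api_key = os.environ["AZURE_OPENAI_KEY"]
openai.api_base = os.environ["AZURE_OPENAI_ENDPOINT"]
openai.api_version = "2023-05-15"  # placeholder API version

response = openai.ChatCompletion.create(
    engine="gpt-35-turbo",  # placeholder Azure deployment name
    messages=[
        {"role": "system", "content": "You are an ICD-10 coding assistant."},
        {"role": "user", "content": "Assign ICD-10 codes to the following discharge note: ..."},
    ],
    temperature=0,  # deterministic settings, matching the config above
    top_p=0,
)
print(response["choices"][0]["message"]["content"])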

🔬 Evaluation

After the inference, there will be an output folder containing all the predictions made by ChatGPT (e.g. outputs/2023_08_25__09_19_37/predictions). We need the path to that folder, as well as the path to the ground-truth test split (e.g. data/disch_raw_test_split.csv), which provides the true labels.

python scripts/evaluate.py --predictions_dir <PREDICTIONS_DIR> --groundtruth_path <GROUNDTRUTH_PATH>
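
The metrics reported by scripts/evaluate.py are defined in the repository; as a rough illustration of the general idea, multi-label metrics such as micro-F1 over predicted and ground-truth code sets can be computed with scikit-learn (the code sets below are made up):

# Illustrative multi-label evaluation with scikit-learn; the actual metrics
# and input format of scripts/evaluate.py may differ.
from sklearn.metrics import f1_score
from sklearn.preprocessing import MultiLabelBinarizer

# Toy example: predicted vs. ground-truth ICD-10 code sets for two notes.
predicted = [{"I10", "E11.9"}, {"J18.9"}]
groundtruth = [{"I10"}, {"J18.9", "N17.9"}]

mlb = MultiLabelBinarizer()
mlb.fit(groundtruth + predicted)  # binarize over the union of observed codes
y_true = mlb.transform(groundtruth)
y_pred = mlb.transform(predicted)

print("micro-F1:", f1_score(y_true, y_pred, average="micro"))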
