[CS 598 DL4H] Add new datasets for Question Answering and Heart Disease #360

Open: wants to merge 2 commits into master

mguan2020 commented Apr 29, 2025

Authors (with net id): Don-Thuan Le (dlte2), Matthew Guan (mg95)
Type of contribution: dataset

Paper Name: Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records
Paper Link: https://arxiv.org/abs/2203.06918

Description:
This Pull Request does the following:

  • Adds the new question answering dataset used for our DLH course project, along with tests.
  • Adds a new dataset and task for heart disease prediction. The dataset is from Kaggle (link: https://www.kaggle.com/datasets/krishujeniya/heart-diseae). To check the correctness of the task (and therefore the dataset), I included a main function in the task file that prints the first sample from the dataset.
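
For readers unfamiliar with this kind of smoke check, here is a minimal sketch of a main function that builds samples and prints the first one. It is not the code in this PR and does not use the PyHealth API: the function name `build_samples`, the column names, the sample dict layout, and the two inline rows are all illustrative assumptions standing in for the Kaggle CSV.

```python
import csv
import io

# Hypothetical stand-in for the Kaggle CSV. The column names and the two rows
# below are made-up illustrative values, not data from the real dataset.
SAMPLE_CSV = """age,sex,chol,target
63,1,233,1
37,1,250,0
"""


def build_samples(csv_text):
    """Turn each CSV row into a sample dict with numeric features and a binary label."""
    samples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for idx, row in enumerate(reader):
        # Assumes the label column is called "target"; remove it from the features.
        label = int(row.pop("target"))
        features = {name: float(value) for name, value in row.items()}
        samples.append(
            {"patient_id": str(idx), "features": features, "label": label}
        )
    return samples


def main():
    # Print the first sample so its fields can be inspected by eye.
    samples = build_samples(SAMPLE_CSV)
    print(samples[0])


if __name__ == "__main__":
    main()
```

Printing a single sample only shows that the fields are populated as expected; an assertion-based test on the sample keys and label values would catch regressions automatically.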

Files changed:
datasets/qa_dataset.py
pyhealth/datasets/__init__.py
datasets/heart_disease.py
tasks/heart_disease_prediction.py
configs/heart_disease.yaml
tasks/__init__.py

Once tested, the output will look something like the following:
[image: sampleqa]

[image: myout]

mguan2020 changed the title from "[CS 598 DL4H] Add new dataset and task for heart disease" to "[CS 598 DL4H] Add new datasets for Question Answering and Heart Disease" on Apr 30, 2025