
Create Airflow DAG and dbt model for openFDA pregnancy category #387

Open: wants to merge 2 commits into main

saywurdson
Collaborator

This PR integrates drug pregnancy category information from the openFDA API into SageRx.

Resolves #ISSUE NUMBER

Explanation

Airflow DAG for openFDA Pregnancy Categories:
* This DAG extracts data from the openFDA API's drug/label endpoint, searching specifically for label records that contain a teratogenic_effects section.
* It parses the pregnancy category (A, B, C, D, or X) from the label text.
* It normalizes the associated NDCs to the 11-digit (5-4-2) format.
* The extracted data (NDC, RXCUI where available, and pregnancy category) is saved to a JSON file.
* A subsequent task loads this data into the sagerx_lake.openfda_pregnancy_categories table.
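The extraction step can be sketched as a URL builder for paging through the drug/label endpoint. This is a minimal sketch, assuming the DAG uses openFDA's `_exists_:` query syntax with `limit`/`skip` paging; the helper names and the page size of 100 are hypothetical and not taken from the PR.

```python
from urllib.parse import urlencode

# Base URL of the openFDA drug label endpoint.
OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"


def build_label_query(skip: int = 0, limit: int = 100) -> str:
    """Build the URL for one page of labels that have a teratogenic_effects section.

    Hypothetical helper: the page size the DAG actually uses is not stated.
    """
    params = {
        # openFDA query syntax: _exists_:<field> matches records where the field is present.
        "search": "_exists_:teratogenic_effects",
        "limit": limit,
        "skip": skip,
    }
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def page_offsets(total: int, limit: int = 100) -> list[int]:
    """Return the skip offsets needed to page through `total` results."""
    return list(range(0, total, limit))
```

openFDA caps both `limit` and `skip` per request, so the page size and the number of pages should be checked against the openFDA query-parameter documentation.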
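The category-parsing step could look like the following minimal sketch. It assumes the label text states the letter in a phrase such as "Pregnancy Category C"; the regex and `parse_pregnancy_category` are hypothetical and are not the PR's actual parser. openFDA returns label sections as lists of strings, so the sketch joins them before searching.

```python
import re
from typing import Optional

# Hypothetical pattern: "pregnancy category", an optional colon, then one of the
# letters A, B, C, D, X as a standalone word (so "Category Data" is not read as D).
_CATEGORY_RE = re.compile(r"pregnancy\s+category\s*:?\s*([ABCDX])\b", re.IGNORECASE)


def parse_pregnancy_category(sections: list[str]) -> Optional[str]:
    """Return the first pregnancy category letter in the label sections, or None."""
    # openFDA label fields such as teratogenic_effects are lists of strings.
    text = " ".join(sections)
    match = _CATEGORY_RE.search(text)
    return match.group(1).upper() if match else None
```

Taking the first match is a simplification: a label that mentions more than one category would need an explicit rule for which one to keep.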
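The NDC normalization step follows the usual conversion of a 10-digit NDC (4-4-2, 5-3-2, or 5-4-1 segments) to the 11-digit 5-4-2 format by left-padding the short segment with a zero. This sketch assumes the openFDA package NDCs arrive hyphenated and that the output is unhyphenated; `to_ndc11` is a hypothetical name.

```python
def to_ndc11(ndc: str) -> str:
    """Convert a hyphenated NDC to the unhyphenated 11-digit 5-4-2 format."""
    labeler, product, package = ndc.strip().split("-")
    widths = (len(labeler), len(product), len(package))
    if widths == (4, 4, 2):
        labeler = "0" + labeler  # 4-4-2: pad the labeler code
    elif widths == (5, 3, 2):
        product = "0" + product  # 5-3-2: pad the product code
    elif widths == (5, 4, 1):
        package = "0" + package  # 5-4-1: pad the package code
    elif widths != (5, 4, 2):
        raise ValueError(f"unrecognized NDC layout: {ndc!r}")
    return labeler + product + package
```

The padding is what lets these NDCs line up with the 11-digit NDCs that the int_rxnorm_clinical_products_to_ndcs model presumably joins on.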

dbt Model:
* This model joins the openfda_pregnancy_categories data with the int_rxnorm_clinical_products_to_ndcs intermediate model to link pregnancy categories to clinical product RXCUIs via NDCs.
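The join performed by the dbt model can be illustrated in plain Python. This is a sketch, assuming an inner join on the 11-digit NDC and a one-to-one NDC to clinical product mapping. The column and function names are hypothetical, and the real model is dbt SQL, not Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PregnancyRow:
    ndc11: str
    pregnancy_category: str


def join_categories_to_rxcuis(
    pregnancy_rows: list[PregnancyRow],
    ndc_to_clinical_product: dict[str, str],
) -> list[tuple[str, str, str]]:
    """Inner-join pregnancy categories to clinical product RXCUIs on NDC.

    Returns (clinical_product_rxcui, ndc11, pregnancy_category) tuples; rows
    whose NDC has no RxNorm mapping are dropped, as in a SQL inner join.
    """
    joined = []
    for row in pregnancy_rows:
        rxcui = ndc_to_clinical_product.get(row.ndc11)
        if rxcui is not None:
            joined.append((rxcui, row.ndc11, row.pregnancy_category))
    return joined
```

Because several NDCs can map to the same clinical product, a consumer may want to check that all of a product's NDCs agree on a category before collapsing the result to one row per RXCUI.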

Tests

  1. What testing did you do? Ran it in a local instance of SageRx; the table built in under 2 minutes.
    [Screenshot 2025-04-25 at 3:48:49 PM]

@saywurdson saywurdson requested a review from jrlegrand April 25, 2025 19:49
@saywurdson saywurdson self-assigned this Apr 25, 2025
2 participants