create airflow dag and dbt model for openFDA pregnancy category #387
This PR integrates drug pregnancy category information from the OpenFDA API into SageRx.
Resolves #ISSUE NUMBER
Explanation
Airflow DAG for OpenFDA Pregnancy Categories:
* This DAG extracts data from the OpenFDA API's `drug/label` endpoint, specifically searching for records containing `teratogenic_effects`.
* It parses the pregnancy category (A, B, C, D, X) from the text.
* It formats associated NDCs to the 11-digit standard.
* The extracted data (NDC, RXCUI if available, Pregnancy Category) is saved to a JSON file.
* A subsequent task loads this data into the `sagerx_lake.openfda_pregnancy_categories` table.

dbt Model:
* This model joins the `openfda_pregnancy_categories` data with the `int_rxnorm_clinical_products_to_ndcs` intermediate model to link pregnancy categories to clinical product RXCUIs via NDCs.

Tests
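As a rough illustration of the parsing and NDC-formatting steps listed under Explanation, the sketch below shows one way they could be checked in isolation. It is not the code from this PR: the function names `parse_pregnancy_category` and `to_ndc11` are hypothetical, the regular expression assumes the label text uses the phrase "Pregnancy Category <letter>", and the NDC helper assumes hyphenated three-segment package NDCs that are zero-padded to the 5-4-2 layout.

```python
import re
from typing import Optional

# Assumption: label text states the category as "Pregnancy Category C" or similar.
_CATEGORY_RE = re.compile(r"pregnancy\s+category\s*:?\s*([ABCDX])\b", re.IGNORECASE)


def parse_pregnancy_category(text: str) -> Optional[str]:
    """Return the first pregnancy category letter found in the text, or None."""
    match = _CATEGORY_RE.search(text or "")
    return match.group(1).upper() if match else None


def to_ndc11(ndc: str) -> Optional[str]:
    """Convert a hyphenated package NDC (4-4-2, 5-3-2 or 5-4-1) to 11 digits.

    Each segment is left-padded with zeros to the 5-4-2 layout.
    Returns None when the input is not a three-segment numeric NDC.
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    labeler, product, package = parts
    if len(labeler) > 5 or len(product) > 4 or len(package) > 2:
        return None
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


if __name__ == "__main__":
    print(parse_pregnancy_category("Teratogenic Effects: Pregnancy Category C."))
    print(to_ndc11("0002-3227-30"))
```

Returning `None` for unparseable input, rather than raising, lets a caller skip records without a stated category or with a malformed NDC; whether the DAG in this PR handles those cases the same way is not stated in the description.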