Merge branch 'release/2.1.0'
dermatologist committed Feb 6, 2023
2 parents eaa3c28 + da2b138 commit 8edaa99
Showing 17 changed files with 662 additions and 58 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/docs.yml
@@ -22,8 +22,9 @@ jobs:
- name: Create docs
run: |
make -C docs/ html
cp docs/_config.yml docs/_build/html/_config.yml
- name: Deploy Docs 🚀
uses: JamesIves/[email protected]
with:
branch: gh-pages # The branch the action should deploy to.
folder: docs/_build/html # The folder the action should deploy.
folder: docs/_build/html # The folder the action should deploy.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -1,8 +1,8 @@
# Changelog

## [Unreleased](https://github.com/dermatologist/fhiry/tree/HEAD)
## [2.0.0](https://github.com/dermatologist/fhiry/tree/2.0.0) (2022-03-17)

[Full Changelog](https://github.com/dermatologist/fhiry/compare/1.0.0...HEAD)
[Full Changelog](https://github.com/dermatologist/fhiry/compare/1.0.0...2.0.0)

**Closed issues:**

54 changes: 45 additions & 9 deletions README.md
@@ -1,37 +1,71 @@
# :fire: fhiry - FHIR for AI and ML
# :fire: fhiry - FHIR to pandas dataframe for data analysis, AI and ML

![Libraries.io SourceRank](https://img.shields.io/librariesio/sourcerank/pypi/fhiry)
[![PyPI download total](https://img.shields.io/pypi/dm/fhiry.svg)](https://pypi.python.org/pypi/fhiry/)
![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/dermatologist/fhiry)

## About
## Open Source Python library for import of FHIR resources to pandas dataframe

[Bulk data export using FHIR](https://hl7.org/fhir/uv/bulkdata/export/index.html) may be important if you want to export a cohort for analysis or machine learning.
:fire: **Fhiry** is a python package to facilitate this by converting a folder of FHIR bundles/ndjson into a pandas data frame for analysis and importing
:fire: **Fhiry** is a [python](https://www.python.org/) package to facilitate this by converting a folder of [FHIR bundles](https://www.hl7.org/fhir/bundle.html)/ndjson into a [pandas](https://pandas.pydata.org/docs/user_guide/index.html) data frame for analysis and importing
into ML packages such as Tensorflow and PyTorch. Test it with the [synthea sample](https://synthea.mitre.org/downloads) or the downloaded ndjson from the [SMART Bulk data server](https://bulk-data.smarthealthit.org/). Use the 'Discussions' tab above for feature requests.

## Installation

```
```shell
pip install fhiry
```

## Usage

### Synthea
### Import FHIR bundles (JSON) from folder to pandas dataframe

```
```python
import fhiry.parallel as fp
df = fp.process('/path/to/fhir/resources')
print(df.info())
```

### [SMART Bulk Data Server](https://bulk-data.smarthealthit.org/) Export
```
Example source data set: [Synthea](https://synthea.mitre.org/downloads)

Jupyter notebook example: [`notebooks/synthea.ipynb`](notebooks/synthea.ipynb)

### Import NDJSON from folder to pandas dataframe

```python
import fhiry.parallel as fp
df = fp.ndjson('/path/to/fhir/ndjson/files')
print(df.info())
```

Example source data set: [SMART Bulk Data Server](https://bulk-data.smarthealthit.org/) Export

Jupyter notebook example: [`notebooks/ndjson.ipynb`](notebooks/ndjson.ipynb)

### Import FHIR Search results to pandas dataframe

Fetch and import resources from [FHIR Search API](https://www.hl7.org/fhir/search.html) results to pandas dataframe.

Documentation: [`fhir-search.md`](fhir-search.md)

#### Example: Import all conditions with a certain code from FHIR Server

Fetch and import all condition resources with Snomed (Codesystem `http://snomed.info/sct`) Code `39065001` in the FHIR element `Condition.code` ([resource type specific FHIR search parameter `code`](https://www.hl7.org/fhir/condition.html#search)) to a pandas dataframe:

```python
from fhiry.fhirsearch import Fhirsearch

fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")

my_fhir_search_parameters = {
"code": "http://snomed.info/sct|39065001",
}

df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)

print(df.info())
```

## Columns
* see df.columns

@@ -49,8 +83,10 @@ resource.gender
```

### [Documentation](https://dermatologist.github.io/fhiry/)

## Contributors

* [Bell Eapen](https://nuchange.ca) | [![Twitter Follow](https://img.shields.io/twitter/follow/beapen?style=social)](https://twitter.com/beapen)
* [Markus Mandalka](https://github.com/Mandalka)
* WIP, PR welcome, please see CONTRIBUTING.md
* [![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg) using CC](https://computecanada.ca)
* [![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg) using CC](https://computecanada.ca)
1 change: 1 addition & 0 deletions dev-requirements.in
@@ -3,6 +3,7 @@
pytest-cov
pytest
recommonmark
responses
sphinx>=3.2.1
setuptools
setuptools_scm
27 changes: 20 additions & 7 deletions dev-requirements.txt
@@ -1,6 +1,6 @@
#
# This file is autogenerated by pip-compile with python 3.8
# To update, run:
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile dev-requirements.in
#
@@ -34,6 +34,8 @@ idna==3.2
# via requests
imagesize==1.2.0
# via sphinx
importlib-metadata==5.1.0
# via sphinx
iniconfig==1.1.1
# via pytest
jinja2==3.0.1
@@ -60,20 +62,24 @@ pygments==2.10.0
# via sphinx
pyparsing==2.4.7
# via packaging
pytest==7.1.0
pytest==7.1.2
# via
# -r dev-requirements.in
# pytest-cov
pytest-cov==3.0.0
# via -r dev-requirements.in
pytz==2021.3
pytz==2022.6
# via
# -c requirements.txt
# babel
recommonmark==0.7.1
# via -r dev-requirements.in
requests==2.26.0
# via sphinx
# via
# responses
# sphinx
responses==0.22.0
# via -r dev-requirements.in
setuptools-scm==6.4.2
# via -r dev-requirements.in
six==1.16.0
@@ -102,19 +108,26 @@ sphinxcontrib-serializinghtml==1.1.5
toml==0.10.2
# via
# coverage
# responses
# tox
tomli==1.2.1
# via
# pytest
# setuptools-scm
tox==3.24.5
tox==3.25.0
# via -r dev-requirements.in
types-toml==0.10.8.1
# via responses
urllib3==1.26.6
# via requests
# via
# requests
# responses
virtualenv==20.8.0
# via tox
wheel==0.37.1
# via -r dev-requirements.in
zipp==3.11.0
# via importlib-metadata

# The following packages are considered to be unsafe in a requirements file:
# setuptools
2 changes: 2 additions & 0 deletions docs/_config.yml
@@ -0,0 +1,2 @@
theme: jekyll-theme-leap-day
include: [_sources, _modules, _static]
107 changes: 107 additions & 0 deletions fhir-search.md
@@ -0,0 +1,107 @@
# Import FHIR search results to pandas dataframe

Import resources from [FHIR Search API](https://www.hl7.org/fhir/search.html) results into a [pandas](https://pandas.pydata.org/docs/user_guide/index.html) dataframe with [fhiry](README.md).

## FHIR search query parameters

For the filter options you can set via `search_parameters`, see the [FHIR search common parameters for all resource types](https://www.hl7.org/fhir/search.html#standard) and the additional FHIR search parameters for specific resource types such as [Patient](https://www.hl7.org/fhir/patient.html#search), [Condition](https://www.hl7.org/fhir/condition.html#search), [Observation](https://www.hl7.org/fhir/observation.html#search), and so on.

## Example: Import all observations from FHIR server

Fetch and import all resources of type Observation (empty search parameters, so no filter is applied) to a pandas dataframe:

```python
from fhiry.fhirsearch import Fhirsearch

fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")

df = fs.search(resource_type = "Observation", search_parameters = {})

print(df.info())
```

## Example: Import all conditions with a certain code from FHIR server

Fetch and import all condition resources with Snomed (Codesystem `http://snomed.info/sct`) Code `39065001` in the FHIR element `Condition.code` ([resource type specific FHIR search parameter `code`](https://www.hl7.org/fhir/condition.html#search)) to a pandas dataframe:

```python
from fhiry.fhirsearch import Fhirsearch

fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")

my_fhir_search_parameters = {
"code": "http://snomed.info/sct|39065001",
}

df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)

print(df.info())
```
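
For orientation, a FHIR search like the one above ends up as an HTTP GET against `[base]/Condition` with the search parameters in the query string. The following is a minimal sketch, using only the Python standard library, of what such a URL looks like. `build_search_url` is a hypothetical helper for illustration and not part of fhiry; fhiry may build its requests differently.

```python
from urllib.parse import urlencode


def build_search_url(fhir_base_url, resource_type, search_parameters):
    """Hypothetical helper (not part of fhiry): compose a FHIR search URL."""
    url = f"{fhir_base_url.rstrip('/')}/{resource_type}"
    if search_parameters:
        # urlencode percent-encodes reserved characters such as ':', '/' and '|'
        url += "?" + urlencode(search_parameters)
    return url


url = build_search_url(
    "http://fhir-server:8080/fhir",
    "Condition",
    {"code": "http://snomed.info/sct|39065001"},
)
print(url)
```

The `system|code` token is passed as a single parameter value, so the `|` separator is percent-encoded together with the code system URL.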

## Columns
* see [`df.columns`](README.md#columns)

## Connection settings

To set connection parameters like authentication, SSL certificates, proxies and so on, set or add standard [Python requests](https://requests.readthedocs.io/en/latest/) keyword arguments to the property `requests_kwargs`.

Examples:

### Authentication

Authentication is set by [requests parameter `auth`](https://requests.readthedocs.io/en/latest/user/authentication/).

Example using [HTTP Basic Auth](https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication):

```python
from fhiry.fhirsearch import Fhirsearch

fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")

# Set basic auth credentials (https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication)
fs.requests_kwargs["auth"] = ('myUser', 'myPassword')
```

### Proxy settings

You can set HTTP(S)-Proxies by [requests parameter `proxies`](https://requests.readthedocs.io/en/latest/user/advanced/#proxies).

Example:

```python
fs.requests_kwargs["proxies"] = {
'http': 'http://10.10.1.10:3128',
'https': 'http://10.10.1.10:1080',
}
```

## Performance

### Fetching all found resources from FHIR server

These search calls fetch every resource of the searched resource type that matches the FHIR search parameters (or all resources of that type if no parameters are given) by paging through all search result pages. Depending on the performance of the FHIR server, fetching one million resources this way can take an hour, and the resulting pandas dataframe for this example uses a few hundred MB of RAM.
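
Paging itself is defined by the FHIR specification: each search response is a Bundle whose `link` array contains an entry with relation `next` as long as further pages exist. The sketch below illustrates that loop in plain Python. `fetch_json`, `iter_search_pages` and `collect_resources` are hypothetical names for illustration; this is not how fhiry is confirmed to implement paging internally.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body (no auth, no retries)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_search_pages(first_url, fetch=fetch_json):
    """Yield every Bundle page of a FHIR search, following 'next' links."""
    url = first_url
    while url:
        bundle = fetch(url)
        yield bundle
        # A Bundle advertises the following page via a link with relation "next"
        url = next(
            (
                link["url"]
                for link in bundle.get("link", [])
                if link.get("relation") == "next"
            ),
            None,
        )


def collect_resources(first_url, fetch=fetch_json):
    """Flatten all pages into one list of resources."""
    return [
        entry["resource"]
        for bundle in iter_search_pages(first_url, fetch)
        for entry in bundle.get("entry", [])
    ]
```

Every page is a separate HTTP round trip, so the total load time grows with the number of result pages.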

### Decrease RAM usage

If you only want to analyze certain elements, you can decrease RAM usage and network overhead by requesting only the elements you need with the [FHIR search option `_elements`](https://www.hl7.org/fhir/search.html#elements).

Example:

```python
from fhiry.fhirsearch import Fhirsearch

fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")

my_fhir_search_parameters = {
    # ... other FHIR search parameters / filters ...
    "_elements": "code,verification-status,recorded-date",
}

df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)

print(df.info())
```
10 changes: 5 additions & 5 deletions requirements.txt
@@ -1,16 +1,16 @@
#
# This file is autogenerated by pip-compile with python 3.8
# To update, run:
# This file is autogenerated by pip-compile with Python 3.8
# by the following command:
#
# pip-compile
#
numpy==1.22.3
numpy==1.23.5
# via pandas
pandas==1.4.1
pandas==1.5.2
# via fhiry (setup.py)
python-dateutil==2.8.2
# via pandas
pytz==2021.3
pytz==2022.6
# via pandas
six==1.16.0
# via python-dateutil
2 changes: 1 addition & 1 deletion src/fhiry/fhirndjson.py
@@ -75,7 +75,7 @@ def convert_object_to_list(self):
del self._df[col]

def add_patient_id(self):
"""Create a patientId column with the resource.id of the first Patient resource
"""Create a patientId column with the id if a Patient resource or with the subject.reference if other resource type
"""
self._df['patientId'] = self._df.apply(lambda x: x['id'] if x['resourceType']
== 'Patient' else self.check_subject_reference(x), axis=1)