Merge branch 'release/2.1.0'

dermatologist · Feb 6, 2023 · commit 8edaa99
2 parents eaa3c28 + da2b138
Showing 17 changed files with 662 additions and 58 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -22,8 +22,9 @@ jobs:
     - name: Create docs
       run: |
         make -C docs/ html
+        cp docs/_config.yml docs/_build/html/_config.yml
     - name: Deploy Docs 🚀
       uses: JamesIves/[email protected]
       with:
         branch: gh-pages # The branch the action should deploy to.
-        folder: docs/_build/html # The folder the action should deploy.
+        folder: docs/_build/html # The folder the action should deploy.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,8 +1,8 @@
 # Changelog
 
-## [Unreleased](https://github.com/dermatologist/fhiry/tree/HEAD)
+## [2.0.0](https://github.com/dermatologist/fhiry/tree/2.0.0) (2022-03-17)
 
-[Full Changelog](https://github.com/dermatologist/fhiry/compare/1.0.0...HEAD)
+[Full Changelog](https://github.com/dermatologist/fhiry/compare/1.0.0...2.0.0)
 
 **Closed issues:**
 

diff --git a/README.md b/README.md
@@ -1,37 +1,71 @@
-# :fire: fhiry - FHIR for AI and ML
+# :fire: fhiry - FHIR to pandas dataframe for data analysis, AI and ML
 
 ![Libraries.io SourceRank](https://img.shields.io/librariesio/sourcerank/pypi/fhiry)
 [![PyPI download total](https://img.shields.io/pypi/dm/fhiry.svg)](https://pypi.python.org/pypi/fhiry/)
 ![GitHub tag (latest by date)](https://img.shields.io/github/v/tag/dermatologist/fhiry)
 
-## About
+## Open Source Python library for import of FHIR resources to pandas dataframe
 
 [Bulk data export using FHIR](https://hl7.org/fhir/uv/bulkdata/export/index.html) may be important if you want to export a cohort for analysis or machine learning.
-:fire: **Fhiry** is a python package to facilitate this by converting a folder of FHIR bundles/ndjson into a pandas data frame for analysis and importing
+:fire: **Fhiry** is a [python](https://www.python.org/) package to facilitate this by converting a folder of [FHIR bundles](https://www.hl7.org/fhir/bundle.html)/ndjson into a [pandas](https://pandas.pydata.org/docs/user_guide/index.html) data frame for analysis and importing
 into ML packages such as Tensorflow and PyTorch. Test it with the [synthea sample](https://synthea.mitre.org/downloads) or the downloaded ndjson from the [SMART Bulk data server](https://bulk-data.smarthealthit.org/). Use the 'Discussions' tab above for feature requests.
 
 ## Installation
 
-```
+```shell
 pip install fhiry
 ```
 
 ## Usage
 
-### Synthea
+### Import FHIR bundles (JSON) from folder to pandas dataframe
 
-```
+```python
 import fhiry.parallel as fp
 df = fp.process('/path/to/fhir/resources')
 print(df.info())
 ```
 
-### [SMART Bulk Data Server](https://bulk-data.smarthealthit.org/) Export
-```
+Example source data set: [Synthea](https://synthea.mitre.org/downloads)
+
+Jupyter notebook example: [`notebooks/synthea.ipynb`](notebooks/synthea.ipynb)
+
+### Import NDJSON from folder to pandas dataframe
+
+```python
 import fhiry.parallel as fp
 df = fp.ndjson('/path/to/fhir/ndjson/files')
 print(df.info())
 ```
+
+Example source data set: [SMART Bulk Data Server](https://bulk-data.smarthealthit.org/) Export
+
+Jupyter notebook example: [`notebooks/ndjson.ipynb`](notebooks/ndjson.ipynb)
+
+### Import FHIR Search results to pandas dataframe
+
+Fetch and import resources from [FHIR Search API](https://www.hl7.org/fhir/search.html) results to pandas dataframe.
+
+Documentation: [`fhir-search.md`](fhir-search.md)
+
+#### Example: Import all conditions with a certain code from FHIR Server
+
+Fetch and import all condition resources with Snomed (Codesystem `http://snomed.info/sct`) Code `39065001` in the FHIR element `Condition.code` ([resource type specific FHIR search parameter `code`](https://www.hl7.org/fhir/condition.html#search)) to a pandas dataframe:
+
+```python
+from fhiry.fhirsearch import Fhirsearch
+
+fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")
+
+my_fhir_search_parameters = {
+    "code": "http://snomed.info/sct|39065001",
+}
+
+df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)
+
+print(df.info())
+```
+
 ## Columns
 * see df.columns
 
@@ -49,8 +83,10 @@ resource.gender
 ```
 
 ### [Documentation](https://dermatologist.github.io/fhiry/)
+
 ## Contributors
 
 * [Bell Eapen](https://nuchange.ca) | [![Twitter Follow](https://img.shields.io/twitter/follow/beapen?style=social)](https://twitter.com/beapen)
+* [Markus Mandalka](https://github.com/Mandalka)
 * WIP, PR welcome, please see CONTRIBUTING.md
-* [![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg) using CC](https://computecanada.ca)
+* [![forthebadge](https://forthebadge.com/images/badges/built-with-love.svg) using CC](https://computecanada.ca)
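
Note on the `Fhirsearch` example in the README diff above: the `code` value combines a code system URL and a code with `|`, so it has to be percent-encoded when it ends up in a query string. How fhiry builds the request is not shown in this diff; the following is only a standard-library sketch of what such a search URL could look like. The helper name `build_search_url` is hypothetical and not part of fhiry.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, resource_type: str, search_parameters: dict) -> str:
    """Join base URL, resource type and percent-encoded search parameters.

    Hypothetical helper for illustration only; not fhiry's implementation.
    """
    url = f"{base_url.rstrip('/')}/{resource_type}"
    query = urlencode(search_parameters)  # encodes ':', '/' and '|'
    return f"{url}?{query}" if query else url


condition_url = build_search_url(
    "http://fhir-server:8080/fhir",
    "Condition",
    {"code": "http://snomed.info/sct|39065001"},
)
observation_url = build_search_url("http://fhir-server:8080/fhir", "Observation", {})

print(condition_url)
print(observation_url)
```

With empty search parameters the URL carries no query string, which corresponds to the "no filter" case described in `fhir-search.md`.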
diff --git a/dev-requirements.in b/dev-requirements.in
@@ -3,6 +3,7 @@
 pytest-cov
 pytest
 recommonmark
+responses
 sphinx>=3.2.1
 setuptools
 setuptools_scm

diff --git a/dev-requirements.txt b/dev-requirements.txt
@@ -1,6 +1,6 @@
 #
-# This file is autogenerated by pip-compile with python 3.8
-# To update, run:
+# This file is autogenerated by pip-compile with Python 3.8
+# by the following command:
 #
 #    pip-compile dev-requirements.in
 #
@@ -34,6 +34,8 @@ idna==3.2
     # via requests
 imagesize==1.2.0
     # via sphinx
+importlib-metadata==5.1.0
+    # via sphinx
 iniconfig==1.1.1
     # via pytest
 jinja2==3.0.1
@@ -60,20 +62,24 @@ pygments==2.10.0
     # via sphinx
 pyparsing==2.4.7
     # via packaging
-pytest==7.1.0
+pytest==7.1.2
     # via
     #   -r dev-requirements.in
     #   pytest-cov
 pytest-cov==3.0.0
     # via -r dev-requirements.in
-pytz==2021.3
+pytz==2022.6
     # via
     #   -c requirements.txt
     #   babel
 recommonmark==0.7.1
     # via -r dev-requirements.in
 requests==2.26.0
-    # via sphinx
+    # via
+    #   responses
+    #   sphinx
+responses==0.22.0
+    # via -r dev-requirements.in
 setuptools-scm==6.4.2
     # via -r dev-requirements.in
 six==1.16.0
@@ -102,19 +108,26 @@ sphinxcontrib-serializinghtml==1.1.5
 toml==0.10.2
     # via
     #   coverage
+    #   responses
     #   tox
 tomli==1.2.1
     # via
     #   pytest
     #   setuptools-scm
-tox==3.24.5
+tox==3.25.0
     # via -r dev-requirements.in
+types-toml==0.10.8.1
+    # via responses
 urllib3==1.26.6
-    # via requests
+    # via
+    #   requests
+    #   responses
 virtualenv==20.8.0
     # via tox
 wheel==0.37.1
     # via -r dev-requirements.in
+zipp==3.11.0
+    # via importlib-metadata
 
 # The following packages are considered to be unsafe in a requirements file:
 # setuptools
diff --git a/docs/_config.yml b/docs/_config.yml
@@ -0,0 +1,2 @@
+theme: jekyll-theme-leap-day
+include: [_sources, _modules, _static]
diff --git a/fhir-search.md b/fhir-search.md
@@ -0,0 +1,107 @@
+# Import FHIR search results to pandas dataframe
+
+Import resources from [FHIR Search API](https://www.hl7.org/fhir/search.html) results to [pandas](https://pandas.pydata.org/docs/user_guide/index.html) dataframe by [fhiry](README.md):
+
+## FHIR search query parameters
+
+For the filter options you can set via `search_parameters`, see the [FHIR search common parameters for all resource types](https://www.hl7.org/fhir/search.html#standard) and the additional FHIR search parameters for specific resource types such as [Patient](https://www.hl7.org/fhir/patient.html#search), [Condition](https://www.hl7.org/fhir/condition.html#search) and [Observation](https://www.hl7.org/fhir/observation.html#search).
+
+## Example: Import all observations from FHIR server
+
+Fetch and import all resources (since empty search parameters / no filter) of type Observation to a pandas dataframe:
+
+```python
+from fhiry.fhirsearch import Fhirsearch
+
+fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")
+
+df = fs.search(resource_type = "Observation", search_parameters = {})
+
+print(df.info())
+```
+
+## Example: Import all conditions with a certain code from FHIR server
+
+Fetch and import all condition resources with Snomed (Codesystem `http://snomed.info/sct`) Code `39065001` in the FHIR element `Condition.code` ([resource type specific FHIR search parameter `code`](https://www.hl7.org/fhir/condition.html#search)) to a pandas dataframe:
+
+```python
+from fhiry.fhirsearch import Fhirsearch
+
+fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")
+
+my_fhir_search_parameters = {
+    "code": "http://snomed.info/sct|39065001",
+}
+
+df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)
+
+print(df.info())
+```
+
+## Columns
+* see [`df.columns`](README.md#columns)
+
+## Connection settings
+
+To set connection parameters like authentication, SSL certificates, proxies and so on, set or add standard [Python requests](https://requests.readthedocs.io/en/latest/) keyword arguments to the property `requests_kwargs`.
+
+Examples:
+
+### Authentication
+
+Authentication is set by [requests parameter `auth`](https://requests.readthedocs.io/en/latest/user/authentication/).
+
+Example using [HTTP Basic Auth](https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication):
+
+```python
+from fhiry.fhirsearch import Fhirsearch
+
+fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")
+
+# Set basic auth credentials (https://requests.readthedocs.io/en/latest/user/authentication/#basic-authentication)
+fs.requests_kwargs["auth"] = ('myUser', 'myPassword')
+```
+
+### Proxy settings
+
+You can set HTTP(S)-Proxies by [requests parameter `proxies`](https://requests.readthedocs.io/en/latest/user/advanced/#proxies).
+
+Example:
+
+```python
+fs.requests_kwargs["proxies"] = {
+    'http': 'http://10.10.1.10:3128',
+    'https': 'http://10.10.1.10:1080',
+}
+```
+
+## Performance
+
+### Fetching all found resources from FHIR server
+
+A search call fetches every resource of the requested type that matches the FHIR search parameters (or every resource of that type if no parameters are given), paging through all search result pages. Depending on the performance of the FHIR server, fetching one million resources this way can take an hour, and the resulting pandas dataframe for this example uses a few hundred MB of RAM.
+
+### Decrease RAM usage
+
+If you want to analyze only certain elements, you can decrease RAM usage and network overhead by defining the elements you need for your data analysis by the [FHIR search option `_elements`](https://www.hl7.org/fhir/search.html#elements).
+
+Example:
+
+```python
+from fhiry.fhirsearch import Fhirsearch
+
+fs = Fhirsearch(fhir_base_url = "http://fhir-server:8080/fhir")
+
+my_fhir_search_parameters = {
+```
+... Other FHIR search parameters / filters ...
+
+```python
+
+    "_elements": "code,verificationStatus,recordedDate",
+}
+
+df = fs.search(resource_type = "Condition", search_parameters = my_fhir_search_parameters)
+
+print(df.info())
+```
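
Note on the "Performance" section of `fhir-search.md` above: FHIR search results arrive as a sequence of Bundle pages, and each page may carry a `link` entry with relation `next` that points to the following page. The sketch below illustrates that paging loop with two canned in-memory pages instead of a real server. It is not fhiry's implementation; `next_page_url`, `iter_resources`, the `PAGES` table and its URLs are all hypothetical names and data used for illustration.

```python
from typing import Callable, Dict, Iterator, Optional


def next_page_url(bundle: dict) -> Optional[str]:
    """Return the URL of the Bundle link with relation 'next', if present."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_resources(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every resource from all result pages, starting at first_url."""
    url: Optional[str] = first_url
    while url is not None:
        bundle = fetch(url)
        for entry in bundle.get("entry", []):
            yield entry["resource"]
        url = next_page_url(bundle)  # None on the last page ends the loop


# Canned pages standing in for a FHIR server (hypothetical URLs and data).
PAGES: Dict[str, dict] = {
    "http://fhir-server:8080/fhir/Condition": {
        "resourceType": "Bundle",
        "link": [{"relation": "next", "url": "http://fhir-server:8080/fhir?page=2"}],
        "entry": [
            {"resource": {"resourceType": "Condition", "id": "c1"}},
            {"resource": {"resourceType": "Condition", "id": "c2"}},
        ],
    },
    "http://fhir-server:8080/fhir?page=2": {
        "resourceType": "Bundle",
        "link": [],
        "entry": [{"resource": {"resourceType": "Condition", "id": "c3"}}],
    },
}

# A dict lookup replaces the HTTP GET a real client would perform.
ids = [r["id"] for r in iter_resources("http://fhir-server:8080/fhir/Condition", PAGES.__getitem__)]
print(ids)
```

Because every page is a separate round trip, the total run time grows with the number of pages, which is consistent with the long load times mentioned for large result sets.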
diff --git a/requirements.txt b/requirements.txt
@@ -1,16 +1,16 @@
 #
-# This file is autogenerated by pip-compile with python 3.8
-# To update, run:
+# This file is autogenerated by pip-compile with Python 3.8
+# by the following command:
 #
 #    pip-compile
 #
-numpy==1.22.3
+numpy==1.23.5
     # via pandas
-pandas==1.4.1
+pandas==1.5.2
     # via fhiry (setup.py)
 python-dateutil==2.8.2
     # via pandas
-pytz==2021.3
+pytz==2022.6
     # via pandas
 six==1.16.0
     # via python-dateutil
diff --git a/src/fhiry/fhirndjson.py b/src/fhiry/fhirndjson.py
@@ -75,7 +75,7 @@ def convert_object_to_list(self):
                 del self._df[col]
 
     def add_patient_id(self):
-        """Create a patientId column with the resource.id of the first Patient resource
+        """Create a patientId column holding the id for Patient resources, or the subject.reference for other resource types
         """
         self._df['patientId'] = self._df.apply(lambda x: x['id'] if x['resourceType']
                                                == 'Patient' else self.check_subject_reference(x), axis=1)
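
Note on `add_patient_id` above: the rule in the docstring can be sketched on plain dictionaries without pandas. The body of `check_subject_reference` is not part of this diff, so the reference handling below (keeping the last path segment of `subject.reference`) is an assumption, and `patient_id` is a hypothetical helper rather than fhiry code.

```python
from typing import Optional


def patient_id(resource: dict) -> Optional[str]:
    """Return the id of a Patient, or the patient referenced by other resources."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id")
    reference = resource.get("subject", {}).get("reference")
    if reference is None:
        return None  # resource has no subject, e.g. an Organization
    # Assumption: "Patient/p1" is reduced to "p1"; fhiry may handle this differently.
    return reference.rsplit("/", 1)[-1]


resources = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Observation", "id": "o1", "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Organization", "id": "org1"},
]

patient_ids = [patient_id(r) for r in resources]
print(patient_ids)
```

The Patient row and the Observation row end up with the same identifier, which is what makes the column usable for grouping resources by patient.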