Using Natural Language Processing to Identify Patients with Family History of Breast and/or Colon Cancer

Case Study Outcomes

Develop a rule-based NLP pipeline from templates, and fine-tune the rules
Use Python and Jupyter Notebook to execute the NLP pipeline
Visualize NLP output
Perform error analysis and evaluation; compare trade-offs between recall and precision
Identify nuances in NLP development

Pre-Requisite Skills

To participate in this module, the student should have the following skills and knowledge:

Basic familiarity with Python and Jupyter Notebook
Basic git skills
Basic understanding of family history, breast cancer, and colon cancer
Basic knowledge of regular expressions

Data Science Components with Associated Knowledge and Skill Sets

Knowledge Representation

Identify patient family history of breast cancer or colon cancer from clinical notes
Gain rule-based NLP knowledge and skills

Computation

Use Python to build a rule-based NLP pipeline
Use Python to evaluate NLP performance
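
As a rough illustration of what "rule-based" means here, the sketch below flags a note as positive when a single sentence contains a cancer mention and a family-member cue and no negation cue. It uses only Python's standard `re` module, not pyConText; the rule lists and the function names `split_sentences` and `classify_document` are hypothetical and much simpler than the knowledge base used in the notebooks.

```python
import re

# Hypothetical rules for illustration only; the notebooks use pyConText
# with a richer knowledge base than these three patterns.
TARGET = re.compile(r"\b(breast|colon)\s+(cancer|ca|carcinoma)\b", re.IGNORECASE)
FAMILY = re.compile(
    r"\b(mother|father|sister|brother|aunt|uncle|grandmother|grandfather|"
    r"family history|fh)\b",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after ., !, ? or at line breaks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]


def classify_document(text):
    """Return True if any sentence has a target and a family cue but no negation."""
    for sentence in split_sentences(text):
        if (
            TARGET.search(sentence)
            and FAMILY.search(sentence)
            and not NEGATION.search(sentence)
        ):
            return True
    return False
```

Sentence-level matching like this is crude: it ignores the scope and direction of modifiers, which is the kind of nuance a dedicated context algorithm is meant to handle.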

Visualization

Use built-in packages to visualize the NLP output

Statistical Analysis

Contingency tables
NLP performance measurements
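
The two items above can be sketched in a few lines of plain Python, assuming document-level boolean labels (reference standard versus NLP output). The function names `contingency_table` and `precision_recall_f1` are illustrative and are not taken from the notebooks.

```python
def contingency_table(gold, predicted):
    """Count the four cells of a 2x2 contingency table from paired boolean labels."""
    table = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for g, p in zip(gold, predicted):
        if g and p:
            table["tp"] += 1      # NLP positive, reference positive
        elif p and not g:
            table["fp"] += 1      # NLP positive, reference negative
        elif g and not p:
            table["fn"] += 1      # NLP negative, reference positive
        else:
            table["tn"] += 1      # both negative
    return table


def precision_recall_f1(table):
    """Compute precision, recall, and F1; return 0.0 where a denominator is zero."""
    tp, fp, fn = table["tp"], table["fp"], table["fn"]
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

Loosening a rule typically moves counts from `fn` to `tp` but also from `tn` to `fp`, which is why recall and precision tend to pull in opposite directions.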

Case Study Materials

This project demonstrates how to use pyConText to identify patients with a family history of breast cancer and/or colon cancer, with a real-time visualization adapted from brat's design [2].

Use the following bash command in a terminal to clone the notebooks into your local directory.

git clone https://github.com/jianlins/FHI_Hands_on.git

If you are using jupyterhub.med.utah.edu, you will need to open the remote terminal through Jupyter's web interface.

Then execute the following commands:

The dataset for this project is sampled from the MIMIC demo dataset [1].
The visualization tool used in this project is derived from arne-cl's project, which is built on brat [2].

References:

[1] MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35. Available at: http://www.nature.com/articles/sdata201635
[2] BRAT: A Web-based Tool for NLP-Assisted Text Annotation. Pontus Stenetorp, Sampo Pyysalo, Goran Topic, Tomoko Ohta, Sophia Ananiadou, and Jun’ichi Tsujii. In Proceedings of the European Chapter of the Association for Computational Linguistics (2012), pages 102–107.

This material was presented as part of the Foundations of Healthcare Informatics course, Fall 2017, BMI, University of Utah. It is revised from the material of the DeCART Summer Program (Data, exploration, Computation, and Analytics Real-world Training for the Health Sciences) at the University of Utah in 2017.

Original presenters: Dr. Wendy Chapman, Jianlin Shi, and Kelly Peterson.
Revised by: Jianlin Shi and Dr. Wendy Chapman
