Using Natural Language Processing to Identify Patients with Family History of Breast and/or Colon Cancer
- Develop a rule-based NLP pipeline from templates, and fine-tune the rules
- Use Python and Jupyter Notebook to execute the NLP pipeline
- Visualize NLP output
- Perform error analyses and evaluation; compare trade-offs between recall and precision
- Identify nuances in NLP development
To participate in this module, students should have the following skills and knowledge:
- Basic knowledge of and familiarity with Python and Jupyter Notebook
- Basic Git skills
- Basic understanding of family history, breast cancer, and colon cancer
- Basic knowledge of regular expressions
- Identify patient family history of breast cancer or colon cancer from clinical notes
- Gain rule-based NLP knowledge and skills
- Use Python to build a rule-based NLP pipeline
- Use Python to evaluate NLP performance
- Use built-in packages to visualize the NLP output
- Contingency tables
- NLP performance measurements
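
The two topics above fit together: a contingency table counts how the NLP output agrees or disagrees with the reference annotations, and precision, recall, and F1 are computed from those counts. The following is a minimal sketch in plain Python. The function names and the sample labels are hypothetical and are not taken from the course notebooks.

```python
from collections import Counter


def contingency_table(gold, predicted):
    """Count TP, FP, FN, and TN for paired binary document-level labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    counts = Counter()
    for g, p in zip(gold, predicted):
        if g and p:
            counts["tp"] += 1  # NLP and reference both say positive
        elif p:
            counts["fp"] += 1  # NLP says positive, reference says negative
        elif g:
            counts["fn"] += 1  # NLP missed a positive document
        else:
            counts["tn"] += 1  # both say negative
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def precision_recall_f1(table):
    """Derive precision, recall, and F1 from contingency counts."""
    tp, fp, fn = table["tp"], table["fp"], table["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: True means "has family history of breast/colon cancer".
gold = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, True, False, False, True, False]

table = contingency_table(gold, predicted)
precision, recall, f1 = precision_recall_f1(table)
print(table)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Loosening the rules typically moves documents from the false-negative cell to the true-positive cell (higher recall) at the cost of more false positives (lower precision), which is the trade-off examined during error analysis.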
This project demonstrates how to use pyConText to identify patients with a family history of breast cancer and/or colon cancer, with real-time visualization adapted from Brat's design [2].
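
To illustrate the general idea of a rule-based approach, here is a minimal sketch that uses only Python's `re` module. It does not use pyConText and does not reproduce its rule format or API. The patterns, the function name `find_family_history`, and the sample note are all hypothetical. The sketch flags a sentence when it mentions a target concept (breast or colon cancer) together with a family-member modifier and no negation cue.

```python
import re

# Hypothetical rules for illustration only; not the project's actual rule files.
TARGET = re.compile(r"\b(breast|colon)\s+(?:cancer|carcinoma|ca)\b", re.IGNORECASE)
FAMILY = re.compile(
    r"\b(?:mother|father|sister|brother|aunt|uncle|grandmother|grandfather"
    r"|family history|fh)\b",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(?:no|denies|negative for|without)\b", re.IGNORECASE)


def find_family_history(text):
    """Return (cancer_type, sentence) pairs for sentences that mention a
    target with a family modifier and without a negation cue."""
    findings = []
    # Naive sentence splitting on end punctuation; real notes need more care.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        target = TARGET.search(sentence)
        if target is None:
            continue
        if FAMILY.search(sentence) and not NEGATION.search(sentence):
            findings.append((target.group(1).lower(), sentence.strip()))
    return findings


# Hypothetical note text, not drawn from the MIMIC sample.
note = (
    "Her mother had breast cancer at age 45. "
    "No family history of colon cancer. "
    "Patient was diagnosed with colon cancer in 2010."
)
results = find_family_history(note)
for cancer_type, sentence in results:
    print(cancer_type, "|", sentence)
```

Only the first sentence is flagged: the second is negated, and the third describes the patient rather than a relative. Sentence-level matching like this is deliberately crude; scoping a modifier to the right target is the kind of nuance the notebooks explore.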
Use the following bash command in a terminal to clone the notebooks into your local directory:
git clone https://github.com/jianlins/FHI_Hands_on.git
If you are using jupyterhub.med.utah.edu, you will need to open the remote terminal through Jupyter's web interface.
Then execute the following commands:
The dataset for this project is sampled from the MIMIC demo dataset [1].
The visualization tool used in this project is derived from arne-cl's project, which is built on Brat [2].
- MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35. Available at: http://www.nature.com/articles/sdata201635
- BRAT: A Web-based Tool for NLP-Assisted Text Annotation. Pontus Stenetorp, Sampo Pyysalo, Goran Topic, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. In Proceedings of the European Chapter of the Association for Computational Linguistics (2012), pages 102–107.
This material was presented as part of the Foundations of Healthcare Informatics course, Fall 2017, BMI, University of Utah. It is revised from the material of the DeCART Summer Program (Data, exploration, Computation, and Analytics Real-world Training for the Health Sciences) at the University of Utah in 2017.
Original presenters: Dr. Wendy Chapman, Jianlin Shi, and Kelly Peterson.
Revised by: Jianlin Shi and Dr. Wendy Chapman