Skip to content

A simple MetaMap wrapper and post-processing XML parsing tool.

License

Notifications You must be signed in to change notification settings

Shakesbeery/MMapPy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MMapPy

MMapPy is a simple interface for use with the NLM's MetaMap program. The idea and implementation originated from my needs to quickly process large amounts of medical text and convert it into a flattened table. As such, pyMap offers a utility for converting MetaMap XML output into a dictionary of lists (which can easily be converted to a Pandas DataFrame object).

Getting Started

Dependencies

For MMapPy

None! pyMap was designed to use nothing outside of the standard library.

For MMapPy utils

  • bs4
  • pandas (recommended)
  • lxml (recommended)

Installation

If there is ever any interest, I'll create the necessary setup files.

Usage

Instantiate a MetaMap class instance and process a sentence:

from pyMap import MetaMap

MM = MetaMap()

#Assuming that the MetaMap directory is already in your path
MM.start_skrmedpost_server('skrmedpostctl_start.bat')

text = 'He presented two hours later and a heart attack was ruled out by the physician.'
results = MM.process_text(text, 'metamap14.bat --XMLf')

#All done!
MM.stop_skrmedpost_server()

Optionally converting the XML results into a useable table:

import pandas as pd
from utils import xml_to_soup, extract_results_from_soup

soup = xml_to_soup(results)
out, extra_metadata = extract_results_from_soup(soup)
df = pd.DataFrame(out)
print(df)

df looks like:

FullLexicalUnit Word LexicalCategory CUI SemanticType PreferredTerm Negated
He He pron 0
presented presented verb C0449450 idcn Presentation 0
two two adj C0205448 qnco Two 0
hours hours noun C0439227 tmco Hour 0
later later adv 0
and and conj 0
a a det 0
heart attack heart noun C0027051 dsyn Myocardial Infarction 1
heart attack attack noun C0027051 dsyn Myocardial Infarction 1
was was aux 0
ruled ruled verb C1446409 qlco Positive 0
out out adv C0439787 spco Out (direction) 0
by by prep 0
the the det 0
physician physician noun C0031831 prog Physicians 0
. . punc 0

Use with NLTK to process an entire document sentence-by-sentence

from nltk.tokenize import sent_tokenize
from collections import defaultdict

master_dict = defaultdict(list
sents = sent_tokenize(big_document)
for sent in sents:
    results = MM.process_text(sent, 'metamap14.bat --XMLf')
    soup = xml_to_soup(results
    out, extra_metadata = extract_results_from_soup(soup)
    for key in out:
        master_dict[key].extend(out[key])
        
document_df = pd.DataFrame(master_dict)

Authors

David Beery

License

GNU GPLv3

Acknowledgements

  • Thanks to Dr. Aronson and everyone at the NLM for creating and maintaining MetaMap in all its forms!

About

A simple MetaMap wrapper and post-processing XML parsing tool.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages