New to NLP

The pyConText algorithm is an extension of the original ConText algorithm. This newer version (pyConText) is more extensible and supports user-defined modifiers. For example, one project involving radiology reports added modifiers for uncertainty (certain or uncertain), quality of the radiologic exam (limited or not limited), severity (critical or non-critical), and sidedness (right or left), among others.

In addition to capturing user-defined modifiers, the pyConText algorithm can also encode events and entities when given regular expressions, or dictionaries containing those regular expressions/cues together with their normalized literal forms.
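As a minimal sketch of that idea (not pyConText's own API), a dictionary can map each normalized concept to regular expressions for its cues and synonyms, and every match is then reported with its normalized form. `LEXICON` and `encode` are hypothetical names, and the patterns are toy examples.

```python
from __future__ import annotations

import re

# Hypothetical lexicon: normalized concept -> regexes for the cue and its synonyms.
LEXICON = {
    "PULMONARY_EMBOLISM": [r"pulmonary\s+embol(?:ism|us|i)", r"\bPE\b"],
    "CAROTID_STENOSIS": [r"carotid\s+(?:artery\s+)?stenosis", r"narrowing\s+of\s+the\s+carotid"],
}


def encode(text: str) -> list[tuple[str, str, int]]:
    """Return (normalized concept, matched literal, start offset) for each match, in text order."""
    hits = []
    for concept, patterns in LEXICON.items():
        for pattern in patterns:
            for m in re.finditer(pattern, text, re.IGNORECASE):
                hits.append((concept, m.group(0), m.start()))
    return sorted(hits, key=lambda hit: hit[2])


print(encode("Acute pulmonary emboli. Narrowing of the carotid bulb."))
# [('PULMONARY_EMBOLISM', 'pulmonary emboli', 6), ('CAROTID_STENOSIS', 'Narrowing of the carotid', 24)]
```

Different surface forms ("emboli", "embolism", "PE") all come back under one normalized concept, which is what downstream rules can then test against.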

System development has been supported over the years by several projects and grants, including: 1 K22 LM008301-01 “Natural Language Processing for Respiratory Surveillance”; the ShARe project; the Vårdal Foundation; NIH grant 1R01LM010964; NLM Fellowship 5T15LM007059; the Interlock project; the Stockholm University Academic Initiative; VA HSR&D Stroke QUERI RRP 12-185; NIH NHLBI 1R01HL114563-01A1; and NIGMS R01GM090187.

Know NLP

The pyConText algorithm is an extension of the original ConText algorithm. Specifically, the pyConText algorithm differs from the ConText algorithm in a number of ways:

1) This newer version (pyConText) is more extensible and supports user-defined modifiers. To add a new modifier, you simply supply regular expressions for tagging its trigger terms and its termination terms, which refine the modifier's scope.

For example, one project involving radiology reports added modifiers for uncertainty (certain or uncertain), quality of the radiologic exam (limited or not limited), severity (critical or non-critical), and sidedness (right or left), among others.

2) The user can provide regular expressions to encode events and entities, including each one's normalized literal concept and its associated synonyms.

3) The user can provide pyConText with rules that support document-level assertions derived from encoded events and their modifiers, e.g., IF Finding: stenosis AND Severity: critical AND Anatomical location: internal carotid artery THEN flag the document for review of SIGNIFICANT CAROTID STENOSIS.
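The three extensions above can be sketched together in plain Python. This is a minimal illustration under stated assumptions, not pyConText's API: `TARGETS`, `Modifier`, `MODIFIERS`, `in_scope`, `annotate`, `RULE`, and `flag_document` are hypothetical names; the lexicons are toy examples; a modifier's scope is approximated as the rest of the sentence after (forward) or before (backward) its trigger, cut short by a termination term; and the rule is assumed to fire only when a single mention satisfies all of its conditions.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# (2) Hypothetical target lexicon: normalized concept -> regexes for the concept and synonyms.
TARGETS = {"stenosis": [r"\bstenos[ie]s\b", r"\bnarrowing\b"]}


@dataclass
class Modifier:
    """(1) Hypothetical user-defined modifier: trigger regex plus a termination regex."""
    category: str   # e.g. "Severity"
    value: str      # value given to targets in scope, e.g. "critical"
    trigger: str    # regex for trigger terms
    terminate: str  # regex for termination terms that cut the scope short
    direction: str  # "forward": scope follows the trigger; "backward": scope precedes it


MODIFIERS = [
    Modifier("Severity", "critical", r"\b(?:critical|severe|high[- ]grade)\b", r"\bbut\b", "forward"),
    Modifier("Anatomical location", "internal carotid artery",
             r"\b(?:internal carotid|ICA)\b", r"\bbut\b", "backward"),
]


def in_scope(sentence: str, target: re.Match, mod: Modifier) -> bool:
    """True if the target span falls inside the scope of any trigger for this modifier."""
    for trig in re.finditer(mod.trigger, sentence, re.IGNORECASE):
        if mod.direction == "forward":
            start, end = trig.end(), len(sentence)
            stop = re.compile(mod.terminate, re.IGNORECASE).search(sentence, trig.end())
            if stop:
                end = stop.start()  # scope ends at the first termination term
        else:
            start, end = 0, trig.start()
            stops = list(re.finditer(mod.terminate, sentence[: trig.start()], re.IGNORECASE))
            if stops:
                start = stops[-1].end()  # scope begins after the last termination term
        if start <= target.start() and target.end() <= end:
            return True
    return False


def annotate(sentence: str) -> list[dict[str, str]]:
    """Encode each target mention together with the modifier values whose scope reaches it."""
    mentions = []
    for concept, patterns in TARGETS.items():
        for pattern in patterns:
            for m in re.finditer(pattern, sentence, re.IGNORECASE):
                mention = {"Finding": concept}
                for mod in MODIFIERS:
                    if in_scope(sentence, m, mod):
                        mention[mod.category] = mod.value
                mentions.append(mention)
    return mentions


# (3) Document-level rule mirroring the example in item 3.
RULE = {"Finding": "stenosis", "Severity": "critical",
        "Anatomical location": "internal carotid artery"}


def flag_document(sentences: list[str]) -> bool:
    """Flag for review of SIGNIFICANT CAROTID STENOSIS if any mention meets every condition."""
    return any(
        all(mention.get(key) == value for key, value in RULE.items())
        for sentence in sentences
        for mention in annotate(sentence)
    )
```

For example, `flag_document(["Severe stenosis of the left internal carotid artery."])` returns `True`, while the same sentence with "Mild" in place of "Severe" returns `False`. In "Severe stenosis of the external carotid artery but the internal carotid is patent.", the termination term "but" keeps the location trigger from reaching "stenosis", so the document is not flagged.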

Authors

Brian Chapman

Wendy Chapman

Danielle Mowery

Sumithra Velupillai

Associated Institutions

University of Utah

Minimum Requirements

Python


NLP Task Performed

Classification, Information Extraction

Programming Languages

Python

Operating Systems

Windows, Mac, Linux

Related Publications

Chapman BE, Lee S, Kang HP, Chapman WW. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform. Apr 1 2011.

Use Cases

Encoding Spatial Locations of Pulmonary Embolism from Radiology Reports

Publication

Wilson RA, Chapman BE. Automated Capture of Pulmonary Embolism Spatial Location in Dictated Reports Using the ConText Algorithm. Radiological Society of North America (RSNA). Chicago, IL. 2011 (in press).

Document Types

De-identified CT PE studies

Sample Size

200 impression sections from CT PE studies

Performance

Overall accuracy: 58%;
Precision: 65%;
Recall: 44%;
F-measure: 53%
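The F-measure is the harmonic mean of precision (P) and recall (R), F = 2PR / (P + R). A minimal check in Python (the function name is illustrative): plugging in the rounded 65% and 44% gives about 52.5%, which is consistent with the reported 53% once rounding of the underlying precision and recall is allowed for. Overall accuracy cannot be recomputed this way, because it also depends on true negatives.

```python
def f_measure(precision: float, recall: float) -> float:
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


print(f"{f_measure(0.65, 0.44):.1%}")  # 52.5%
```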

Annotating Uncertainty from Radiology Reports

Publication

Gentili A, Chapman BE. Use of Natural Language Processing to Classify Radiology Reports Containing Description of the Abdominal Aorta. Radiological Society of North America (RSNA). 2013 (in press).

Document Types

Radiology reports

Sample Size

n=473 reports annotated by a radiologist

Performance

Sensitivity: 95%;
Specificity: 99%. 

Measuring Expressions of Uncertainty in Radiology Texts

Publication

Chapman BE, Gentili A, Chen J, Miyakoshi A, Chapman WW. Measuring Expressions of Uncertainty in Radiology Texts for Natural Language Processing Applications. Radiological Society of North America (RSNA). 2013 (in press).

Document Types

Comparison of probabilities assigned by radiologists against categories defined in pyConTextNLP

Sample Size

133 pyConTextNLP cues, each categorized as definitely negated, probably negated, probably existent, or definitely existent

108 cues translated from Swedish clinical texts
3 radiologists assigned single-point probabilities to each cue

Performance

Pairwise comparisons of single point probabilities:
Mean difference = 0.012
Mean standard deviations = 0.21.  
Mean (standard deviation) of point probabilities:
definitely negated 0.078 (0.11),
probably negated 0.17 (0.16), 
probably existent 0.71 (0.11), 
definitely existent 0.91 (0.083). 
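Each "mean (standard deviation)" figure above summarizes the point probabilities assigned to cues in one category. A minimal sketch of that summary, assuming one row per radiologist rating and the sample standard deviation (the study's exact procedure is not described here); the ratings and the `summarize_by_category` name are invented for illustration and are not the study's data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Invented ratings: (assertion category of the cue, one radiologist's point probability).
ratings = [
    ("definitely negated", 0.0),
    ("definitely negated", 0.1),
    ("probably existent", 0.7),
    ("probably existent", 0.8),
]


def summarize_by_category(rows):
    """Return {category: (mean, sample standard deviation)} of the point probabilities."""
    by_category = defaultdict(list)
    for category, probability in rows:
        by_category[category].append(probability)
    return {c: (mean(ps), stdev(ps)) for c, ps in by_category.items()}


for category, (m, sd) in summarize_by_category(ratings).items():
    print(f"{category} {m:.2f} ({sd:.2f})")
```

`statistics.stdev` needs at least two values per category; `statistics.pstdev` would give the population standard deviation instead.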

Encoding Mentions of Carotid Stenosis from Carotid Radiology Reports

Publication

Mowery DL, Chapman BE, Conway M, South BR, Madden E, Keyhani S, Chapman WW. Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics. 2016; 7: 26.

Document Types

RAD: mainly carotid ultrasounds; some angiograms, CT scans
TIU: mainly progress notes, carotid duplex exams, and carotid triplex exams

Sample Size

Training: n=100 RAD, n=100 TIU; Test: n=498 RAD, n=498 TIU

Performance

Significant stenosis or not (document-level), values in %:
RAD:
Recall = 88; PPV = 70; Specificity = 84; NPV = 95
TIU:
Recall = 73; PPV = 58; Specificity = 87; NPV = 92
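Recall, PPV, specificity, and NPV (like the sensitivity and specificity reported for the abdominal aorta use case) are standard ratios over a document-level confusion matrix, with "significant stenosis" as the positive class. A minimal sketch; `document_metrics` is an illustrative name and the counts in the example are invented, not taken from the study.

```python
from __future__ import annotations


def document_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Document-level metrics from confusion-matrix counts (positive = significant stenosis)."""
    return {
        "Recall": tp / (tp + fn),       # also called sensitivity
        "PPV": tp / (tp + fp),          # also called precision
        "Specificity": tn / (tn + fp),
        "NPV": tn / (tn + fn),
    }


# Invented counts, for illustration only.
print(document_metrics(tp=8, fp=2, tn=18, fn=2))
# {'Recall': 0.8, 'PPV': 0.8, 'Specificity': 0.9, 'NPV': 0.9}
```

PPV and NPV, unlike recall and specificity, depend on how common significant stenosis is in the test set, so they are not directly comparable across corpora with different prevalence.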

Other Related Publication

Mowery DL, Franc D, Ashfaq S, Zamora T, Cheng E, Chapman WW, Chapman BE. Developing a knowledge base for detecting carotid stenosis with pyConText. AMIA Symp Proc. Washington DC. 2014.

Asserting the Uncertainty-level of Assessments from Swedish Clinical Texts

Publication

Velupillai S, Skeppstedt M, Kvist M, Mowery DL, Chapman BE, Dalianis H, Chapman WW. Cue-based assertion classification for Swedish clinical text – developing an assertion lexicon for PyConTextSwe. AIIM: Text Mining and Information Analysis. 2014.

Document Types

Three subsets of a Swedish clinical corpus: a subset of the Stockholm Electronic Patient Record (EPR) Corpus (SEPR-C), and two subsets of the Stockholm EPR Diagnosis Uncertainty Corpus (SEPR-DUC), annotated for uncertainty and negation at the diagnostic-statement level

Sample Size

n=454 cues in pyConTextSwe

Performance

F-score = 83% (overall)
F-score = 88% (definite existence)
F-score = 81% (probable existence)
F-score = 55% (probable negated existence)
F-score = 63% (definite negated existence)

Binary classifications:
F-score = 97%/87% (existence yes/no)
F-score = 78%/86% (uncertainty yes/no)
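The binary scores can be read as collapsing the four assertion levels into two yes/no views: existence (definite or probable existence counts as yes) and uncertainty (either "probable" level counts as yes). That mapping is an assumption, not taken from the publication, and `TO_BINARY`, `f_score`, and `binary_view` are hypothetical helpers. The sketch computes a per-label F-score, so a "97%/87%" pair corresponds to calling `f_score` once with label "yes" and once with "no".

```python
from __future__ import annotations

# Assumed mapping: assertion level -> (existence yes/no, uncertainty yes/no).
TO_BINARY = {
    "definite existence": ("yes", "no"),
    "probable existence": ("yes", "yes"),
    "probable negated existence": ("no", "yes"),
    "definite negated existence": ("no", "no"),
}


def f_score(gold: list[str], pred: list[str], label: str) -> float:
    """F1 for one label, treating that label as the positive class."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def binary_view(labels: list[str], axis: int) -> list[str]:
    """axis 0 gives the existence yes/no view; axis 1 gives the uncertainty yes/no view."""
    return [TO_BINARY[label][axis] for label in labels]
```

Under this mapping, predicting "definite existence" for a gold "probable existence" cue counts as an error in the four-way and uncertainty scores but as correct for existence, which is one reason the binary existence scores can be higher than the four-way ones.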

Other Related Publication

Velupillai S, Skeppstedt M, Kvist M, Mowery DL, Chapman BE, Dalianis H, Chapman WW. Porting a rule-based assertion classifier for clinical text from English to Swedish. 4th International Louhi Workshop on Health Document Text Mining and Information Analysis, Louhi. Sydney, Australia. 2013.