The pyConText algorithm is an extension of the original ConText algorithm. This newer version (pyConText) is more extensible and supports user-defined modifiers. For example, one project involving radiology reports added the following modifiers, among others: Uncertainty (certain or uncertain), Quality of radiologic exam (limited or not limited), Severity (critical or non-critical), and Sidedness (right or left).
In addition to capturing user-defined modifiers, the pyConText algorithm can also encode events and entities when provided with regular expressions or dictionaries containing the cues and their literal normalizations.
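As a rough illustration of what pairing a cue with its literal normalization can look like, here is a minimal sketch in plain Python. It uses only the standard `re` module and does not reproduce pyConText's own API; the `LEXICON` entries, the category labels, and the `tag` function are all hypothetical names chosen for this example.

```python
import re

# Hypothetical lexicon: (regular expression, literal normalization, category).
# These entries are illustrative and are not taken from a pyConText knowledge base.
LEXICON = [
    (r"\bpulmonary\s+embol(?:us|i|ism)\b", "pulmonary embolism", "FINDING"),
    (r"\bno\s+evidence\s+of\b", "definite_negated_existence", "MODIFIER"),
]


def tag(sentence):
    """Return (start, end, matched text, literal, category) for every lexicon hit."""
    hits = []
    for pattern, literal, category in LEXICON:
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0), literal, category))
    return sorted(hits)  # ordered by position in the sentence


for hit in tag("No evidence of pulmonary emboli."):
    print(hit)
```

Each surface variant ("embolus", "emboli", "embolism") maps to the same literal, so downstream logic can reason over one normalized concept rather than over raw strings.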
System development has been supported over the years by several projects, including: 1 K22 LM008301-01 “Natural Language Processing for Respiratory Surveillance”; the ShARe project; the Vårdal Foundation; NIH grant 1R01LM010964; NLM Fellowship 5T15LM007059; the Interlock project; the Stockholm University Academic Initiative; VA HSR&D Stroke QUERI RRP 12-185; NIH NHLBI 1R01HL114563-01A1; and NIGMS R01GM090187.
The pyConText algorithm is an extension of the original ConText algorithm. Specifically, the pyConText algorithm differs from the ConText algorithm in a number of ways:
1) This newer version (pyConText) is more extensible and supports user-defined modifiers. To provide new modifiers, you simply supply regular expressions for tagging the associated trigger terms, plus termination terms for refining their scope.
For example, one project involving radiology reports added the following modifiers, among others: Uncertainty (certain or uncertain), Quality of radiologic exam (limited or not limited), Severity (critical or non-critical), and Sidedness (right or left).
2) The user can provide regular expressions to encode events and entities including their literal normalized concept and associated synonyms.
3) The user can provide rules to pyConText to support document-level assertions derived from encoded events and their modifiers, e.g., IF Finding: stenosis AND Severity: critical AND Anatomical location: internal carotid artery THEN flag Document for review of SIGNIFICANT CAROTID STENOSIS.
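The three points above can be sketched together in a few lines of plain Python. This is a simplified illustration, not pyConText's implementation or API: the cue lists are hypothetical, scope is approximated by splitting each sentence at termination terms (a modifier applies to a target only when both fall in the same segment), and negation and uncertainty are ignored.

```python
import re

# Hypothetical cue lists; a real knowledge base would be far larger.
TARGETS = {r"\bstenosis\b": ("Finding", "stenosis")}
MODIFIERS = {
    r"\b(?:critical|severe|high[- ]grade)\b": ("Severity", "critical"),
    r"\binternal\s+carotid\s+arter(?:y|ies)\b":
        ("Anatomical location", "internal carotid artery"),
}
TERMINATORS = r"\b(?:but|however|although)\b"


def segments(sentence):
    """Split a sentence into scope segments at termination terms."""
    bounds, start = [], 0
    for match in re.finditer(TERMINATORS, sentence, flags=re.IGNORECASE):
        bounds.append((start, match.start()))
        start = match.end()
    bounds.append((start, len(sentence)))
    return bounds


def find(cues, text):
    """Return (category, literal) for every cue that matches in the text."""
    return [(category, literal)
            for pattern, (category, literal) in cues.items()
            if re.search(pattern, text, flags=re.IGNORECASE)]


def encode(sentence):
    """Return one dict per target, holding the modifiers that share its segment."""
    events = []
    for lo, hi in segments(sentence):
        chunk = sentence[lo:hi]
        modifiers = dict(find(MODIFIERS, chunk))
        for category, literal in find(TARGETS, chunk):
            events.append({category: literal, **modifiers})
    return events


def flag_document(sentences):
    """IF Finding: stenosis AND Severity: critical AND
    Anatomical location: internal carotid artery THEN flag the document."""
    wanted = {"Finding": "stenosis", "Severity": "critical",
              "Anatomical location": "internal carotid artery"}
    return any(wanted.items() <= event.items()
               for sentence in sentences for event in encode(sentence))


report = ["There is critical stenosis of the left internal carotid artery.",
          "The vertebral arteries are patent."]
print(flag_document(report))
```

The termination terms matter for sentences such as "Mild stenosis of the external carotid artery, but the internal carotid artery is patent without critical narrowing": the word "but" closes the scope, so the severity and location cues in the second clause are not attached to the stenosis mention in the first.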
NLP Task Performed: Classification, Information Extraction
Chapman BE, Lee S, Kang HP, Chapman WW. Document-level classification of CT pulmonary angiography reports based on an extension of the ConText algorithm. J Biomed Inform. Apr 1 2011.
Encoding Spatial Locations of Pulmonary Embolism from Radiology Reports
Wilson RA, Chapman BE. Automated Capture of Pulmonary Embolism Spatial Location in Dictated Reports Using the ConText Algorithm. Radiological Society of North America (RSNA). Chicago, IL. 2011:(in press).
De-identified CT PE studies
200 impression sections from CT PE studies
Overall accuracy: 58%
Annotating Uncertainty from Radiology Reports
Gentili A, Chapman BE. Use of Natural Language Processing to Classify Radiology Reports Containing Description of the Abdominal Aorta. Radiological Society of North America (RSNA). 2013:(in press).
n=473 reports annotated by a radiologist
Measuring Expressions of Uncertainty in Radiology Texts
Chapman BE, Gentili A, Chen J, Miyakoshi A, Chapman WW. Measuring Expressions of Uncertainty in Radiology Texts for Natural Language Processing Applications. Radiological Society of North America (RSNA). 2013:(in press).
Comparison of probabilities assigned by radiologists against categories defined in pyConTextNLP
133 pyConTextNLP cues categorized as definitely negated, probably negated, probably existent, and definitely existent
108 cues translated from Swedish clinical texts.
3 radiologists assigned single-point probabilities to each cue
Pairwise comparisons of single-point probabilities:
Mean difference = 0.012
Mean standard deviations = 0.21.
Mean (standard deviation) of point probabilities:
definitely negated 0.078 (0.11),
probably negated 0.17 (0.16),
probably existent 0.71 (0.11),
definitely existent 0.91 (0.083).
Encoding Mentions of Carotid Stenosis from Carotid Radiology Reports
Mowery DL, Chapman BE, Conway M, South BR, Madden E, Keyhani S, Chapman WW. Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics. 2016; 7: 26.
RAD: mainly carotid ultrasounds; some angiograms, CT scans
TIU: mainly progress notes, carotid duplex exams, and carotid triplex exams
Train: n=100 RAD, n=100 TIU; Test: n=498 RAD, n=498 TIU
Significant stenosis or not (document-level):
Recall=88; PPV=70; Specificity=84; NPV=95
Recall=73; PPV=58; Specificity=87; NPV=92
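For readers new to these measures, the sketch below shows how recall, PPV, specificity, and NPV follow from the four cells of a binary confusion matrix, under the usual definitions. The function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute document-level metrics from confusion-matrix counts.

    tp/fn: documents with significant stenosis that were / were not flagged.
    tn/fp: documents without it that were correctly passed / wrongly flagged.
    """
    return {
        "recall": tp / (tp + fn),       # share of true positives that were found
        "ppv": tp / (tp + fp),          # share of flagged documents that were correct
        "specificity": tn / (tn + fp),  # share of true negatives that were passed
        "npv": tn / (tn + fn),          # share of passed documents that were correct
    }


# Illustrative counts only.
print(binary_metrics(tp=8, fp=2, tn=18, fn=2))
```

A high NPV alongside a lower PPV, as in the figures above, is the pattern one would expect from a screening-style classifier: documents it passes over are rarely positive, while flagged documents still need review.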
Other Related Publication
Mowery DL, Franc D, Ashfaq S, Zamora T, Cheng E, Chapman WW, Chapman BE. Developing a knowledge base for detecting carotid stenosis with pyConText. AMIA Symp Proc. Washington DC. 2014.
Asserting the Uncertainty-level of Assessments from Swedish Clinical Texts
Velupillai S, Skeppstedt M, Kvist M, Mowery DL, Chapman BE, Dalianis H, Chapman WW. Cue-based assertion classification for Swedish clinical text – developing an assertion lexicon for PyConTextSwe. AIIM: Text Mining and Information Analysis. 2014.
Three subsets of a clinical corpus in Swedish: the Stockholm electronic patient record (EPR) Corpus (SEPR-C); two subsets from the Stockholm EPR diagnosis uncertainty corpus (SEPR-DUC), annotated for uncertainty and negation on a diagnostic-statement level
n=454 cues to pyConTextSwe
F-score = 83% (overall),
F-score = 88% (definite existence),
F-score = 81% (probable existence),
F-score = 55% (probable negated existence),
F-score = 63% (definite negated existence).
F-score = 97%/87% (existence yes/no)
F-score = 78%/86% (uncertainty yes/no)
Other Related Publication
Velupillai S, Skeppstedt M, Kvist M, Mowery DL, Chapman BE, Dalianis H, Chapman WW. Porting a rule-based assertion classifier for clinical text from English to Swedish. 4th International Louhi Workshop on Health Document Text Mining and Information Analysis, Louhi. Sydney, Australia. 2013.