New to NLP

Topaz is an application that extracts relevant medical concepts and their modifiers, such as negation and temporality, from clinical documents.
When Topaz runs, it analyzes the clinical reports in a folder designated as "input_directory" and stores the results in XML format in corresponding files in a folder designated as "output_directory".
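
The input-to-output correspondence described above can be sketched in a few lines of Python. This is only an illustration, not part of Topaz: the function names are hypothetical, and the naming rule (each output file reuses the report's base name with an .xml extension) is an assumption that the Topaz documentation does not confirm.

```python
from pathlib import Path


def expected_outputs(input_directory, output_directory):
    """Pair each report in input_directory with the XML file expected in output_directory.

    Assumption (not confirmed by the Topaz documentation): the output file
    reuses the report's base name with an .xml extension.
    """
    in_dir = Path(input_directory)
    out_dir = Path(output_directory)
    pairs = []
    for report in sorted(in_dir.iterdir()):
        if report.is_file():
            pairs.append((report, out_dir / (report.stem + ".xml")))
    return pairs


def missing_outputs(input_directory, output_directory):
    """Return the reports whose expected XML output does not exist yet."""
    return [
        report
        for report, xml_path in expected_outputs(input_directory, output_directory)
        if not xml_path.exists()
    ]
```

After a run, calling missing_outputs("input_directory", "output_directory") would list any reports for which no result file was found under that assumed naming rule.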

Know NLP

Topaz is a UIMA collection processing engine (CPE) that extracts relevant medical concepts and their modifiers, such as negation and temporality, from clinical documents.
A single distribution folder is provided, which contains information and configuration files. UIMA descriptor files are located in the /desc directory. The main CPE descriptor file is /desc/UTopazCPE.xml.
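
As a quick sanity check of the layout described above, the following Python sketch locates the main CPE file inside a distribution folder and confirms that it parses as XML. The function names are hypothetical helpers, not part of Topaz or UIMA, and nothing is assumed about the contents of the file beyond it being well-formed XML.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def find_cpe_descriptor(distribution_dir, relative_path="desc/UTopazCPE.xml"):
    """Return the path to the main CPE file, or raise if it is missing."""
    descriptor = Path(distribution_dir) / relative_path
    if not descriptor.is_file():
        raise FileNotFoundError(f"CPE file not found: {descriptor}")
    return descriptor


def descriptor_root_tag(descriptor_path):
    """Parse the file and return its root element name without any namespace."""
    root = ET.parse(str(descriptor_path)).getroot()
    return root.tag.split("}")[-1]  # strip a leading "{namespace}" if present


def list_descriptors(distribution_dir):
    """List the names of all XML files directly under the desc directory."""
    return sorted(p.name for p in (Path(distribution_dir) / "desc").glob("*.xml"))
```

For example, find_cpe_descriptor("/path/to/topaz") would return the file's location, where /path/to/topaz is a placeholder for wherever the distribution folder was unpacked.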


Wendy Chapman

Lee Christensen

Henk Harkema

Associated Institutions

University of California San Diego

University of Utah

University of Pittsburgh


Minimum Requirements

Java SE 6 JRE or JDK
Dual core processor
4 GB of RAM
1 GB of storage for program
10+ GB of storage recommended for large datasets


To download Topaz, please go to the University of Pittsburgh's website to request access.

NLP Task Performed

Information Extraction

Programming Languages


Operating Systems



Topaz User Documentation

Introduction, Installation and Configuration

Learn how to install and configure Topaz startup properties.

Editing Domain Knowledge

Learn how to edit and extend Topaz's knowledge base so that the tool fits your domain and information extraction needs.

Example of Enhancing Topaz Knowledge Base

A step-by-step walkthrough of enhancing Topaz's knowledge base.

Running Topaz

Learn how to run Topaz on your machine.

Advanced User Topics

If you are familiar with UIMA, learn how to integrate Topaz with other UIMA analysis engines. This section also describes the primitive analysis engines contained in Topaz in more detail.

Related Publications

Ye Y, Tsui FR, Wagner M, Espino JU, Li Q. Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. J Am Med Inform Assoc. 2014 Sep-Oct;21(5):815-23. doi: 10.1136/amiajnl-2013-001934. Epub 2014 Jan 9.

Pineda AL, Tsui FC, Visweswaran S, Cooper GF. Detection of Patients with Influenza Syndrome Using Machine-Learning Models Learned from Emergency Department Reports. ISDS 2012 Conference. Online Journal of Public Health Informatics. 2013;5(1):e41.

Samore MH. Natural Language Processing: Can it Help Detect Cases and Characterize Outbreaks? Advances in Disease Surveillance. 2008;5:59.

Use Cases