A significant amount of clinical data is embedded in clinical notes, and the volume of text generated and stored every day makes it impractical to process manually. To access the information locked in notes and reports, the Wright Center provides clinical data extraction through text analysis and natural language processing.
Natural Language Processing (NLP) is a set of computational methods for analyzing written and spoken language. NLP methods can automatically extract pertinent information from written text, making that data easier to use and revealing patterns and trends that would otherwise remain buried in unstructured text.
The informatics team at the Wright Center has experience implementing NLP workflows and pipelines for tasks including cohort identification, medical topic analysis, and publication reporting statistics. Because each research application of NLP is project-specific, we work closely with principal investigators and subject matter experts to develop NLP pipelines and applications that advance research at VCU. If you have a project that may benefit from NLP, please contact us for a consultation.
NLP tools developed by the Wright Center informatics team include:
- TopExApp – an application for exploring topics in large sets of text using the TopEx Python library. Originally designed to identify common challenges that acting interns described in their reflective writing responses, it can be used to explore topics in any collection of texts.
- Chrono – a hybrid rule-based and machine learning application that identifies and normalizes temporal expressions in text. Chrono has been trained on both general-domain and clinical texts and ranked first in the SemEval-2018 Task 6 temporal challenge.
- PubReporter (under development) – an application designed to summarize MeSH terms associated with a set of publications for reporting purposes.
- NLP@VCU – the NLP lab in VCU’s Computer Science Department, led by Bridget McInnes, Ph.D., actively develops NLP tools and collaborates across campus with the Wright Center and the VCU School of Medicine.
- CLAMP – an application developed at the University of Texas Health Science Center at Houston that provides a drag-and-drop user interface for building NLP pipelines. Wright Center team members have experience building pipelines with CLAMP. Licenses for academic research use are available free of charge upon request.
Publications
- Olex A, DiazGranados D, McInnes BT, and Goldberg S. Local Topic Mining for Reflective Medical Writing. Full Length Paper. Accepted to AMIA Informatics Summit 2020. PMCID: PMC7233034.
- Olex A, Maffey L, Morgan N, et al. Chrono at SemEval-2018 Task 6: A System for Normalizing Temporal Expressions. Full Length Paper. Proceedings of the 12th International Workshop on Semantic Evaluation. New Orleans, Louisiana: Association for Computational Linguistics, 2018, 97–101. DOI: 10.18653/v1/S18-1012
- Olex A, Maffey L, McInnes B. NLP Whack-A-Mole: Challenges in Cross-Domain Temporal Expression Extraction. Long Paper. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1. Minneapolis, Minnesota: Association for Computational Linguistics, 2019, 3682–3692. DOI: 10.18653/v1/N19-1369
Posters and Oral Presentations
- DiazGranados D, Olex AL, Garber A, Santen SA, McInnes BT, and Goldberg S. Utilizing Natural Language Processing to Automate the Identification of Acting Intern Challenges. Oral presentation by Olex, DiazGranados, and Goldberg at the ChangeMedEd 2019 conference, Chicago, IL, September 18–21, 2019.
- Olex AL, Gal T, Afshar M, Dligach D, Karnik N, Oakes T, Sharma B, Xie M, McInnes BT, Solway J, Kho A, Cramer WC, and Moeller FG. Untapped Potential of Clinical Text for Opioid Surveillance. Poster presented by Amy Olex at the AMIA 2019 Annual Symposium, Washington, DC.