Natural Language Processing in Healthcare – Current Applications

Niccolo Mejia

Niccolo is a content writer and Junior Analyst at Emerj, developing both web content and helping with quantitative research. He holds a bachelor's degree in Writing, Literature, and Publishing from Emerson College.


When it comes to the healthcare industry, one might be able to think of numerous use cases for AI approaches like machine vision or predictive analytics. However, the applications of natural language processing (NLP) in healthcare are just as varied.

In this article, we’ll take a look at some of the applications of NLP made for hospitals and healthcare companies. We’ll do this by exploring four companies offering NLP software to healthcare providers:

  • IQVIA’s platform uses unstructured and alternative data sources, such as social media, in conjunction with medical documents to generate analytics on regulations and compliance. The software is advertised as finding helpful information about changes to the client company’s compliance requirements.
  • 3M offers a system called CodeRyte CodeAssist that can recognize statements about diseases and treatments within a physician’s report. The software then labels the report with International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes so that expenses can be automatically reimbursed by a patient’s insurer.
  • Amazon claims their NLP solution can be used for cohort analysis, or the process of finding the correct patients to be enrolled in a clinical trial for a new drug. The software combs through patient data to find which patients would make the best participants.
  • Nuance Communications has a solution for physicians called Dragon Medical One, which transcribes a doctor’s speech into an electronic health record (EHR).

We begin our analysis of NLP in the healthcare space with IQVIA and their medical coding and compliance solution:

NLP for Medical Coding and Compliance


IQVIA

Founded in 2016, Connecticut startup IQVIA offers a namesake software platform that they claim helps healthcare companies keep up with changes to industry compliance requirements. They also claim it can account for safety and quality compliance, as well as for healthcare industry and commercial regulations.

IQVIA states their compliance platform runs on what the company calls the “IQVIA Core,” the set of technology and services from which all of their solutions are derived. One part of this “core” is IQVIA’s analytics engine, which purportedly uses both predictive analytics and NLP to search data for the information the user is looking for. IQVIA likely uses NLP in the engine to process unstructured data from social media, electronic medical records (EMR), clinical trials, and other medical documents. These documents contain free text that was not written in any format a computer can readily interpret. NLP could help extract information from that text that might prove valuable to a client.

IQVIA’s analytics engine appears to be focused on contracts and commercialization, so their use of NLP allows clients to find insights about complying with industry and legal standards without reading every single medical document.

IQVIA’s website states the company has access to millions of patient records, clinical trials, and text chains from hundreds of thousands of social media sources. The machine learning algorithm was likely trained on this data. The data would cover issues such as patient or employee safety and how well new drugs or other health products are working. Keywords or phrases would be labeled according to the type of compliance issue they may reflect.

For example, if a paragraph within a clinical trial document states that a patient experienced a newly discovered side effect of the drug, a data scientist might label that statement as related to safety regulations compliance. This would train the algorithm to recognize text chains that compliance teams might interpret as important safety compliance information.

A client could then expose the algorithm to unlabeled documents or social media posts, and it would recognize and extract the information compliance officers were looking for. That information would be sorted into categories such as safety compliance, regulatory compliance, quality control standards, and commercial compliance.
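To make the label-then-classify workflow described above more concrete, here is a minimal sketch in Python. It is not IQVIA’s method, which the company does not publish. It swaps in a small naive Bayes classifier, and the example sentences, category names, and function names (`train`, `classify`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labeled snippets. The four categories follow the article;
# the sentences themselves are invented for this sketch.
LABELED = [
    ("patient reported a newly discovered side effect after the second dose", "safety"),
    ("nurse sustained a needle injury during the procedure", "safety"),
    ("the agency issued updated filing requirements for trial sponsors", "regulatory"),
    ("new labeling rule takes effect for prescription drugs next year", "regulatory"),
    ("batch failed sterility testing at the manufacturing site", "quality"),
    ("promotional material made unapproved claims to physicians", "commercial"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the most probable label under naive Bayes with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(LABELED)
print(classify("a participant developed an unexpected side effect", model))
```

A production system would need far more labeled data and a stronger model, but the shape is the same: labeled text goes in during training, and unlabeled text is assigned a compliance category afterward.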

Below is a graphic from IQVIA’s website showing the flow of data and information through their analytics engine:

IQVIA’s analytics engine process

IQVIA does not make any case studies available showing success with their software.

IQVIA does not list any past clients by name, but they have raised $40 million in venture capital and are backed by Cota Healthcare.

Harietta Eleftherochorinou is the Global Senior Principal of ML and AI at IQVIA. She holds a PhD in Medicine and Genomics Machine Learning from Imperial College London. Before IQVIA, Eleftherochorinou served as Head of Advanced Analytics and Data Science at Deloitte Consulting.


3M

3M offers software called the CodeRyte CodeAssist System, which it claims can help healthcare providers and physicians accurately report patient illnesses and the operations or services patients received. The software would do this by using NLP to scan physician’s reports containing unstructured data.

The CodeRyte CodeAssist system is made to assign International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes to physician’s reports. ICD codes classify the diseases and related health problems mentioned in a physician’s report. CPT codes list the medical, surgical, and diagnostic procedures or services that a patient has undergone or received. Both serve as shorthand that makes searching through reports easier, and both are necessary to the workflow of processing and labeling physician’s reports in order to optimize medical reimbursement.

The machine learning algorithm behind 3M’s CodeRyte CodeAssist software was likely trained on hundreds of thousands of physician’s reports related to diseases and the treatments or services the patient received for those diseases.

Each disease mentioned in the report would have been labeled with an ICD code, and each service or treatment would have been labeled with a CPT code. For example, if a patient was injured while operating an agricultural vehicle, there would be an ICD code automatically attached to that report corresponding to injuries involving agricultural vehicles. This code would likely look like “V84.0”.

These labeled reports would then be run through the machine learning algorithm, training it to discern which codes apply to which reports and which text chains indicate the need for an ICD code or a CPT code.

A client company could then run unlabeled physician’s reports through the software, which would detect keywords and phrases that relate to each code type and assign those codes to reports automatically. This could speed up the assignment of ICD and CPT codes for accurate and optimized reimbursement.
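The code assignment step can be illustrated with a short Python sketch. This is not 3M’s implementation: a plain phrase lookup stands in for the trained model, and the function name `assign_codes` and both lookup tables are hypothetical. V84.0 is the example code from the text above; the other two entries are assumptions chosen for the sketch and should be checked against the official ICD-10-CM and CPT code sets.

```python
import re

# Illustrative phrase-to-code tables (assumptions, not an official mapping).
ICD_PHRASES = {
    "agricultural vehicle": "V84.0",  # example code used in the article
    "type 2 diabetes": "E11.9",       # assumed entry for illustration
}
CPT_PHRASES = {
    "office visit": "99213",          # assumed entry for illustration
}


def assign_codes(report):
    """Return the ICD and CPT codes whose trigger phrases appear in the report."""
    # Normalize case and whitespace so phrases match across line breaks.
    text = re.sub(r"\s+", " ", report.lower())
    icd = sorted({code for phrase, code in ICD_PHRASES.items() if phrase in text})
    cpt = sorted({code for phrase, code in CPT_PHRASES.items() if phrase in text})
    return {"icd": icd, "cpt": cpt}


report = "Patient injured while operating an agricultural vehicle. Seen for office visit."
print(assign_codes(report))
```

A trained model would replace the fixed tables with associations learned from labeled reports, which lets it handle wording the table’s authors never anticipated.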

3M claims to have helped California Medical Business Services (CMBS) automate the coding of their reports to speed up the process and clear their backlog of uncoded reports. CMBS also had to comply with the government’s Physician Quality Reporting System (PQRS), which included numerous measures by which CMBS had to code their reports. According to the case study, CMBS saw their three-week backlog disappear completely over time and automated the PQRS coding process.

3M also lists On Demand Solutions Inc. as one of their past clients.

Brian Stankiewicz is the Principal Data Scientist at 3M. He holds a PhD in Cognitive Science from UCLA. Previously, Stankiewicz was a Post Doctoral Research Scientist at the University of Minnesota.

NLP for Finding the Right Clinical Trial Participants


Amazon

Amazon offers software called Amazon Comprehend Medical, which it claims can help healthcare companies and providers find business insights from medical records, code their medical records accurately, and find the correct patients for clinical trials. The software uses NLP to comb through these written documents to find the needed information.

In addition to medical coding and business intelligence, Amazon Comprehend Medical is stated to help with medical cohort analysis. The software is purportedly able to highlight the medical information most relevant to the client’s criteria for their clinical trials.

Amazon claims it can find this information from unstructured text, such as a physician’s notes in a report about the patient’s experience with their illness. The important data points would be extracted so they would be more easily accessible for the user.

This allows the user to make more informed decisions about which patients to recruit for a clinical trial. For example, a drug made to treat the symptoms of multiple sclerosis could be the subject of a client’s next clinical trial. Text data from physician’s notes could show an Amazon Comprehend Medical user all of the multiple sclerosis patients in their network or for whom they have contact information.

Amazon’s machine learning algorithm for Comprehend Medical was likely trained on millions of physician’s records, patient health records, and clinical trial reports. For medical cohort analysis, the text within all of these documents would have to be labeled according to multiple factors important to recruiting the right trial participants.

These factors would include the patient’s current disease or diseases, age and sex, and history with drugs similar to the one being tested. The labeled documents would be run through the algorithm, which would “teach” the software to determine the user’s intent and pull up the correct participant records based on the keywords the user enters.

A user from a client company could then use the software to search a database of unlabeled documents. The software would detect the words in the documents that correspond to the diseases, illnesses, or demographics the user is searching for. Patients with the disease or illness that the drug being tested is meant to treat would then appear first in search results, speeding up the decision-making process.
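Once the relevant facts have been extracted from free text, the cohort search itself is a filter-and-rank step. The Python sketch below illustrates that step only. It does not call Amazon Comprehend Medical or reproduce its output format; the `PatientRecord` fields, the `find_candidates` function, and the placeholder drug name `drug_a` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Facts assumed to have already been extracted from a patient's notes."""
    patient_id: str
    age: int
    sex: str
    conditions: set = field(default_factory=set)
    prior_drugs: set = field(default_factory=set)


def find_candidates(records, condition, min_age, max_age, similar_drugs=frozenset()):
    """Keep patients with the condition in the age range, ranked by similar-drug history."""
    matches = [
        r for r in records
        if condition in r.conditions and min_age <= r.age <= max_age
    ]
    # Patients with more history on similar drugs sort first; ties keep input order.
    return sorted(
        matches,
        key=lambda r: len(r.prior_drugs & set(similar_drugs)),
        reverse=True,
    )


RECORDS = [
    PatientRecord("p1", 34, "F", {"multiple sclerosis"}),
    PatientRecord("p2", 51, "M", {"multiple sclerosis"}, {"drug_a"}),
    PatientRecord("p3", 45, "F", {"asthma"}, {"drug_a"}),
    PatientRecord("p4", 72, "M", {"multiple sclerosis"}),
]

for patient in find_candidates(RECORDS, "multiple sclerosis", 18, 65, {"drug_a"}):
    print(patient.patient_id)
```

The hard part in practice is the extraction that fills these records from unstructured notes, which is the part the NLP software is claimed to handle.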

Below is a graphic from Amazon’s website that shows the flow of data into and out of Amazon Comprehend Medical and what the software does with that data:

Graphical demonstration of how Amazon Comprehend Medical works

Amazon does not make any case studies available that show success with Amazon Comprehend Medical.

Amazon lists Fred Hutch, PwC, and Roche as some of their past clients.

Roland Miezianko is Machine Learning Lead at Amazon. He holds a PhD in AI Software Systems from Temple University. Previously, Miezianko served as Senior Principal Data Scientist and Director of Machine Learning at UnitedHealth Group.

Voice Recognition for Clinical Documentation

Nuance Communications

Nuance Communications offers software called Dragon Medical One, which they claim helps doctors and healthcare providers transcribe speech into a medical document such as an EHR using NLP.

The company advertises Dragon Medical One as able to turn speech into text while a doctor is speaking into the equipped microphone. Both the software and microphone run on the doctor’s computer during an appointment, and the doctor can speak into the microphone as though dictating notes to be written down. Dragon Medical One can detect this and “type” the doctor’s words into the EHR. This relieves the doctor of having to transcribe their recorded notes into the EHR themselves.

Nuance Communications’ machine learning algorithm would have had to be trained on thousands of speech requests and hundreds of thousands of words, spoken in different tones, inflections, and accents. Requests could include commands that signal the system to start a new line or to end the sentence with a period. The machine learning algorithm would then be able to recognize the sounds the doctor makes as words to be entered into the system or commands to be executed.

A client could then speak into a Dragon Medical One-enabled microphone, and the software would transcribe what was said into the EHR system it is connected to. This may or may not require a change in inflection for words that double as verbal commands, such as saying “new line” to make the software move down to the next line of text.
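The distinction between dictated words and spoken commands can be shown with a small Python sketch that works on text which has already been recognized. It is not Nuance’s implementation: the `COMMANDS` table and the `apply_commands` function are assumptions, and the sketch treats every occurrence of a command phrase as a command, which is exactly the ambiguity that inflection or context would have to resolve in a real product.

```python
# Hypothetical spoken commands and the text they produce.
COMMANDS = {
    "new line": "\n",
    "period": ".",
    "comma": ",",
}


def apply_commands(transcript):
    """Replace spoken command phrases with punctuation or line breaks."""
    words = transcript.split()
    pieces = []
    i = 0
    while i < len(words):
        for n in (2, 1):  # try two-word commands before one-word commands
            phrase = " ".join(words[i:i + n]).lower()
            if i + n <= len(words) and phrase in COMMANDS:
                pieces.append(COMMANDS[phrase])
                i += n
                break
        else:
            # No command matched, so keep the word as dictated text.
            pieces.append(words[i])
            i += 1

    text = ""
    for piece in pieces:
        if piece in COMMANDS.values():
            text += piece  # punctuation and line breaks attach directly
        elif text and not text.endswith("\n"):
            text += " " + piece
        else:
            text += piece
    return text


dictated = "Patient reports mild headache period new line Follow up in two weeks period"
print(apply_commands(dictated))
```

Because the lookup cannot tell a command from an ordinary use of the same word, a phrase like “menstrual period” would be mangled here, which is why a real system needs acoustic or contextual cues.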

Below is a short demonstrative video showing how Nuance’s Dragon Medical One software works:

Nuance Communications claims to have helped Allina Health increase their clinical documentation and improve the EHR experience for their doctors. Allina Health integrated Dragon Medical One into its EHR creation process, which relieved doctors of having to manually transcribe their dictated notes into the EHR. According to the case study, Allina Health saw 70% of their voice-based documentation automated and a 167% increase in the overall amount of documentation captured after adopting Dragon Medical One.

Nuance Communications also lists Nebraska Medicine and Baptist Health South Florida as some of their past clients.

Paul Tepper is Head of the Cognitive Innovation Group AI Lab and Product Manager for AI and Machine Learning at Nuance Communications. He holds a PhD in computer science and communication studies from Northwestern University. Previously, Tepper served as Senior Computational Linguistics Engineer at Idibon.


Header Image Credit: Fairmont Regional Medical Center
