Machine Learning for Medical Transcription – Current Applications

Niccolo Mejia

Niccolo is a content writer and Junior Analyst at Emerj, developing both web content and helping with quantitative research. He holds a bachelor's degree in Writing, Literature, and Publishing from Emerson College.

The Current State of Machine Learning for Medical Transcription

There are several companies claiming to offer AI-based medical transcription software, specifically speech recognition software, to hospitals and healthcare companies. We found that these solutions are intended to help hospitals and healthcare companies with medical transcription in different forms, transcribing speech into text in order to fill out and update patient medical records in electronic health record (EHR) and electronic medical record (EMR) databases.

Machine Learning for Medical Transcription – Insights Up Front

We could not find evidence that the prominent companies offering speech recognition software for medical transcription employ the kind of AI talent we would expect, with the exception of Nuance Communications. It isn’t clear how their solutions could work without natural language processing, a kind of artificial intelligence. Nuance employs many data scientists with PhDs and Master’s degrees in computer science and hard sciences like physics. This is generally what we look for when vetting a company’s claims to leverage artificial intelligence and when cutting through the marketing hype we so often see on AI vendor websites.

That said, Nuance Communications offers natural language processing software to a variety of industries, not just healthcare companies. This makes them stand out from the other companies listed in this report. Ideally, a company sells into one or two niches, tailoring their software to specific use cases.

Natural language processing algorithms often require specificity in the way they’re trained. If a machine learning model built for a voice recognition system is fed only audio data from people with Boston accents, for example, the voice recognition system might have trouble picking up commands when they’re said by someone with a different accent. We explain this in further detail in our report on Crowdsourced Natural Language or Speech Training – Use Cases and Explanation.
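One way to spot the accent problem described above is to measure transcription accuracy separately for each group of speakers. The sketch below is our own illustration, not any vendor's method: it computes word error rate (word-level edit distance divided by the number of reference words) per accent label. The function names, accent labels, and sample sentences are all hypothetical.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word inserted in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def wer_by_group(samples):
    """Average WER per accent label; samples are (accent, reference, hypothesis)."""
    rates = defaultdict(list)
    for accent, reference, hypothesis in samples:
        rates[accent].append(word_error_rate(reference, hypothesis))
    return {accent: sum(values) / len(values) for accent, values in rates.items()}


# Hypothetical evaluation data, invented for illustration only.
samples = [
    ("boston", "start the patient note", "start the patient note"),
    ("boston", "open the chart", "open the chart"),
    ("other", "start the patient note", "start the patient boat"),
    ("other", "open the chart", "open chart"),
]
scores = wer_by_group(samples)
```

A large gap between groups, as in this made-up data, would suggest the training audio under-represents some speakers.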

Similarly, natural language processing models need to be trained on specific words and phrases, technical terms, and jargon. Machine learning engineers looking to build a voice recognition system for use in hospital and clinical settings will likely need to recruit subject-matter experts to provide audio data that involves the jargon and commonly used phrases of those settings. In addition, these subject-matter experts would be required to correct the software when it transcribes the jargon incorrectly or fails to transcribe it altogether, and to feed those corrections back into the natural language processing algorithm. Only by doing this can the machine learning model behind the voice recognition system “learn” to transcribe medical jargon.
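The correction loop described above can be sketched in a few lines. This is a deliberately simplified stand-in, assuming nothing about how any vendor retrains its models: instead of updating a speech model, it only remembers word-level fixes that experts have made and reuses a fix once it has been seen often enough. The class name, the `min_votes` threshold, and the misspelled drug name are hypothetical.

```python
from collections import Counter, defaultdict


class CorrectionMemory:
    """Toy stand-in for retraining: remembers word-level fixes made by experts."""

    def __init__(self, min_votes: int = 2):
        self.min_votes = min_votes
        self.votes = defaultdict(Counter)  # wrong word -> counts of corrected words

    def record(self, machine_text: str, expert_text: str) -> None:
        """Store substitutions when the two texts line up word for word."""
        machine = machine_text.lower().split()
        expert = expert_text.lower().split()
        if len(machine) != len(expert):
            return  # a real system would align the texts; this sketch skips them
        for wrong, right in zip(machine, expert):
            if wrong != right:
                self.votes[wrong][right] += 1

    def apply(self, machine_text: str) -> str:
        """Rewrite words whose most common correction has enough votes."""
        fixed = []
        for word in machine_text.lower().split():
            options = self.votes.get(word)
            if options:
                best, count = options.most_common(1)[0]
                if count >= self.min_votes:
                    word = best
            fixed.append(word)
        return " ".join(fixed)


memory = CorrectionMemory()
# Two expert corrections of the same hypothetical mis-transcription.
memory.record("prescribe metformen daily", "prescribe metformin daily")
memory.record("continue metformen twice daily", "continue metformin twice daily")
result = memory.apply("refill metformen")
```

Requiring more than one vote before reusing a fix is a design choice in this sketch, meant to keep a single mistaken correction from spreading.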

This means that training voice recognition systems is both resource- and time-intensive, which makes it difficult for a company to offer such software to a variety of industries. This isn’t to say that Nuance’s voice recognition system is worse than systems from companies that focus on the healthcare domain. In fact, the talent at the company, $2 billion in revenue, and 8,000 employees likely mean that they have the resources to train and update their voice recognition systems. Nuance is a rare case, however. Startups can’t often focus on more than one niche when it comes to building machine learning models, and business leaders should be skeptical of AI vendors claiming to offer robust machine learning software for more than three verticals.

Suki AI does employ a Senior Machine Learning Engineer who seems to have business experience in machine learning, but we were unable to find employees with similar credentials on the company’s LinkedIn page. Suki AI’s CTO used to work at Salesforce, which is certainly doing AI with its Einstein product, but he does not seem to have worked on Einstein. He was also Vice President of Big Data Services at Oracle from 2014 to 2016. In addition, he holds a Master’s in Engineering from Carnegie Mellon, which boasts one of the most renowned machine learning programs in the world. 

As such, it seems as though AI’s medical transcription use case is relatively nascent. Many of the companies listed in this report offer transcription services that do not use AI and are not done in real time. Users can upload audio involving medical jargon, and employees at the companies who understand the technical jargon can transcribe the audio into text. Speech recognition software would theoretically allow this in real time, but again, it seems like such a use case is relatively untested given the state of talent at these companies.

Nuance and, in some ways, Suki AI seem the most likely to offer machine learning-based speech recognition software. Business leaders in healthcare may try working with these companies, but they may also want to wait for the use case to gain a little more traction in the coming years before spending money on transcription software.

The ML Medical Transcription Vendor Landscape

Nuance Communications

Nuance Communications offers software called Dragon Medical One, which it claims can help doctors and healthcare providers transcribe a doctor’s speech into an EHR using natural language processing. It’s likely that healthcare providers can integrate the software into an existing EHR system.

We can infer the machine learning model behind the software was trained first on requests to transcribe speech with word processing, in various accents, inflections, and pitches. The machine learning algorithm behind the voice recognition system would then transcribe those speech requests into text. Specialists who understand the field would then correct the transcription and upload the edited transcription into the machine learning algorithm again. This would have trained the algorithm to recognize medical jargon and to better transcribe what doctors and patients say when they discuss symptoms and treatments.

Below is a short 3-minute video demonstrating how Dragon Medical One works:

Nuance Communications claims to have helped Nebraska Medicine improve the efficiency of their healthcare providers when they updated patient medical records and wrote up documents. Nebraska Medicine integrated Nuance Communications’ software into its EHR. Prior to integrating the software, Nebraska Medicine was paying for transcription services in order to transcribe doctor audio notes. According to the case study, Nebraska Medicine saw a 23% decrease in transcription costs. In response to a survey of all the company’s physicians, “71% stated that the quality of their documentation has improved, and 50% stated that they have saved at least 30 minutes every day with Dragon Medical.” Nuance Communications also lists Allina Health and Baptist Health South Florida as some of their past clients.

Joe Petro is CTO at Nuance Communications. He holds an MSME in Computer Aided Engineering from Kettering University. Previously, Petro served as SVP of Research and Development at Eclipsys.

Suki.ai

Suki.ai offers a namesake software, which it claims can help doctors and healthcare providers take notes during appointments and update electronic health records with their voice using natural language processing. Suki.ai claims hospital staff can integrate the software into their existing EHR databases.

We can infer the machine learning model behind the software was trained on hundreds of thousands of relevant speech snippets. These snippets would come from recordings of doctors talking with patients, updating a patient’s EHR during an appointment, and dictating small notes or reminders, spoken in various accents and inflections by various types of people. The machine learning algorithm behind the voice recognition system would then transcribe those speech snippets into text. Human editors would then correct the transcription and feed the edited text back into the machine learning algorithm. This would have trained the algorithm to recognize and correctly transcribe these speech snippets.

That said, we could not find a demonstration video showing how Suki.ai’s software works specifically. In addition, Suki.ai does not make available any case studies reporting success with their software, but they do list OrthoAtlanta and Piedmont Fayette Hospital as some of their past clients.

Karthik Rajan is CTO at Suki.ai. He holds a Master’s Degree in Engineering from Carnegie Mellon University. Previously, Rajan served as Vice President of Big Data Services at Oracle.

M*Modal

M*Modal offers software called Fluency Direct for Transcription, which it claims can help doctors and healthcare providers transcribe what is said during patient visits into EHR documents using natural language processing. M*Modal claims healthcare providers can integrate the software into their existing EHR databases, such as Epic, Cerner, and Athenahealth.

We can infer the machine learning model behind the software was trained first on hundreds of thousands of relevant speech snippets, such as a request to begin transcribing what a doctor says into an EHR or to navigate the EHR dashboard. These speech snippets would be in various accents and inflections from various types of people to ensure that the software could be used in a variety of places. The machine learning algorithm behind the voice recognition system would then transcribe those speech requests into text. Human editors would then correct the transcription and feed the edited text back into the machine learning algorithm. This would have trained the algorithm to recognize and correctly transcribe speech as doctors and patients say it in a clinical setting.

Below is a short 3-minute video demonstrating how Fluency Direct works:

M*Modal claims to have helped Floyd Memorial Hospital increase the number of medical documents they wrote up per day. Floyd Memorial Hospital integrated M*Modal’s software into its EHR system. According to the case study, Floyd Memorial Hospital saw their total lines of documentation rise from an estimated 113,000-124,000 to 235,000 per pay period as a result. M*Modal also lists Flagler Hospital and Thomas Jefferson University Hospital as some of their past clients.
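To put the Floyd Memorial figures in perspective, the rise can be expressed as a percentage. The short calculation below uses only the numbers quoted from the case study; the function and variable names are our own.

```python
def percent_increase(before: float, after: float) -> float:
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


# Lines of documentation per pay period, as quoted from the case study.
baseline_low, baseline_high = 113_000, 124_000
lines_after = 235_000

# The higher baseline gives the smaller gain, and the lower baseline the larger one.
smallest_gain = percent_increase(baseline_high, lines_after)
largest_gain = percent_increase(baseline_low, lines_after)
summary = f"{smallest_gain:.1f}% to {largest_gain:.1f}%"
print(summary)  # prints "89.5% to 108.0%"
```

In other words, the reported output roughly doubled, depending on which end of the estimated baseline is used.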

Detlef Koll is CTO at M*Modal. He holds an MS in computer science from the University of Karlsruhe. Previously, Koll served as Director of Speech Research at Lernout and Hauspie Speech Products.

Medical Transcription Billing, Corp. (MTBC)

Medical Transcription Billing, Corp. (MTBC) offers software called TalkEHR, which comes with a voice assistant feature called Allison. The company claims the software helps healthcare providers and doctors transcribe notes for EHR databases and navigate the EHR dashboard using natural language processing. MTBC claims healthcare providers can integrate the software into their database where electronic health records are stored.

The company states the machine learning model behind the software was trained first on hundreds of thousands of speech requests that a doctor might make, such as to start recording, to transcribe speech that might include medical jargon, or to fill in a predetermined set of information in lieu of a doctor repeating themselves. For example, a doctor could verbally request the Allison assistant to refill a patient’s prescription, and the software would fill in the form with the same information as the last time the prescription was filled.
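The refill example can be pictured as a small lookup step that runs after speech has already been turned into text. The sketch below is purely illustrative and assumes nothing about how TalkEHR or Allison is built: the `Prescription` record, the `handle_command` function, the keyword check, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Prescription:
    patient_id: str
    drug: str
    dose: str
    filled_on: date


def handle_command(transcript: str, patient_id: str,
                   history: List[Prescription], today: date) -> Optional[Prescription]:
    """Return a refill draft copied from the patient's latest prescription, if any."""
    if "refill" not in transcript.lower():
        return None  # this sketch only understands one kind of request
    past = [p for p in history if p.patient_id == patient_id]
    if not past:
        return None  # nothing on file to copy from
    latest = max(past, key=lambda p: p.filled_on)
    # Same drug and dose as last time; only the fill date changes.
    return replace(latest, filled_on=today)


# Hypothetical records, invented for illustration only.
history = [
    Prescription("p-001", "lisinopril", "10 mg daily", date(2019, 1, 5)),
    Prescription("p-001", "lisinopril", "20 mg daily", date(2019, 4, 2)),
    Prescription("p-002", "atorvastatin", "40 mg daily", date(2019, 3, 9)),
]
draft = handle_command("Refill the patient's prescription", "p-001",
                       history, date(2019, 7, 1))
```

A production assistant would need far more than a keyword match, including confirmation from the doctor before anything is submitted.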

These requests would be in various accents and inflections from various types of people, allowing the voice recognition system to see use in various English-speaking parts of the world. The machine learning algorithm behind the voice recognition system would then transcribe those speech requests into text. In order to further train the algorithm, subject-matter experts would edit the transcription to make sure all of the medical jargon was transcribed correctly. They would then upload the corrected version back into the algorithm so that the software could in the future correctly transcribe the jargon.

We were unable to find a demonstration video showing how TalkEHR and Allison work, and MTBC does not make available any case studies reporting success with their software. MTBC does list Metropolitan Foot and Ankle Specialists as one of their past clients, however.

Felix Cirillo is CTO at MTBC. He holds a Master’s Degree in Business Administration and Information Technology from Dowling College. Previously, Cirillo served as CIO at Advantedge Healthcare Solutions.

 

Header Image Credit: Compass Professional Health Services
