Natural Language Processing in Insurance – Current Applications

Niccolo Mejia

Niccolo is a content writer and Junior Analyst at Emerj, where he develops web content and helps with quantitative research. He holds a bachelor's degree in Writing, Literature, and Publishing from Emerson College.

The applications of natural language processing (NLP) have been increasing as more companies find uses for their text data. This includes insurance companies with large stores of data from claims and customer support tickets.

In this article, we’ll take a look at the applications of NLP in the insurance industry. We will do this by examining four software vendors offering NLP-based solutions to the insurance industry, and assessing the possibilities of applying NLP to insurance operations.

The companies detailed in this report that are offering NLP solutions to insurers are as follows:

  • IBM claims to leverage AI in numerous business solutions. Their text mining software called Watson Explorer can purportedly extract important information from unstructured text data to improve claims processing.
  • Taiger also offers claims processing software, but their solution helps insurance agents more easily find the information they need to process claims. They also claim their software can help automate onboarding.
  • Health Fidelity offers healthcare fraud detection software, but in this report, we will be looking more closely at the NLP algorithm behind that fraud detection software.
  • Progress Software offers chatbot software that they say can help customers choose an insurance policy as well as schedule appointments with healthcare providers.

Before exploring each vendor in depth, we’ll take a look at how NLP solutions are developing for the insurance industry.

Natural Language Processing in Insurance – Insights Up Front

IBM has the highest density of AI talent of any company in this report. Progress Software and Health Fidelity have AI staffs of similar size, but neither is as robust as IBM's. The smallest company, Taiger, has yet to gain much traction with its NLP solution, possibly because its first round of select customers has not been using the software for long.

IBM is the most likely of the four to actually be leveraging AI in its product, as its track record and large pool of AI and data science talent suggest. Progress Software and Health Fidelity do not share that likelihood. Each lacks a robust list of staff with an AI background, and Progress Software does not provide information about customer success with its software.

Taiger has AI leaders with strong academic backgrounds, but the likelihood that the company actually uses AI is unclear. This is because its AI staff either joined the company in 2018 or were only recently promoted into AI positions.

Of the four companies examined in this report, only Progress Software did not have any case studies available showing success with their software. That being said, the other companies only had one case study for their NLP solution for insurance. This is a sign that this application has not seen much success in the field yet, because even IBM, the most established company of the four, cannot offer more than one instance of enterprise success.

We’ll begin our exploration of the state of NLP in insurance with claims processing software offered by IBM and Taiger:

NLP for Claims Processing

IBM

IBM offers software called IBM Watson Explorer, which the company claims can use natural language processing to help insurance companies access and organize text data to improve their customer service and claims processing.

IBM Watson Explorer combs through structured and unstructured text data to find the right information to process insurance claims. This information usually comes from the customer making the claim, but each further claim helps the software recognize more terms and phrases. The software can be built into applications for customer service agents, who may otherwise need to search an intranet or similar employee resource for the correct information.

We can infer the machine learning model behind Watson Explorer would need to be trained on tens of thousands of a client's insurance claims. Each claim would be labeled according to the sections of the claim application form and the terminology commonly entered in each section. Then IBM or a data scientist at the client company would expose the machine learning algorithm to this labeled data.

The software would then be able to scan through a new claim application form and extract each data point from each of the sections. A customer service agent who may be speaking to the customer on the phone could then search for past claims that are similar to the customer's. The software would then give the agent the option to open the list of those documents, find trends, and find possible causes. This would allow the agent to easily manage the data for verification through the client company's specific procedures.
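The similar-claim lookup described above can be sketched in a few lines. The example below is only an illustration that assumes a plain TF-IDF and cosine-similarity ranking. Nothing in IBM's materials confirms that Watson Explorer works this way, and the sample claims and function names (`build_index`, `most_similar`) are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical past claims, invented for illustration only.
PAST_CLAIMS = [
    "Rear-end collision on highway, bumper and trunk damaged",
    "Kitchen fire caused smoke damage to walls and ceiling",
    "Hail storm dented car roof and cracked windshield",
]


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens, idf):
    """Turn tokens into a TF-IDF vector, ignoring terms missing from the index."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}


def build_index(claims):
    """Compute smoothed IDF weights and one TF-IDF vector per claim."""
    tokenized = [tokenize(c) for c in claims]
    n_docs = len(tokenized)
    # Document frequency: the number of claims each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1.0 for t, n in df.items()}
    return idf, [vectorize(tokens, idf) for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, claims, top_k=2):
    """Return the top_k past claims ranked by similarity to the query text."""
    idf, vectors = build_index(claims)
    q = vectorize(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(claims[i], round(score, 3)) for score, i in scored[:top_k]]
```

An agent-facing tool could call `most_similar("Customer reports cracked windshield after hail", PAST_CLAIMS)` and show the ranked list. A production system would likely use a trained model and a proper search index instead of recomputing vectors on every query.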

Below is a short 5-minute video demonstrating how IBM Watson Explorer works and showing an introduction to mining data from text content:

IBM claims to have helped a leading insurance provider organize data from large storage systems and multiple sources. The data was to be funneled into a database for call center agents processing insurance claims. According to the case study, Watson Explorer reduced the client's claims processing time from two days to ten minutes and saved 14,000 agents an average of 3 seconds per call.

IBM also lists Toyota Financial Services and ING Direct as some of their past clients.

Tom Eck is the Global CTO of IBM. He holds a PhD in Computational Biology and Molecular Biophysics from Rutgers University. Previously, Eck was Head of Emerging Technologies at First Data Corporation.

Taiger

Taiger offers a namesake software that they claim helps insurance companies automate their claims processing and customer onboarding using natural language processing.

Taiger’s website describes the software as a virtual assistant that can extract important information from a customer’s claim and provide insurance agents answers to questions they may have otherwise posed to the customer. These questions could be about why a field was left blank on the claim or asking for more information about the incident the claim is for.

This would save time in the transaction by preventing a back and forth of further questions after the initial claim. Taiger also claims the software can assist in the customer acquisition process, but it is unclear how the virtual assistant actually communicates information to a customer as opposed to an employee.

We can infer the machine learning model involved in the software would need to be trained on thousands of a client company’s insurance claims to begin working effectively. Claims would be labeled by section of the application form, and then be run through the machine learning algorithm along with keywords and phrases relating to the insurance claim, such as the type of damage on a car or a house.

This would train the algorithm to discern the chains of text that humans understand as pieces of information to be filled out in the application form.

The software would then be able to comb through a customer's claim application form and extract the important information from it for the insurance broker processing the claim. This information would be accessible from a user dashboard that displays the claim itself alongside the information the software extracted from it.

This could range from details about the damage to a car or house, to the number of lost items in a storm, to injuries sustained in a car accident. The software would list these for the insurance agent, who can then verify the claim faster with the important points listed up front.
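As a rough sketch of the extraction step described above, the example below pulls a few fields out of free-text claim wording and reports which ones are missing, which is the kind of gap an agent would otherwise have to ask the customer about. It uses hand-written regular expressions as a stand-in for a trained model. The field names, patterns, and policy number format are all assumptions made for illustration and are not taken from Taiger's product.

```python
import re

# Hypothetical fields and patterns, chosen only for illustration.
FIELD_PATTERNS = {
    "policy_number": re.compile(r"\b([A-Z]{2}-\d{6})\b"),
    "incident_date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "damage_type": re.compile(r"\b(hail|fire|flood|collision|theft)\b", re.IGNORECASE),
    "items_lost": re.compile(r"\b(\d+)\s+items?\b", re.IGNORECASE),
}


def extract_fields(text):
    """Return a dict mapping each field to its first match, or None if absent."""
    extracted = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        extracted[field] = match.group(1).lower() if match else None
    return extracted


def summarize(text):
    """Split extraction results into found values and fields needing follow-up."""
    extracted = extract_fields(text)
    found = {k: v for k, v in extracted.items() if v is not None}
    missing = [k for k, v in extracted.items() if v is None]
    return found, missing
```

For a claim such as "Policy AB-123456. On 2019-03-14 a hail storm damaged the roof of my car.", `summarize` returns the policy number, date, and damage type as found, and lists `items_lost` as missing.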

NLP for Fraud Detection

Health Fidelity

Health Fidelity offers software called HF Reveal NLP, which they claim is a natural language processing engine that enables many functions in the risk management software they offer to insurers. They also claim the software can handle unstructured data such as written notes in clinical documents, which has helped past clients use data they could not before.

HF Reveal NLP serves as an engine for their risk adjustment solutions for both healthcare plan networks and providers. Health Fidelity focuses on risk adjustment and offers a thorough process for adjusting risk from every angle. A process that thorough would eventually need to draw on unstructured data, such as a long email or an insurance claim. These types of documents, as well as clinical documents for health insurers, would need to be run through NLP software before the data points could be interpreted as indicative of high risk.

Health Fidelity also has a terminology engine they use to help HF Reveal NLP detect important terms and phrases for the insurance sectors they serve. This includes robust taxonomies like ICD-10 (International Classification of Diseases, 10th revision).
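To make the idea of a terminology engine concrete, here is a minimal sketch that tags phrases in a clinical note with ICD-10 category codes using a simple dictionary lookup. The four-entry vocabulary and the function name `tag_terms` are assumptions made for illustration. Health Fidelity does not publish how its terminology engine is built, and a real one would cover the full taxonomy along with synonyms and abbreviations.

```python
import re

# Tiny illustrative vocabulary: phrase -> ICD-10 category code.
# A real terminology engine would hold many thousands of entries.
TERMINOLOGY = {
    "type 2 diabetes": "E11",
    "hypertension": "I10",
    "asthma": "J45",
    "fracture of forearm": "S52",
}


def tag_terms(note):
    """Return (phrase, code) pairs for known terms, in order of appearance."""
    lowered = note.lower()
    hits = []
    for phrase, code in TERMINOLOGY.items():
        # Word boundaries avoid matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((match.start(), phrase, code))
    hits.sort()
    return [(phrase, code) for _, phrase, code in hits]
```

Running `tag_terms` on "Patient has a history of hypertension and Type 2 diabetes." yields the two matching phrases with their codes, which downstream risk adjustment logic could then use.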

We can infer that the machine learning model behind HF Reveal NLP was trained on tens of thousands of clinical documents and health insurance claims. All of the claims would be labeled according to whether they are fraudulent, and fields within the claims form that contain fraudulent information would be labeled to note this.

Then, a data scientist could expose the machine learning model to the labeled data. This would train it to discern text and chains of text that humans understand as information on a claims form. It would also have been trained to discern when that information is likely to be fraudulent.

A user could then feed a new document into the software, and the software could mark words, phrases, or sections of the document that are likely to be fraudulent or to contain fraudulent information. For example, an insurance claim could state the customer fractured their arm, but the clinical documentation for the customer's emergency room visit may not state the same information. If the documentation says that a fracture was expected but the customer did not turn out to have one, the software could detect this and mark the claim as fraud or likely to be fraud.
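The fracture example above can be illustrated with a small consistency check between a claimed finding and a clinical note. This sketch assumes a crude sentence-level rule: a mention is treated as negated or uncertain if certain cue words appear in the same sentence. The cue lists and function names are invented for the example, and a trained clinical NLP model would handle negation far more carefully than this.

```python
import re

# Assumed cue words; real clinical negation handling is much richer.
NEGATION = re.compile(r"\b(?:no|not|without|denies|negative for|ruled out)\b")
UNCERTAIN = re.compile(r"\b(?:suspected|expected|possible|probable)\b")


def finding_status(term, note):
    """Classify a finding as 'negated', 'affirmed', 'uncertain', or 'absent'."""
    term_pattern = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
    statuses = set()
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        lowered = sentence.lower()
        if not term_pattern.search(lowered):
            continue
        if NEGATION.search(lowered):
            statuses.add("negated")
        elif UNCERTAIN.search(lowered):
            statuses.add("uncertain")
        else:
            statuses.add("affirmed")
    # An explicit negation anywhere in the note takes priority in this sketch.
    for status in ("negated", "affirmed", "uncertain"):
        if status in statuses:
            return status
    return "absent"


def needs_review(claimed_finding, clinical_note):
    """Flag the claim unless the note clearly affirms the claimed finding."""
    return finding_status(claimed_finding, clinical_note) != "affirmed"


NOTE = ("Patient presented with arm pain after a fall. "
        "Fracture was suspected. X-ray showed no fracture.")
```

With the sample `NOTE`, `needs_review("fracture", NOTE)` returns `True`, because the note first hedges and then rules out the fracture the claim asserts. The flag marks the claim for a closer look and is not itself a finding of fraud.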

Below is a screencap of a graphic shown on Health Fidelity’s website. It details their software platform and shows their terminology engine feeding into HF Reveal NLP:

Health Fidelity’s value proposition

Health Fidelity does not make available any case studies showing an insurer's success with its software, and it does not list any past insurance clients by name on its website. The company has, however, raised $19.3 million in venture funding and is backed by UPMC.

Raj Tiwari is Chief Architect for Health Fidelity. He holds an MS in Electrical Engineering from Oregon Health and Science University. Previously, Tiwari was Director of Technology at Surgent Networks.

Chatbots for Automating Appointments and Choosing a Policy

Progress Software

Progress Software offers software called Kinvey Native Chat, which it claims can help insurance companies offer a chatbot for self-service transactions using natural language processing.

The Kinvey Native Chat chatbot is built to automate appointment scheduling and to help customers find the right insurance policy. With this solution, customers can purportedly complete both tasks without an agent, saving time for themselves and for the client company.

We can infer the machine learning model for their chatbot software would need to be trained on hundreds of thousands of snippets from text chat conversations from customers. These conversations would involve asking insurance related questions, requesting help, or filing a claim. Each phrase, term, and the relevant sentence would be labeled according to which category of support request it is, and a data scientist would expose the algorithm to this data after it was labeled. This would train it to discern the chains of text that humans perceive as a customer service request and which sets of data the request refers to.

A customer could then message the chatbot, and Progress Software's machine learning algorithm would categorize the message as an insurance-related question, a request for help, or a need to file a claim. The software likely assigns each categorization a confidence score indicating how likely it is to be correct. It would also be programmed to take certain actions depending on that score, such as directing the user to the correct page for filing claims. Below a certain confidence threshold, the chatbot would route the message to a human agent for review.
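The categorize-then-route behavior described above can be sketched as follows. The example assumes a small multinomial Naive Bayes classifier with add-one smoothing and a fixed threshold of 0.6. The training messages, action names, and threshold are all made up for illustration, and Progress Software does not disclose which model or cutoff Native Chat actually uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled chat snippets; a real system would need far more.
TRAINING = [
    ("what does my policy cover", "question"),
    ("how much is the deductible", "question"),
    ("i need help logging in", "help"),
    ("please help me reset my password", "help"),
    ("i want to file a claim for my car", "claim"),
    ("file a claim for storm damage", "claim"),
]
# Hypothetical actions the chatbot could take for each category.
ACTIONS = {"question": "answer_from_faq", "help": "show_help_page",
           "claim": "open_claim_form"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label for a multinomial Naive Bayes model."""
    word_counts, label_counts = defaultdict(Counter), Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict_proba(text, model):
    """Return a normalized probability for each label (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        log_p = math.log(n / total)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are skipped
                log_p += math.log((word_counts[label][tok] + 1) / denom)
        log_scores[label] = log_p
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exps.values())
    return {label: v / z for label, v in exps.items()}


def route(text, model, threshold=0.6):
    """Pick an action, or hand off to a human when confidence is too low."""
    probs = predict_proba(text, model)
    label = max(probs, key=probs.get)
    if probs[label] < threshold:
        return "human_agent", probs[label]
    return ACTIONS[label], probs[label]


MODEL = train(TRAINING)
```

Here `route("I want to file a claim", MODEL)` selects the claim form action, while a message with no recognizable words, such as "hello there", leaves all three categories equally likely and is handed to a human agent.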

Below is a short video showing how Native Chat helps a customer find the correct insurance policy:

Progress Software does not list any case study showing an insurance provider's success with the software. The company does, however, list Cigna and Nationwide as some of its past clients.

Yogesh Gupta is the CEO of Progress Software. He holds an MS in Computer Science from the University of Wisconsin-Madison. Previously, Gupta served as President and CEO of Kaseya.

