Big Data in Healthcare – 4 Data Management Software with AI Capabilities

Ayn de Jesus

Ayn serves as AI Analyst at Emerj - covering artificial intelligence use-cases and trends across industries. She previously held various roles at Accenture.


The healthcare industry is perhaps second only to finance when it comes to the sheer amount of historical data available for use with artificial intelligence. Data from EMRs, insurance claims, clinical trials, and drug research and development can all be pulled into a machine learning algorithm to generate insights on patient behavior, patient risk, and effective treatments for a variety of conditions, among other things.

In this article, we’ll dive into software available for housing and storing the large volumes of data that healthcare companies collect over time, as well as AI software that uses that data to generate analytics dashboards and garner business value for hospitals, clinics, and other healthcare companies.

We do this to paint a broad picture of the capabilities of big data and robust data infrastructures in the healthcare space, focusing specifically on four companies that offer AI data management and machine learning software to the healthcare industry.

These companies are listed below along with the AI applications of their data platforms:

  • IBM – Tracking how patients progress through treatments
  • Health Catalyst – Predicting patient behavior, including the readmission rates of recently discharged patients
  • InterSystems – Sharing patient medical data across multiple EMR systems to improve the quality of patient care
  • Elsevier – Drug development, including the ability to understand the behavior of certain diseases and how they might respond to drugs

Before exploring the applications individually, we’ll share some of the important findings from our secondary research on big data in healthcare and explain the concepts and trends that will matter most to business readers:

Big Data in Healthcare – Insights Up Front

All of the software products discussed in this report are data platforms of some sort. The machine learning built into them works from client data and, in some cases, from a corpus of data drawn from millions of patients. Some of these companies are unclear about where they acquired the stores of data on which their software is trained.

In general, software for housing and working with healthcare data comes with built-in search functions that in some cases seem to run on natural language processing. That said, the insights that the software can generate for customers are for the most part garnered through standard machine learning methods and predictive analytics functionality.

This report focuses mainly on software for housing, managing, and working with big data in the healthcare industry, but readers interested in how AI can use that data to generate a variety of actionable insights for hospitals, clinics, and other healthcare companies may want to read our report on predictive analytics in healthcare.

The companies listed in this report are all likely to be genuine in their claims of leveraging artificial intelligence. IBM, Health Catalyst, and Elsevier all employ people with PhDs and Master’s degrees in computer science, hard sciences (such as physics), or statistical fields. InterSystems is a large company with nearly 1,500 employees, but we were unable to find evidence of serious data science or machine learning talent on their team.

Older technology companies sometimes try to reposition themselves as AI-focused companies in order to win press and customers. Although it’s very likely that InterSystems offers a robust, useful platform for working with large volumes of healthcare data, we simply weren’t able to verify that the platform makes use of artificial intelligence.

That all said, the healthcare space in general has a high density of case studies when it comes to AI applications. The data platforms discussed in this report are no exception. Every company we discuss presents significant evidence of success, which bodes well for their clients.

In addition, we should note that, of the companies covered in this report, Health Catalyst and InterSystems are the most targeted to the healthcare industry. IBM and Elsevier offer software to a large variety of industries.

We’ll start our analysis of four AI vendors offering data platforms and data management software to healthcare companies with IBM:


IBM

IBM offers IBM Explorys, which it claims can help healthcare companies better understand a disease’s history, progression, and economic impact on populations, as well as identify patient segments that could benefit from a treatment for the disease.

The company claims that the platform, called the Explorys Therapeutic Dataset and falling under the purview of IBM Watson Health, contains lab test results, biometrics, and patient-reported information for about 50 million unique, anonymized individuals and 344,000 unique healthcare providers.

The company explains that the data is collected from connected primary care, specialty, and community centers; hospital inpatient, emergency, and surgical settings; as well as post-acute care settings such as long-term care, rehabilitation, and home health. The system administrators can schedule periodic refreshes with new patients and data as needed.

To access the data, a user types a query into the browser-based search tool. IBM Watson’s algorithms then identify relationships and recognize patterns and trends in the data. The company did not specify the type of AI used, but we can infer that this could involve natural language processing and classic machine learning.

Below is a short five-minute video demonstrating how Explorys works:

IBM claims to have helped SmartAnalyst Inc. improve how the company markets its medications. SmartAnalyst Inc.’s pharmaceutical clients wanted to know how psoriasis patients progressed from topical to oral and finally to injectable treatments.

SmartAnalyst used IBM Explorys to identify more than 6,500 psoriasis patients and uncovered how these patients progressed through their treatment. The data showed that patients tended to skip from topical remedies straight to injectables after 206 days on average, which is too little time for the topical treatments to take effect.

SmartAnalyst also discovered that in more than 50% of cases, the prescriber was a general practitioner, not a dermatologist. As a result, SmartAnalyst’s client started to consider an initiative to educate general practitioners about psoriasis treatment and encourage patients to use topical medications longer or to try oral medications first, rather than jump to expensive injectable treatments.
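The progression analysis described above can be illustrated with a minimal Python sketch. It is not IBM's or SmartAnalyst's method, which the case study does not describe. The record layout, the treatment class labels, and the function names are all hypothetical, and the sample data is invented. The sketch finds patients who moved from a topical treatment to an injectable one without an oral treatment in between, and averages the number of days that took.

```python
from datetime import date
from statistics import mean

# Hypothetical prescription records: (patient_id, treatment_class, start_date).
# The data below is invented purely for illustration.
prescriptions = [
    ("p1", "topical", date(2017, 1, 1)),
    ("p1", "injectable", date(2017, 7, 20)),
    ("p2", "topical", date(2017, 3, 1)),
    ("p2", "oral", date(2017, 6, 1)),
    ("p2", "injectable", date(2017, 12, 1)),
    ("p3", "topical", date(2017, 2, 1)),
]


def first_start_dates(records):
    """Map each patient to the earliest start date of each treatment class."""
    firsts = {}
    for patient_id, treatment, start in records:
        per_patient = firsts.setdefault(patient_id, {})
        if treatment not in per_patient or start < per_patient[treatment]:
            per_patient[treatment] = start
    return firsts


def days_topical_to_injectable(records):
    """Days from first topical to first injectable for patients who skipped oral."""
    gaps = {}
    for patient_id, firsts in first_start_dates(records).items():
        topical = firsts.get("topical")
        injectable = firsts.get("injectable")
        if topical is None or injectable is None or injectable <= topical:
            continue  # patient never made the topical-to-injectable move
        oral = firsts.get("oral")
        if oral is not None and topical <= oral <= injectable:
            continue  # patient tried an oral treatment first
        gaps[patient_id] = (injectable - topical).days
    return gaps


gaps = days_topical_to_injectable(prescriptions)
average_gap = mean(gaps.values()) if gaps else None
print(gaps, average_gap)
```

In this invented sample only one patient skips the oral step, so the average is just that patient's gap. A real analysis would run over thousands of records and would also need to handle treatment restarts and overlapping prescriptions.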

Scott Spangler is Chief Data Scientist, Distinguished Engineer at IBM Watson Health. He holds an MS in computer science from the University of Texas at Austin. Spangler also served as Principal Scientist at IBM Watson Innovations for 19 years and Knowledge Engineer at General Motors for eight years.

Health Catalyst

Health Catalyst is a US-based company that offers the Data Operating System (DOS) data warehouse and application development platform, which the company claims can help life science companies improve patient care quality, patient outcomes, and patient safety and reduce waste using natural language processing.

Health Catalyst claims that DOS contains data from more than 100 million patients, encompassing over 1 trillion data points. These include structured and unstructured data such as biometrics, genomic data, and data from insurance claims and physicians’ notes.

The company explains that DOS includes an analytics platform driven by NLP algorithms that combine linguistics, pattern recognition, and machine learning to extract meaning from text and formulate predictions.

One use for this system is predicting the readmission risk among recently discharged elderly patients. A hospital could search the database for correlation patterns between factors such as age, length of stay in the hospital, gender, ailment for which they were readmitted, compliance with follow-up consultations, and family education.

The algorithms could go through the database and search patient records, imaging data, physicians’ notes, and disease history, among others. If the system finds that, for instance, 20 percent of elderly people who did not return for follow-up check-ups within 5 days of discharge are readmitted within 30 days, hospital management could use this information to create a program to address this issue.

The program could include educating patients, families, and clinicians; setting up a system to remind patients of appointments the day prior to the follow-up check-up; and implementing other predictive analytics to identify other patient groups at risk of readmission. The hospital could also use analytics to measure the impact of the education program.
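The kind of cohort comparison described in the readmission example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Health Catalyst's implementation: the `Discharge` record, its field names, the age cutoff of 65 for "elderly", and the sample data are all hypothetical. The 5-day follow-up window and 30-day readmission window come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discharge:
    """Hypothetical discharge record; field names are invented for this sketch."""
    patient_id: str
    age: int
    days_to_follow_up: Optional[int]    # None if the patient never returned
    days_to_readmission: Optional[int]  # None if the patient was not readmitted


def readmission_rates(discharges, min_age=65, follow_up_window=5, readmit_window=30):
    """Compare readmission rates for elderly patients who missed or attended follow-up."""
    counts = {"missed_follow_up": [0, 0], "attended_follow_up": [0, 0]}
    for d in discharges:
        if d.age < min_age:
            continue  # only the elderly cohort is of interest here
        attended = d.days_to_follow_up is not None and d.days_to_follow_up <= follow_up_window
        readmitted = d.days_to_readmission is not None and d.days_to_readmission <= readmit_window
        cohort = counts["attended_follow_up" if attended else "missed_follow_up"]
        cohort[0] += int(readmitted)  # readmitted within the window
        cohort[1] += 1                # total patients in the cohort
    return {name: (hits / total if total else None) for name, (hits, total) in counts.items()}


# Invented sample data for illustration only.
sample = [
    Discharge("a", 70, None, 12),
    Discharge("b", 82, 9, None),
    Discharge("c", 68, 3, None),
    Discharge("d", 75, 2, 25),
    Discharge("e", 90, None, 45),
    Discharge("f", 40, None, 5),
    Discharge("g", 77, 4, None),
    Discharge("h", 66, None, None),
]

rates = readmission_rates(sample)
print(rates)
```

A descriptive comparison like this only surfaces a correlation. A production system would more likely fit a predictive model over many factors, and the same function could be rerun after an education program to measure its impact.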

Below is a short three-minute video demonstrating how the Health Catalyst solution works:

Health Catalyst claims to have helped Crystal Run Healthcare transition from a decade-old enterprise data warehouse (EDW) to a more robust one. Crystal Run’s legacy EDW struggled to integrate data from a variety of sources. Manually integrating the data from these systems required heavy coding and expertise from the business intelligence (BI) team to develop complex processes, which then took time and effort to maintain. The legacy EDW also lacked a well-defined data model.

The growing organization needed an EDW that would allow individual clinicians and staff to access the data themselves and enable the business intelligence team to spend more time on processes and business analytics.

Crystal Run turned to Health Catalyst, which designed, built, and launched the new EDW platform in 54 days. The two companies also collaborated to implement financial and clinical improvements on the EDW, such as creating teams that would deploy and monitor the improvements in specific clinical areas. As a result, Crystal Run’s BI team was no longer tied up with producing reports and was able to focus on developing processes to manage and improve the business.

Health Catalyst also lists Texas Children’s Hospital, Stanford Hospitals & Clinics, Acuitas Health, University of Kansas Health System, and Westchester Medical Center Health Network as some of its past clients.

Elia Stupka is the Chief Analytics Officer at Health Catalyst. He holds a PhD in computer science from Leiden University. Previously, Stupka served as Senior Director for Data Science and BioInformatics at Dana-Farber Cancer Institute, Director at Boehringer Ingelheim, Co-Director at the Center for Translational Genomics and Informatics, and Lead Consultant at IBM.


InterSystems

InterSystems is a US-based company that offers HealthShare Unified Care Record. The company claims the software can help healthcare providers, payers, patients, and government healthcare systems identify patient groups that need intervention, discover patterns in data that will improve patient care, coordinate care with other healthcare providers, and measure the progress of initiatives targeted at large populations. According to the company, it does this using predictive analytics, natural language processing, and machine learning.

HealthShare works with InterSystems’ Health Informatics Platform called IRIS Data Platform to create applications and algorithmic models that will enable businesses to integrate their data with HealthShare and create predictions.

InterSystems claims that its HealthShare database contains more than 1 billion data points, consisting of data about eight million patients and hundreds of millions of diagnoses, observations, results, and other information. The company reports that the data was collected from 23 hospitals, 655 outpatient facilities, and more than 18,500 affiliated physicians.

Below is a short four-minute video demonstrating how InterSystems IRIS works to train its machine learning models:

InterSystems claims to have helped Northwell Health improve care delivery, care coordination, outcomes, and business performance. Northwell was conducting a pilot project that would follow and potentially improve care coordination and results for expectant mothers with high-risk pregnancies.

The organization needed access to obstetrics information from more than 100 healthcare providers, which was collected using two outpatient EMRs and three inpatient EMRs.

The case study claims that the pilot succeeded in that critical information was available whenever the patient encountered the healthcare system. For example, when a high-risk expectant mother needed emergency room (ER) care, the system notified her obstetrician and primary care doctor while providing ER physicians access to her medical history. If lab results indicated a problem for mother or baby, the ER physician would be alerted through an indicator on the Labor and Delivery EMR.
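The idea of a unified record assembled from several EMR systems can be sketched as follows. This is a toy illustration, not how HealthShare works internally, which the case study does not describe. The entry fields, the sample data, and the `build_unified_records` function are hypothetical. The sketch also assumes every system shares one patient identifier, whereas real deployments typically need a patient-matching step first.

```python
from collections import defaultdict

# Hypothetical extracts from two separate EMR systems. Field names and values
# are invented for illustration only.
outpatient_emr = [
    {"patient_id": "m-001", "source": "outpatient", "date": "2018-03-02",
     "note": "high-risk pregnancy flagged"},
    {"patient_id": "m-002", "source": "outpatient", "date": "2018-03-05",
     "note": "routine prenatal visit"},
]
inpatient_emr = [
    {"patient_id": "m-001", "source": "inpatient", "date": "2018-04-11",
     "note": "emergency room visit"},
    {"patient_id": "m-001", "source": "inpatient", "date": "2018-02-20",
     "note": "lab result: elevated blood pressure"},
]


def build_unified_records(*sources):
    """Group entries from every source by patient and order them by date."""
    unified = defaultdict(list)
    for source in sources:
        for entry in source:
            unified[entry["patient_id"]].append(entry)
    for entries in unified.values():
        # ISO 8601 date strings sort chronologically as plain text.
        entries.sort(key=lambda entry: entry["date"])
    return dict(unified)


unified = build_unified_records(outpatient_emr, inpatient_emr)
for entry in unified["m-001"]:
    print(entry["date"], entry["source"], entry["note"])
```

With a single chronological view per patient, an emergency room physician in this scenario would see the earlier lab result and the high-risk flag even though they were recorded in different systems.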

After this pilot, Northwell used HealthShare in its care coordination application called Care Tool, which identifies high-risk patients, assesses needs, shares care plans across providers and locations, supports efficient workflows, and provides metrics for continuous improvement. The initial assessment showed that Care Tool resulted in:

  • 6% fewer readmissions for cardiac valve replacement patients
  • 18% to 28% more patients discharged to home instead of a nursing facility
  • Lowered risk of infections
  • Up to 56% greater use of in-network home care
  • Greater patient satisfaction

The case study reports that after these projects, Northwell continued to use HealthShare when it established the Center for Health Information Technology and Innovation and began creating new systems to:

  • Target gaps in clinical workflows
  • Simplify the management of at-risk contracts
  • Automate the identification of patient groups for population health management and analytics

InterSystems also lists Kaiser Permanente, TD Ameritrade, the U.S. Department of Veterans Affairs, European Space Agency, Roche Diagnostics, The Johns Hopkins Hospital, and Kimberly-Clark as some of its past clients.

Jonathan Teich is Chief Medical Informatics Officer at InterSystems. He holds a PhD in electrical engineering and computer science from the Massachusetts Institute of Technology. Previously, Teich served as Chief Medical Information Officer at Elsevier for 11 years and Corporate Director for Clinical Information Systems, R&D/Emergency Physician at the Brigham and Women’s Hospital for 28 years.

Data from Science and Medical Publications


Elsevier

Elsevier is a Netherlands-based company that offers the Pathway Studio database, which it claims can help scientists and research centers explore molecular interactions, along with the cause-and-effect relationships associated with biological processes among organisms, using natural language processing and machine learning.

Elsevier claims the data is drawn from more than 10,000 journals, more than 1,700 full-text journals, over 25 million PubMed abstracts, and more than 200,000 clinical trials. The company explains that the data focuses on genomics and proteomics knowledge, specifically in relation to how proteins, cell processes, and small molecules interact with, modify, and regulate each other.

To use the database, scientists and researchers must enter a query in the search interface. Pathway Studio then uses natural language processing algorithms to extract unstructured data from the published literature, experimental data, and internal documents.

Below is a short five-minute video demonstrating how Pathway Studio works:

Elsevier claims to have helped pharmaceutical researchers understand how melanoma proteins suppress the local immune response. Current drug development efforts focus on anti-melanoma immunotherapeutics, but researchers also need to identify other causes of the disease. Access to information sources needed to answer those questions, as well as being able to analyze and visualize molecular interactions in context, is critically important.

Scientists used two of Elsevier’s data sources, Pathway Studio and CellEffect, to explore molecular interactions and cause-and-effect relationships. The analytical and visualization tools allowed the scientists to better understand experimental results and literature about the development and drug responsiveness of the disease.

The analysis tool also enabled them to understand the impacts of differential gene or protein expression and protein-protein interactions on the disease and streamline the process of drug target discovery. Using the databases, scientists were able to connect the results with a few queries and complete the workflow in less than 20 minutes. The case study reports that Pathway Studio helped the researchers increase the chance of developing a successful drug.

The client was unnamed, so we caution readers to take this case study with a grain of salt.

Elsevier also lists NHS England, NHS Wales and Great Ormond Street Hospital as some of its clients.

Richard Loomis is Chief Informatics Officer at Elsevier. He holds an MD degree from the University of Michigan Medical School and completed a postdoctoral fellowship in biomedical informatics at Harvard Medical School. Previously, Loomis served as Lead Physician Informatician at Kaiser Permanente, and Senior Medical Director for Informatics and Chief Medical Officer at Practice Fusion.


Header Image Credit: Future Healthcare Today
