Artificial Intelligence in the Pharmaceutical Industry – An Overview of Innovations

Ayn de Jesus

Ayn serves as AI Analyst at Emerj - covering artificial intelligence use-cases and trends across industries. She previously held various roles at Accenture.

Several factors have contributed to the advancement of AI in the pharmaceutical industry. These include the growing size and variety of biomedical datasets, a result of the increased use of electronic health records.

We researched the use of AI in the pharma space to better understand where the technology comes into play in the industry and to answer the following questions:

  • What types of AI applications are currently in use in the pharmaceutical industry?
  • What tangible results has AI driven in pharma?
  • Are there any common trends among these innovation efforts? How could these trends affect the future of pharmaceuticals?

This report covers companies offering software across the following three applications in the pharmaceutical space:

  • Predicting Treatment Results
  • Drug Design
  • Data Preprocessing

This article intends to provide business leaders in the pharmaceutical space with an idea of what they can currently expect from AI in their industry. We hope that this report allows business leaders in pharma to garner insights they can confidently relay to their executive teams so they can make informed decisions when thinking about AI adoption. At the very least, this report intends to reduce the time business leaders in pharma spend researching AI companies with whom they may (or may not) be interested in working.

Predicting Treatment Results

GNS Healthcare

GNS Healthcare offers Reverse Engineering & Forward Simulation (REFS), machine learning software that automates work that previously involved trial and error to match drug interventions with individual patients.

The company claims that REFS-generated machine learning models are capable of predicting a patient’s response to possible drug treatments by inferring possible relationships among factors that might be affecting the results, such as the body’s ability to absorb the compounds, the distribution of those compounds around the body, and a person’s metabolism.

To do this, the REFS algorithms reverse-engineer the patient’s case to find the elements that are affecting the patient’s response to the drug. The software draws on a database of genomic and genetic information and patterns found in medical claims, lab results, and electronic health records, among other sources. According to the website, REFS algorithms search for possible combinations of elements that could be affecting the patient’s response to the drug treatment and compare them with the patient’s case.

The algorithms then run simulations and ask multiple “what if?” questions until the user can determine which drug treatment might produce the best result for each patient, based on what the reverse engineering process reveals.

In the 4-minute video below, CEO Colin Hill explains how REFS uses data from previous treatments contained in electronic medical records, patient registries, claims, and other data sources to enable the machine learning to better predict drug treatment results.

The company has not published case studies but announced a collaboration with Amgen and Alliance for clinical trials that will apply REFS to identify factors that drive responses to Panitumumab, Amgen’s treatment for patients with metastatic colorectal cancer.

Another client, SomaLogic, obtained a license to apply REFS to SomaLogic’s SOMAscan-derived protein datasets. This agreement aims to leverage REFS algorithms to derive meaning from SomaLogic’s protein data for use in individual and population health management.

GNS also lists Alexion and Sema4 among its clients. It has raised $54.3M in funding from Alexandria Real Estate Equities and Amgen Ventures, among others.

Bruce Church, Chief Mathematics Officer, is responsible for further developing the machine learning algorithms for REFS and leading the development of new products and technologies. Previously, he spent 10 years at Cornell University, developing global optimization methods for computational protein folding. He also served as principal investigator on major grants including a $2.5 million award from the Department of Energy. Bruce received a PhD in applied physics from Cornell University.

Drug Design


Atomwise

Atomwise developed AtomNet technology, a deep learning neural network application for structure-based drug design and discovery.

The company claims that its technology is based on convolutional neural networks, the same technology that powers face and speech recognition and self-driving cars.

The company did not describe the exact process of how the application is used in drug design, but we can infer that the researcher uses the application by virtually combining atoms to come up with possible molecules. The company claims that its algorithms are trained to process and analyze billions of atoms to find which ones will bond, and will then simulate their testing on the computer, as seen in the video below:

The company explains that while other technologies present users with 2D pixels with channels of red, green, and blue colors, AtomNet represents the molecule-protein binding as 3D pixels with channels for carbon, oxygen, nitrogen, and other atoms. This representation trains AtomNet to learn the features of molecular binding and avoids manual tweaking and “over-parameterizing” to express the system, process, or model as equations, according to the company.
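
The "3D pixels with atom channels" idea can be illustrated with a small sketch. This is not Atomwise's code: the channel order, grid size, resolution, and the `voxelize` helper are all assumptions made for illustration, and the coordinates are made up. It only shows how atoms might be written into a grid that has one channel per element, in the same way an image has one channel per color.

```python
# Hypothetical channel order; the article names only carbon, oxygen,
# nitrogen, "and other atoms".
CHANNELS = {"C": 0, "O": 1, "N": 2}
OTHER = 3  # catch-all channel for any other element


def voxelize(atoms, grid_size=8, resolution=1.0):
    """Turn (element, x, y, z) atoms into a 4-channel 3D occupancy grid.

    Coordinates are measured from the grid's lower corner, in the same
    units as `resolution`. Atoms outside the grid are ignored.
    """
    n_channels = len(CHANNELS) + 1
    grid = [
        [[[0.0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
        for _ in range(n_channels)
    ]
    for element, x, y, z in atoms:
        i, j, k = (int(c // resolution) for c in (x, y, z))
        if all(0 <= idx < grid_size for idx in (i, j, k)):
            grid[CHANNELS.get(element, OTHER)][i][j][k] = 1.0
    return grid


# Made-up coordinates for illustration only, not a real ligand or protein.
atoms = [
    ("C", 1.2, 3.4, 2.0),
    ("O", 2.5, 3.1, 2.2),
    ("N", 4.0, 4.9, 6.3),
    ("S", 7.1, 0.4, 5.5),
]
grid = voxelize(atoms)
print(len(grid), len(grid[0]), len(grid[0][0]), len(grid[0][0][0]))  # 4 8 8 8
```

A convolutional network would then take such a grid as input, much as an image model takes a red, green, and blue pixel array.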

In 2015, Atomwise worked with the University of Toronto and IBM to develop a treatment for Ebola virus infections, which caused about 11,310 deaths in some African and European countries and the United States. Atomwise provided the AI technology to perform the drug research, the University of Toronto contributed biological insights about the virus, and IBM supplied the 64,000-CPU Blue Gene/Q supercomputer.

In the drug discovery process, Atomwise defined a region to investigate for potential small molecules. This region was then screened for molecules that bind to glycoprotein. The compounds already had safety data for use in patients and could be rapidly brought forward for clinical trials. Using the AtomNet system, the team analyzed and predicted the molecules that could potentially bind to glycoprotein.

The University of Toronto tested the compounds in vivo. The identified compound had no previously known antiviral activity, which suggests that AtomNet is able to identify new uses for existing drugs.

The company claims to have Merck, the University of Toronto, AbbVie, Duke University School of Medicine, the Drugs for Neglected Diseases Initiative, and Autodesk as partners. It has received $51.3 million in funding from B Capital Group, Y Combinator, Baidu, Data Collective, Tencent Holdings, Dolby Family Ventures, and Monsanto Growth Ventures.

Kong T. Nguyen, Principal Scientist at Atomwise, is responsible for initiatives in deep learning for drug discovery. He holds a PhD in Computer-Aided Drug Discovery from the Universitat Bern. Earlier in his career, he served as a postdoctoral fellow at the University of California, San Francisco and at the Structural Genomics Consortium.

Insilico Medicine

Insilico Medicine launched its Pharm.AI project, which the company claims can create, train, sell or permit use of its deep neural network solutions to pharmaceutical and biotechnology companies, as well as to academia.

The company did not reveal the exact processes of its offering, but according to Founder and CEO Alex Zhavoronkov, the company mainly applies generative adversarial networks (GAN) and reinforcement learning algorithms to generate new molecular structures and to find the biological origin of a disease.

A GAN is a machine learning method that pairs two neural networks: a generator produces realistic-looking samples, such as images, while a discriminator attempts to recognize which samples are fake. Training the two against each other increases the accuracy of the generated images, videos, and text. The CEO’s profile explained that the company uses this AI method in areas such as oncology, fibrosis, dermatology, and aging. Reinforcement learning, meanwhile, is a method of machine learning in which an algorithm is rewarded after it performs an action.
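
The reward loop behind reinforcement learning can be sketched in a few lines. The example below is a generic epsilon-greedy "bandit" agent, a textbook toy setup and not Insilico's method; the function name `run_bandit`, the reward values, and the noise range are all assumptions chosen for illustration. The agent picks an action, receives a reward, and updates its estimate of how good that action is.

```python
import random


def run_bandit(mean_rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: act, receive a reward, update the value estimate."""
    rng = random.Random(seed)
    n = len(mean_rewards)
    counts = [0] * n
    values = [0.0] * n  # running average reward observed per action

    for step in range(steps):
        if step < n:
            action = step  # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n)  # explore a random action
        else:
            action = max(range(n), key=lambda a: values[a])  # exploit the best so far
        # Toy environment: hidden mean reward plus bounded noise.
        reward = mean_rewards[action] + rng.uniform(-0.1, 0.1)
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.9])
best = max(range(3), key=lambda a: values[a])
print(best)  # 2
```

Because the noise is bounded well below the gap between the hidden rewards, the agent's estimates always rank the third action highest. In a molecule-generation setting the "action" would be a proposed structure and the "reward" a score for a desired property, but that mapping is an assumption here, not something the company has described.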

The website explains that these AI methods are used to develop new ideas in biomarkers, drug discovery, computer simulation, and validation systems.

The four-minute video below explains that Insilico uses AI to help pharmaceutical companies improve their R&D processes, develop companion diagnostics, and improve clinical trial enrollment practices. The company claims to collect human tissue from young, middle-aged, old, and very old patients to build its own datasets, which can be used to train its models.

The company also collects and compares healthy and disease-afflicted tissue to identify patterns in the aging process and diseases to help in developing drug treatments, and drug scoring to evaluate the effectiveness of drugs. The video adds that using AI could reduce animal testing and human clinical trials.

The company also develops and licenses biomarkers and drug discovery tools for cancer and age-related diseases. Biomarkers, according to the US National Center for Biotechnology Information, are medical characteristics that can be measured and evaluated to show normal biological processes, pathogenic processes, or responses to a drug-based intervention.

The company has not made available any case study but announced a multiyear agreement with Juvenescence AI Ltd. to develop an AI engine that can expedite the development of new drugs.

The company also announced last year that it was collaborating with GSK to improve the drug discovery process after Insilico completed pilot projects. In the initial phase of the collaboration, GSK will evaluate Insilico’s AI capability in identifying the biological origins of diseases. No other details were provided.

Insilico also lists ChemDiv, Life Extension, BioTime, and Bitfury as clients. Insilico received $20 million in funding from A-Level Capital, an early-stage venture capital fund founded by Johns Hopkins University alumni and students to support innovative businesses started and led by alumni, students, and staff of the university.

Before Insilico Medicine, Zhavoronkov was the Chief Science Officer at Biogerontology Research Foundation, Director of the International Aging Research Portfolio, and International Adjunct Professor at the Moscow Institute of Physics and Technology, among others. He holds a doctorate in Physics from the Lomonosov Moscow State University.


Nuritas

Nuritas claims to have developed a machine learning application that finds and unlocks naturally occurring bioactive peptides from food sources. Peptides are chains of amino acids that potentially provide therapeutic solutions.

The company claims that it combines AI and DNA analyses to discover ingredients in different food sources that have therapeutic qualities, including the management of chronic metabolic diseases.

It’s unclear what input is required or what kind of interaction the user has with the application, but the video below explains that Nuritas’ application reduces thousands of hours of trial and error by automating the search for peptides in various food sources.  

The technology is also used to label or tag diseases, which helps the algorithms search for their origins faster. Once the peptides are predicted or discovered, the team can unlock the peptides from the food source for use in developing treatments.

The company has not published case studies but has received about $28 million in funding from Enterprise Ireland, Cultivian Sandbox Ventures, the European Union, Marc Benioff, VisVires New Protein, Ali Partovi, and NDRC.

It also signed an agreement with Nestlé in April 2018 to discover bioactive peptides through AI. Under the agreement, Nuritas will deploy its application to predict, unlock, and validate peptides from natural food sources. Nestlé, for its part, will apply its scientific know-how and technology to validate the efficacy of the peptides for specific applications.

Nuritas also claims BASF as another client.

Nora Khaldi is the Founder and Chief Scientific Officer of Nuritas. Prior to Nuritas, she was a research fellow at University College Dublin. She holds a doctorate in Bioinformatics and Molecular Evolution from Trinity College Dublin and a Master’s degree in Mathematics from the Universite de Provence.

Data Preprocessing


BioSymetrics

In December 2017, BioSymetrics launched Augusta, a data preprocessing and analytics application that the company claims can analyze and integrate different types of biomedical and healthcare data with existing business processes using machine learning. According to the company, this capability expedites the deployment of AI initiatives in precision medicine, drug discovery, and health data applications.

The machine learning engine, called SymetryML, is capable of processing and analyzing raw data such as images, genomics data, streaming data, and compounds. Digital versions of compounds consist of alphanumeric text strings that can be used in printed and electronic sources of chemical data and information.

To digitize compounds, a set of mathematical rules is used to convert the graphical representations of the compound structures to unique digital codes. The codes, known as IUPAC International Chemical Identifiers (InChIs), are alphanumeric text strings.
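
To make the idea of a compound-as-text-string concrete, the sketch below splits one such identifier into its parts. It uses the standard InChI string for ethanol (InChI is the name under which the IUPAC identifier is published today). The `parse_inchi` helper is hypothetical and handles only the basic layout of version, formula, and prefixed layers; it is not how BioSymetrics processes compounds, which the company has not described.

```python
def parse_inchi(inchi):
    """Split an InChI string into its version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    # Each remaining layer starts with a one-letter prefix, for example
    # 'c' for atom connections and 'h' for hydrogen atoms.
    layers = {part[0]: part[1:] for part in rest if part}
    return {"version": version, "formula": formula, "layers": layers}


# Standard InChI for ethanol.
ethanol = parse_inchi("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"])      # C2H6O
print(ethanol["layers"]["c"])  # 1-2-3
```

Because the whole structure is encoded as plain text, such strings can be stored, searched, and fed into software like any other text field.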

Specifically, it can recognize and process data in MRI/fMRI and other imaging formats, EEG, EKG/ECG, genetics, proteomics, data from wearables such as smartwatches and heart rate monitors, and EHR/EMR formats. This capability allows the application to combine genomics with clinical data that businesses can use to develop drug treatments.
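
Combining data types usually starts with lining up records for the same patient. The sketch below is a minimal illustration of that step, assuming each source is keyed by a patient ID. The `combine_sources` helper, the field names, and the records are all made up for this example and do not reflect Augusta's actual interface.

```python
def combine_sources(*sources):
    """Merge per-patient records from several sources into one row per patient.

    Only patients present in every source are kept (an inner join).
    """
    shared_ids = set(sources[0])
    for source in sources[1:]:
        shared_ids &= set(source)
    combined = {}
    for pid in sorted(shared_ids):
        row = {}
        for source in sources:
            row.update(source[pid])
        combined[pid] = row
    return combined


# Made-up records for illustration; field names are hypothetical.
genomics = {"p1": {"variant_count": 3}, "p2": {"variant_count": 0}}
clinical = {"p1": {"age": 71}, "p2": {"age": 64}, "p3": {"age": 58}}
wearable = {"p1": {"resting_hr": 62}, "p2": {"resting_hr": 75}}

rows = combine_sources(genomics, clinical, wearable)
print(rows["p1"])  # {'variant_count': 3, 'age': 71, 'resting_hr': 62}
```

Patient `p3` is dropped because it appears in only one source. Whether to drop or keep incomplete records is a design choice that a real pipeline would need to make explicitly.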

The company claims that this capability is relevant in drug compound and small molecule activity prediction, as well as drug discovery and drug development analysis which identifies, determines, quantifies and purifies the compound.

The company does not have a demonstration video available showing how its software works but cites an example in which it studied 334 patients suffering from Alzheimer’s disease. The company claims that the machine learning-driven application was able to examine several thousand MRIs, genome-wide association screening data used to find genetic variations linked to the disease, metabolic profiling that measures the substances involved in each person’s unique metabolic process, and family history. As a result, it was better able to predict the progress of the disease among patients than it could with a single type of data, for example, using only respiratory or cardiovascular data.

The company reports that results show combining data sources enabled better diagnosis among the sample group, compared with just one type of data.

Gabriel Musso, Chief Scientific Officer, oversees research and development for machine learning. Previously, he was an Associate Scientist at Brigham and Women’s Hospital, focusing on machine learning frameworks to predict gene and small molecule function, and identification of disease-causal genes using large-scale genomic datasets. Gabe received his PhD in Molecular Genetics from the University of Toronto.

Takeaways for Business Leaders in the Pharmaceutical Industry

Most of the companies discussed in this report are involved in drug discovery and predicting drug treatment results among patients. Nuritas is focused on finding new and naturally occurring compounds from food sources rather than from synthetic compounds.

Most of the companies report that their solutions can be easily integrated into the client’s system. However, none of them provided detailed information about how one interacts with their products, and they also aren’t transparent about their integration process. As a result, we need to take their claims with a grain of salt.

The companies covered in this report are relatively newly established, mostly less than 10 years old. The newest company covered, BioSymetrics, launched its AI offering only last year. The company’s offering, however, may yet prove to be a critical application in the future, especially since the pharmaceutical industry and, by association, the health industry manage and churn out data in various formats. It is important to be able to process a variety of data types in order to provide more robust and well-rounded research results to support drug discovery.

None of the startups makes case studies available on its website, but they often announce through press releases their collaborations with global pharmaceutical companies such as GSK, Sanofi, Johnson & Johnson, Janssen, and Novartis, among others. The lack of detailed case studies might also be a result of protective measures in the battle for patents in the pharmaceutical industry.

Insilico Medicine indicated that it requires its researchers and scientists to sign nondisclosure agreements to protect the confidentiality of its research. For this reason, the company tightly guards the identity of its professionals and invests heavily in their training. This scenario might be true for most companies in the pharmaceutical industry.

The AI experts mentioned in this report hold doctorates or a combination of academic degrees in sciences such as genetics, genomics, molecular biology, and physics, as well as bioinformatics and mathematics. They have between 10 and 20 years of experience in their fields of expertise.


Header Image Credit: BioSpectrum India
