Artificial Intelligence for Pharmaceutical Research and Development

The facilitation of research and development (R&D) is perhaps the most common use case for AI applications in the pharmaceutical industry. There are numerous solutions for extracting and organizing research data from clinical trial notes and other medical documents. Additionally, there is software that can purportedly analyze data from images of drug compounds at the molecular level.

AI could also help pharmaceutical companies manufacture newly discovered drugs more efficiently. This is accomplished by making predictions about how the drug may react when manufacturers try to turn it into a pill, liquid medicine, or topical salve, for example. Data scientists can find out if a drug is likely to break down or become less useful when processed into these types of products before the pharmaceutical company actually takes that risk.

In this overview of the possibilities of AI for pharmaceutical R&D, we cover:

Natural Language Processing for Drug Discovery: Searching through databases of unstructured data such as digitized past clinical trial reports.
Predictive Analytics for Drug Discovery: Predicting how certain molecules will react with each other to figure out how effective a drug might be at treating a disease.
AI for Salt and Polymorph Testing: Using AI to determine the solubility of a drug compound.

We begin our overview of AI for pharmaceutical R&D with the possibilities for natural language processing (NLP) software and how it facilitates the use of unstructured data.

Natural Language Processing for Drug Discovery

Most software vendors that offer drug research and discovery solutions purport that they handle big data analytics or microscope imaging of molecular drug compounds. In contrast, natural language processing software could enrich drug research by extracting information from unstructured data sources to be incorporated into the testing of current and future drug molecules.

NLP applications could sift through previous research documents for findings that are more relevant to a pharmaceutical company’s research than they were when originally discovered. This could include digital lab notes and statistics within the pharmaceutical company’s R&D database.

If a data scientist used NLP software to search for previously recorded chemical reactions involving the drug they were testing, they might find that a given experiment is no longer necessary. In this case, the scientist would have found that the company has already researched the answer and could factor that information into later experiments.

Pharmaceutical companies typically save their lab notes and clinical trial data into a database in order to keep track of their experiences with certain drugs, molecules, and chemicals. These notes are likely written by one of the company’s scientists during or shortly after the experiment and include pharmaceutical jargon along with some colloquialisms.

An NLP software developer would need to label these documents and use them to train their machine learning model to recognize distinct fields on each type of document.

Clinical trial data, along with physicians’ notes, can also reveal patients’ experiences with adverse effects from drugs. Because of this, NLP software could be trained on clinical trial reports and data from electronic medical records (EMRs) in order to identify these experiences and mark any relevant drugs or compounds as possible contributors to the adverse effects discovered in those documents.
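To make the idea of marking drugs as possible contributors more concrete, below is a minimal sketch that counts how often a drug name and an adverse-effect term appear in the same sentence. It is a simple co-occurrence baseline, not a trained NLP model, and the drug names, term list, and notes are hypothetical placeholders rather than real data.

```python
import re
from collections import Counter

# Hypothetical vocabularies; a real system would rely on curated dictionaries
# or a trained entity-recognition model instead of fixed word lists.
DRUGS = {"drugalpha", "drugbeta"}
ADVERSE_TERMS = {"nausea", "rash", "dizziness"}


def flag_drug_effect_pairs(notes):
    """Count (drug, adverse term) pairs that appear in the same sentence."""
    pairs = Counter()
    for note in notes:
        for sentence in re.split(r"[.!?]+", note.lower()):
            words = set(re.findall(r"[a-z]+", sentence))
            for drug in words & DRUGS:
                for term in words & ADVERSE_TERMS:
                    pairs[(drug, term)] += 1
    return pairs


# Invented example notes, for illustration only.
notes = [
    "Patient started DrugAlpha on day 1. Reported nausea after DrugAlpha dose.",
    "DrugBeta well tolerated. No complaints.",
    "Mild rash and nausea noted following DrugAlpha.",
]
pairs = flag_drug_effect_pairs(notes)
for (drug, term), count in pairs.most_common():
    print(drug, term, count)
```

A baseline like this ignores negation ("no nausea") and cannot tell correlation from cause, which is part of why labeled training data would be needed for a production system.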

That said, pharmaceutical companies do not typically have access to hospital records unless they have some type of partnership with a hospital or healthcare company. A partnership such as this could help pharmaceutical companies access the hospital’s data and label it for training their machine learning model. This could result in deeper training on a wider range of medical terms, topics, and diseases.

We spoke to Amir Saffari, Senior Vice President of AI at BenevolentAI, about what the future holds for AI and drug discovery. When asked about how his company consumes all of the scientific literature related to their drug discovery research, Saffari offered the following insight regarding the use of data from other companies, or external data:

One area our company focuses on is to machine read all the literature, patents, and documents. There is an enormous amount of research that gets published every day. Often, people that work in the scientific domain would just focus on one area and not read the other journals. But there is a lot of relevant data in other journals that can inform decisions in the areas that a person is researching on. So we machine read all available literature and pool it together in a database of facts that can be extracted from this literature. That forms the basis of the hypothesis that we generate to find therapeutic targets for diseases.

One of NLP software’s capabilities that has a particularly helpful use case in pharmaceuticals is text mining. This is a process by which the pharmaceutical company in question runs their research database through an NLP engine to search for topics, phrases, and terms within that database. The NLP engine would then attempt to find all the data related to the user’s topic and present them as search results or on a dashboard.

Pharmaceutical researchers could use text mining to identify chemical or dosage amounts that may have contributed to the results. This could include any imbalances the patients may have had during the clinical trial, the dose size given to each patient, and how frequently each dose was given.
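The search and extraction steps described above can be sketched in a few lines. The example below builds a small inverted index for keyword search and uses a regular expression to pull dose amounts out of free text. It is only an illustration of the general technique: the document ids, texts, and the `DOSE_PATTERN` expression are assumptions, and commercial text mining engines are considerably more sophisticated.

```python
import re
from collections import defaultdict

# Assumed pattern for doses such as "50 mg" or "2.5 mL"; real notes vary widely.
DOSE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(mg|g|ml)\b", re.IGNORECASE)


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def extract_doses(text):
    """Return (amount, unit) pairs found in the text."""
    return [(float(amount), unit.lower()) for amount, unit in DOSE_PATTERN.findall(text)]


# Hypothetical research database entries.
documents = {
    "trial-001": "Compound X given at 50 mg twice daily; mild headache reported.",
    "trial-002": "Compound Y given at 2.5 mL once daily; no adverse events.",
    "note-017": "Compound X dissolved poorly in ethanol.",
}
index = build_index(documents)
hits = search(index, "compound x")
doses = extract_doses(documents["trial-001"])
```

Here `hits` holds the ids of every document mentioning both query words, and `doses` holds the dose amounts found in one trial report.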

One example of a software vendor that offers a text mining solution for pharmaceutical research is Linguamatics. The company has a considerable number of dedicated AI staff with academic backgrounds in the field. Additionally, they have documented their clients’ success in the form of case studies that detail how the solutions are employed and the kind of ROI they offer.

Linguamatics claims their software, I2E, can help pharmaceutical companies search through their own research data regarding different chemical compounds and experiments. This allows their clients to more efficiently access information about similar projects so they can understand how a chemical may react before testing it with another.

Below is a short 7-minute video from Linguamatics that gives an introduction to the capabilities of text mining. The steps shown in the video are as follows:

1:00: Using search engines and keywords effectively for text mining
3:23: Interpreting the meaning of text found with the software
5:10: Automatically extracting data relevant to the keywords used
5:33: Comparing the two approaches and how each may benefit the user

One of Linguamatics’ case studies states that their I2E software helped Roche Pharma Research and Early Development discover new drugs more efficiently. The software allowed medical chemists at Roche to search through both internal and external databases for data regarding the relationships between certain chemical compounds and the diseases they are intended to treat.

The case study also states that Linguamatics was able to help Roche Pharma develop their own AI platform. Roche named this platform Artemis, and it allowed them to search for chemical and pharmacological terms efficiently. Roche Pharma purportedly saved $10,000 per search based on an equivalent full-time cost of $200,000 per year.
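The two figures quoted above imply how much staff time each search is assumed to replace. The short calculation below works that out. The dollar amounts come from the case study as quoted here, while the 2,080 working hours per year (52 weeks of 40 hours) is our own assumption and does not appear in the case study.

```python
SAVINGS_PER_SEARCH_USD = 10_000   # figure quoted from the case study
FTE_COST_PER_YEAR_USD = 200_000   # figure quoted from the case study
WORK_HOURS_PER_YEAR = 52 * 40     # assumption: a 40-hour week, 52 weeks a year

# Fraction of one full-time year that a single search is valued at.
fte_fraction = SAVINGS_PER_SEARCH_USD / FTE_COST_PER_YEAR_USD

# The same saving expressed in working hours, under the assumption above.
hours_saved = fte_fraction * WORK_HOURS_PER_YEAR

print(f"{fte_fraction:.0%} of an FTE-year, about {hours_saved:.0f} hours per search")
```

In other words, the claimed saving corresponds to about 5 percent of one full-time year per search, or roughly two and a half working weeks under the stated assumption.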

Predictive Analytics for Drug Discovery

Pharmaceutical companies could also use predictive analytics to isolate a specific molecule and test its effectiveness at treating the diseases and illnesses they are targeting.

A large factor setting AI software apart from the drug discovery methods of the past is the ability of predictive analytics to draw on past drugs, molecules, and research assets that may already have undergone some clinical testing. AI and data analytics could drive efficiency by utilizing this previously established data to find deeper and more relevant insights on previous chemical testing.

Some AI applications are advertised to handle large amounts of pharmaceutical data to discern the physical and chemical traits of a drug molecule that could be useful in predicting success in the long term. These traits could include the way the molecules interact with others and how the individual atoms within that molecule must fit together.

Pharmaceutical companies could use predictive analytics software to search through these data points about chemicals and molecules to find similarities to the one being tested. With this information, they could compare the success or failure of the past molecule to how the current one is intended to be used. This gives them a keener understanding of the molecule’s potential commercial value.
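One common way to measure similarity between molecules is the Tanimoto (Jaccard) coefficient, which compares the sets of structural features two molecules share. The sketch below applies it to hand-written feature sets. The molecule names and features are hypothetical, and the vendors discussed here do not state which similarity measure they use, so this is an illustration of the general idea rather than anyone’s actual method.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query, library, top_n=2):
    """Return the top_n (score, name) pairs from the library, best match first."""
    scored = [(tanimoto(query, features), name) for name, features in library.items()]
    return sorted(scored, reverse=True)[:top_n]


# Hypothetical past molecules described by simple structural features.
library = {
    "past-mol-A": {"aromatic_ring", "amide", "hydroxyl"},
    "past-mol-B": {"aromatic_ring", "sulfonamide"},
    "past-mol-C": {"ester", "alkyl_chain"},
}
candidate = {"aromatic_ring", "amide", "fluorine"}

ranked = most_similar(candidate, library)
print(ranked)  # [(0.5, 'past-mol-A'), (0.25, 'past-mol-B')]
```

In practice the feature sets would typically come from computed molecular fingerprints rather than hand-picked labels.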

A company might use predictive analytics to determine how likely a newly identified drug is to do well in clinical trials. This prediction would be made before the trial begins so that researchers and business leaders can find the safest drug to test. This would help prevent harm from adverse effects of the drug as well as drive sales by selecting the drug most likely to succeed.

This would be based on past documentation of clinical trials that involve drugs with a similar chemical makeup to the one being tested, as well as drugs used to treat the same illnesses.
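As a rough illustration of basing a prediction on similar past trials, the sketch below computes a similarity-weighted success rate. It assumes each past trial has already been assigned a similarity weight relative to the new drug and a pass/fail outcome. Both the weighting scheme and the numbers are assumptions for the sake of the example, and a real model would account for many more factors.

```python
def estimate_success(similar_trials):
    """Similarity-weighted share of past trials that succeeded.

    similar_trials is a list of (similarity, succeeded) pairs, where
    similarity is a weight between 0 and 1 and succeeded is True or False.
    Returns None when there is no usable evidence.
    """
    total_weight = sum(similarity for similarity, _ in similar_trials)
    if total_weight == 0:
        return None
    success_weight = sum(similarity for similarity, ok in similar_trials if ok)
    return success_weight / total_weight


# Hypothetical past trials of chemically similar drugs.
past_trials = [(0.8, True), (0.6, False), (0.4, True), (0.2, False)]
estimate = estimate_success(past_trials)
print(f"Estimated chance of success: {estimate:.0%}")
```

Trials of more similar drugs count for more in the estimate, so two successes among four trials yield about 60 percent here rather than 50 percent.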

Predictive analytics could also benefit from training data from outside sources. This includes information about patients’ immune systems and oncology history, especially for cancer research enterprises. In addition to a partnership with a hospital or healthcare company, pharmaceutical companies could acquire this data by collecting it from their clinical trials. Additionally, they could buy access to a vendor’s database to further enrich their training data.

We interviewed Murali Aravamudan, founder and CEO of Qrativ Biotech, about the possibilities of AI in pharmaceuticals and how external data streams could help drug discovery. When asked about medical areas that are challenging or have seen a considerable amount of buzz, Aravamudan said:

So we view the immune system as an integral part in most many disease conditions. Many of immune system diseases turn out to have signals that emerge from oncology data sets. So we are very excited about applying those kinds of immune system signals that bring precision to precision neurology or precision ophthalmology. We could think of various such conditions, which all of them can benefit from precision medicine and not just oncology.

To make predictive analytics useful for pharmaceutical companies working at the molecular level, other technologies may be necessary to collect the data on microscopic substances needed to train the machine learning algorithm. The AI vendor Xtalpi attempts to solve this problem by combining predictive analytics with cloud-based high-performance computing. They also purportedly use quantum physics and chemistry to aid in analyzing this data accurately.

Below is a graphic from Xtalpi that highlights the stages of their workflow to produce accurate predictions. These include analysis of the molecule being tested, targeted searches for known crystal structures the drug could take, listing and ranking these possibilities by usefulness, and testing for how the molecules behave at room temperature:

Xtalpi claims their software helps pharmaceutical companies identify the optimal form in which to produce their drugs, such as a pill or eye drops. The company also claims this is helpful in late-stage drug development, where clients are deciding how the drug will be mass produced and packaged.

This predictive analytics solution could also offer this information for the purpose of driving clinical trial success. It can also help defend against challenges to the pharmaceutical company’s patent on the molecule.

The solution Xtalpi offers to pharmaceutical companies is called “crystal structure prediction” because it can purportedly discern how likely a drug is to form a certain type of crystallization when the molecules are combined. They list four drugs on their website that have been tested using crystal structure prediction and each has an accompanying case study illustrating the software’s predictions.

These case studies, however, do not mention any pharmaceutical clients by name and instead focus on the large amount of data the software might produce from a single sample of a potential drug product.

The case studies each contain the drug’s number of hydrogen bond acceptors, or atoms in the molecule or compound that can accept a hydrogen bond. They also show how likely the drug is to form a certain type of crystal structure using the observed energy of the chemical reaction and density of the atoms involved.
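The ranking of candidate crystal structures by energy and density can be illustrated with a short sketch. It sorts candidates so that the lowest relative energy comes first, on the general principle that lower-energy forms tend to be more thermodynamically stable, and uses higher density only as a tie-breaker. The tie-break rule, the class and field names, and all values are assumptions for illustration and are not taken from Xtalpi’s case studies.

```python
from dataclasses import dataclass


@dataclass
class Polymorph:
    label: str
    relative_energy_kj_mol: float  # energy above the lowest-energy candidate
    density_g_cm3: float


def rank_polymorphs(candidates):
    """Sort candidates: lowest relative energy first, higher density breaks ties."""
    return sorted(
        candidates,
        key=lambda p: (p.relative_energy_kj_mol, -p.density_g_cm3),
    )


# Invented candidate structures for a single hypothetical drug molecule.
candidates = [
    Polymorph("Form I", 0.0, 1.42),
    Polymorph("Form II", 2.1, 1.39),
    Polymorph("Form III", 0.0, 1.45),
]
ranking = [p.label for p in rank_polymorphs(candidates)]
print(ranking)  # ['Form III', 'Form I', 'Form II']
```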

AI for Salt and Polymorph Testing

An important step in pharmaceutical research and development is determining a drug compound’s level of solubility in water and other liquids, its crystalline form once it becomes solid, and the stability of that structure once bound together. Salt and polymorph testing or “screening” is the process of choosing the best physical form in which to manufacture and distribute the drug. This also helps to determine how long the drug can sit before it expires.

Pharmaceutical companies could use machine learning and AI applications to facilitate this process on multiple levels. Predictive analytics applications could find any existing data on crystalline structures of drug molecules to give the user an idea of what the drug may look like under a microscope once in pill form. This would help them determine if the molecule is still useful once bound in that way or if the active ingredients would react better when processed differently.

A data scientist at a pharmaceutical research facility may be able to predict the results of a salt or polymorph screening using machine learning and thus adjust the drug before the screening begins. This allows scientists to effectively “skip” research steps they had initially planned to take once they find that the answer already exists within their database.

It is important to note, however, that this would be difficult to accomplish without an expert in data science or in the relationships between different types of pharmaceutical information. If pharmaceutical researchers are not familiar enough with the machine learning technology aggregating their company’s data, they won’t be able to use it in any helpful way. We spoke to Grant Wernick, CEO and co-founder of Insight Engines, about this issue.

In our interview with him, Wernick stressed the importance of subject matter experts in any data science environment. When asked about the challenges faced by industry professionals who are not as familiar with AI software, Wernick said:

And so today, we’ve been in this world of “put it somewhere, just put it somewhere, and we’ll figure it out later” kind of thing. ‘Just the fact that we have it stored somewhere, we’re going to be able to do something with it.’ And then people start realizing, ‘ok, now we have it, what can we do with it?’ There’s this huge talent gap. In every organization I spend time with, there’s one or two people who are really the deep folks that can dig into that data, and everybody else on the team wants access to this stuff. They have ideas. These people are oftentimes domain experts. I live and breathe in security and IT, and you have people who are awesome at security, but they don’t always have the technical chops for the various different products an organization may bring on.

Tessella is a data science consultancy which offers AI and data science solutions for drug discovery and manufacturing. They claim their software can help develop pharmaceutical treatments more quickly by helping their clients develop machine learning models for what they are specifically studying.

According to a case study from their website, Tessella claims they helped GlaxoSmithKline (GSK) facilitate and improve their salt and polymorph screenings for new drug compounds. Tessella worked with GlaxoSmithKline on a machine learning model that automated liquid mixing, solid dispensing, heating, cooling, shaking, and transportation of samples.

Users could input details regarding the screening into the software to produce the protocol for each of the experiments, which totaled up to 400. This included the correct type of solvent to mix the drug into, as well as counter-ions that may pose a risk to the process. The software would then determine the correct amounts to be dispensed according to the purity and solubility of the chemicals as determined by the user.
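A simplified sketch of how such a protocol might be generated is shown below. It pairs every solvent with every counter-ion, caps the plan at 400 experiments, and computes dispensing amounts from the purity and solubility values the user supplies. The function names, the formulas (mass divided by purity, mass divided by solubility), and the sample values are our own assumptions and are not details from the Tessella case study.

```python
from itertools import product


def dispense_mass_mg(target_active_mg, purity):
    """Mass of raw material that contains target_active_mg of the compound."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be a fraction between 0 and 1")
    return target_active_mg / purity


def solvent_volume_ml(target_active_mg, solubility_mg_per_ml):
    """Minimum solvent volume needed to dissolve the compound completely."""
    if solubility_mg_per_ml <= 0:
        raise ValueError("solubility must be positive")
    return target_active_mg / solubility_mg_per_ml


def build_protocol(solvents, counter_ions, target_active_mg, purity, max_experiments=400):
    """Plan one experiment per (solvent, counter-ion) pair, up to the cap."""
    protocol = []
    for (solvent, solubility), ion in product(solvents.items(), counter_ions):
        if len(protocol) >= max_experiments:
            break
        protocol.append({
            "solvent": solvent,
            "counter_ion": ion,
            "dispense_mg": dispense_mass_mg(target_active_mg, purity),
            "solvent_ml": solvent_volume_ml(target_active_mg, solubility),
        })
    return protocol


# Hypothetical user inputs: solubility in mg/mL per solvent, and counter-ions.
solvents = {"water": 5.0, "ethanol": 20.0}
counter_ions = ["chloride", "mesylate"]
protocol = build_protocol(solvents, counter_ions, target_active_mg=10.0, purity=0.8)
for step in protocol:
    print(step)
```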

The system collected data such as Raman spectra, X-Ray Powder Diffraction (XRPD) patterns, and optical microscopy images during the screening. GSK then processed this data to find the polymorphic shapes of these compounds. The software could then present all of these experiments as an array of data points. This enabled the user to visualize trends and compare findings between each experiment.
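To give a sense of what presenting experiments as an array might involve, the sketch below arranges screening outcomes into a grid keyed by solvent and counter-ion, then counts which crystal forms appeared for each solvent. The record layout and form labels are hypothetical. In the workflow described above, the form for each experiment would come from processing the Raman, XRPD, and microscopy data.

```python
from collections import Counter, defaultdict


def results_grid(results):
    """Arrange observed forms as grid[solvent][counter_ion]."""
    grid = defaultdict(dict)
    for record in results:
        grid[record["solvent"]][record["counter_ion"]] = record["form"]
    return dict(grid)


def forms_per_solvent(results):
    """Count how often each crystal form was observed for each solvent."""
    counts = defaultdict(Counter)
    for record in results:
        counts[record["solvent"]][record["form"]] += 1
    return counts


# Invented screening outcomes, for illustration only.
results = [
    {"solvent": "water", "counter_ion": "chloride", "form": "Form I"},
    {"solvent": "water", "counter_ion": "mesylate", "form": "Form I"},
    {"solvent": "ethanol", "counter_ion": "chloride", "form": "Form II"},
    {"solvent": "ethanol", "counter_ion": "mesylate", "form": "Form I"},
]
grid = results_grid(results)
counts = forms_per_solvent(results)
for solvent, row in grid.items():
    print(solvent, row)
```

A grid like this makes it easier to spot trends, such as one form appearing only with a particular solvent.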

The case study also states that the new automated screening process improved consistency, helping ensure that all test methods and results could be replicated. It also minimized cross-contamination within the samples. GSK’s new workflow also reduced company spending, eliminated manual note taking, and centralized the new research data.
