Episode summary: In this episode, we talk to Murali Aravamudan, Founder and CEO of AI-driven drug discovery startup Qrativ, a joint venture by the Mayo Clinic and biotech/data science firm nference. Murali and I discuss the surge of information and data in the medical industry, the role of artificial intelligence in developing drug treatments for various diseases, and the future of AI in drug discovery.
Murali and I will both be speaking at the Healthcare AI Applications Summit in Boston – on December 11th and 12th, 2017. I appreciate the folks at the summit connecting me directly with Murali, and I hope you’ll enjoy this episode:
Guest: Murali Aravamudan, Founder and CEO of Qrativ
Expertise: Healthcare information technology, data science, software development
Brief recognition: Murali Aravamudan is the Founding CEO of Qrativ. He also co-founded the software company Veveo in 2004 and served as its CEO for nine years until it was acquired by Rovi corporation (now TiVo). He earned a degree from the Indian Institute of Science in Bangalore.
(For readers with a broader interest in drug-related AI use cases, please see our recent full article titled: “Machine Learning Drug Discovery Applications – Pfizer, Roche, GSK, and More”.)
It takes billions of dollars to develop a drug. Machine learning may help make the most of this upfront cost by finding new conditions that might be treated by existing drugs. This “drug repurposing” process is immensely challenging, but AI may be able to use patient data and medical journals to find new applications (without billions more spent on developing a drug from scratch).
Drug repurposing – studying approved drugs used to treat one medical condition and testing their impact on the treatment of other illnesses and rare diseases – is done to speed up the discovery and integration of a new drug in the healthcare industry. It builds on previous research and drug development data to identify which drugs might work in treatment.
Another important source of information that researchers can use in this field is patient-related data. However, with the enormous amount of genomic information available, it is impossible for researchers to keep up with the pace of data processing without the help of artificial intelligence. Researchers can use neural networks and AI to identify new therapeutic conditions for existing medicines.
(For readers with an interest in the vendor companies applying AI in healthcare IT security, check out our AI healthcare vendor list.)
Interview Highlights with Qrativ’s Murali Aravamudan
The following is a condensed version of the full audio interview, which is available in the above links on Emerj’s SoundCloud and iTunes stations.
(2:49) What’s possible today at the intersection of machine learning and drug discovery? How are new opportunities being spotted that couldn’t have been found before?
Murali Aravamudan: Today, the set of data that is available for machines to look at is growing exponentially compared to even three, four, or five years back. Genetic sequencing data, coupled with various digital sensors and so on, has provided more and more clinical data. Among these various data sources, one comes from clinical practice: for acute conditions, say cancer, there are ranges of data, from radiographic images of tumors, to what kind of chemotherapy patients went through and how their bodies reacted, and so on.
And going all the way to data sets that are completely pre-clinical in nature: lab experiments on mice, or test-tube experiments that have used certain chemicals for a certain purpose. So when you have this collection of data sets, the problem then becomes one of triangulation. How can you connect the dots to see how a certain drug is going to work with a certain disease condition? Or it may work well with the disease condition only for certain genetic characteristics, so only a subset of the population with a certain cancer may react differently to a particular drug.
This kind of connecting the dots was harder a few years back. Modern AI techniques in the last three years have given us new sets of tools, built on top of various machine learning techniques developed over the last ten years or so. And that enables us to answer this triangulation question in a more meaningful sense. It is now possible for us to generate a set of hypotheses that are far more likely to be correct, without having to spend years on pre-clinical experiments.
So we have the ability to essentially take lots of drugs and assets that have gone through some amount of pre-clinical testing, make use of the other data assets that we have, and come across new pieces of information: this particular drug now has a chance of being very effective for a different disease indication, one which was never even researched by the original innovator of the drug. That is absolutely possible today.
(5:58) And it sounds like, from a business perspective, there are two large aspects to this. Number one, there might be a new opportunity for a fit for a drug, where a larger addressable market can be coaxed out from this kind of information. And the other side is making use of, as you put it, research that’s already been done. I have no idea what it costs over the ten years of getting a drug tested and out into the market, you would have a much better idea than I, but this is millions of dollars, right?
Murali Aravamudan: That’s correct. There is a controversial figure as to what that per-drug cost is. For an approved drug, you could see estimates of a billion dollars. That depends on how they include the cost of failed drugs as well, so it’s easier said than pinned down. Without loss of generality, you could assume it’s in the hundreds of millions of dollars.
(6:57) It seems safe to assume. We’re talking ten years; we’re talking very high-priced equipment and professionals, lengthy processes, anything that takes a decade in today’s day and age. And that’s a lot of professionals waiting that amount of time. So there’s also that ability to make use of previous research. As you and I were talking about off microphone, sometimes five years into that ten-year process, something will have to be stopped for regulatory reasons, patent issues, a change of direction for the company, or whatever the case may be.
It would seem a shame if that same information couldn’t be mined, leveraged, and maybe used to find a new opportunity for a new drug match that would still be fruitful, making use of previous work wherever possible. So there seem to be both efficiencies and new revenue as part of the potential business facet here.
Murali Aravamudan: Yeah, and let’s not forget the patients – the patients who today may not even have any possibility of a drug to treat them. For them, it’s a godsend, right? From a purely commercial perspective, there is a possibility of us making a lot of revenue doing this process. What I find particularly fascinating and helpful is the fact that an enormous amount of human work has been done in the past, and we could leverage it in turn to save or improve human lives. To me that is as important as commercially making money.
(8:50) Then there’s that aspect of the healthcare field. I think healthcare and finance get a lot of venture money in this AI domain in part because there’s just a tremendous amount of information floating around. With healthcare, there are so many issues with accessing that information. But I think it’s also in part because of the noble aim of what new patterns could be found; the colloquial, beat-up example is curing cancer. And it could be any condition that people are suffering with.
If we could find a treatment for some kind of condition, isn’t that a noble thing in itself? Clearly, being able to find new ones would be, and that’s part of what you guys are working on. I’m interested in how some of this works. There’s a reasonably large amount of information here, from various clinical trials, patient data, past research, and wherever else. To be able to find the overlaps, gaps, or opportunities in all of that, why is AI necessary, and what is its role in that process? Why is it critical to making that a reality?
Murali Aravamudan: If you look at the number of people who are writing papers in biomedicine compared to 20 years back, say the mid-‘90s to now, that number has literally increased exponentially. Every day, if you look at the number of new papers that come out, it is in the hundreds, compared to what was once one paper a day.
If you went to different conferences 20 years back, there might have been a hundred to 200 papers presented by different physician-scientists.
Now, a single conference like the American Society of Clinical Oncology, or ASCO, has 35,000 abstracts. Think about it: one conference has 35,000 abstracts, so it’s just not possible for human beings, no matter how great our minds are, to grasp and consume that kind of data. This is the reason why AI is coming into play. Given that we have these enormous amounts of data, can the machines find certain patterns and synthesize all this knowledge into something that can be more easily consumed by the human brain?
(11:39) Just for clarity there, because I imagine the audience is wondering about this: I had thought about the present application that you folks are working on as looking at research data, maybe from pharma companies (many of whom we’ve covered in the past) and the Mayo Clinic, and from patient information. I wasn’t thinking about this pulling from academic literature as well. Is this part of the mix in some way?
Murali Aravamudan: It is in fact not just part of the mix – that is the fundamental mix.
(12:06) That’s interesting, because we have all this genetic information about patients, and really deep blood sampling and whatever else, and then we have information about chemicals and pharma. I could see some overlaps quite literally within the data itself, if we know what we’re looking for. With academic literature, we’re looking at natural language processing, trying to find correlations between words and terms and chemicals that might be creating a pattern worth looking at.
Murali Aravamudan: Absolutely. You hit the nail on the head. What happens then is you ask a question about a disease, which can be a phrase like ‘non-small cell lung cancer.’ That looks like five words, right? But it is one phrase. And you ask: what genes and drugs are associated with this phrase? You ask these kinds of questions, right?
You find very interesting patterns between diseases and genes, diseases and drugs, and various other entities, what biologists call ‘phenotypes.’ That is one kind of signal we call the ‘semantic signal.’ We then take the semantic signal and see if it has support from the clinical data and from the genetic data – that’s how the triangulation process works.
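To make the idea of a “semantic signal” concrete, here is a toy sketch in Python (not Qrativ’s actual system; the abstracts, terms, and counting method are all invented for illustration). It treats a multi-word disease name as a single phrase and counts how often candidate gene and drug terms co-occur with it across a tiny corpus:

```python
# Toy co-occurrence "semantic signal": count how many abstracts mention
# both a disease phrase and each candidate gene/drug term, then rank.
# All abstracts and terms below are fabricated for illustration only.
from collections import Counter

abstracts = [
    "EGFR mutations drive non-small cell lung cancer; erlotinib inhibits EGFR.",
    "Non-small cell lung cancer patients with ALK fusions respond to crizotinib.",
    "EGFR signalling is also implicated in glioblastoma progression.",
    "Erlotinib resistance in non-small cell lung cancer involves the T790M variant.",
]

disease = "non-small cell lung cancer"   # matched as one phrase, not five words
candidates = ["EGFR", "ALK", "erlotinib", "crizotinib"]

# For each candidate term, count abstracts that mention both it and
# the disease phrase -- the crudest possible co-occurrence signal.
cooccurrence = Counter()
for text in abstracts:
    lowered = text.lower()
    if disease in lowered:
        for term in candidates:
            if term.lower() in lowered:
                cooccurrence[term] += 1

# Rank candidates by association strength with the disease phrase.
ranked = cooccurrence.most_common()
print(ranked)  # erlotinib co-occurs most often in this toy corpus
```

A real pipeline would use far larger corpora and statistical association measures rather than raw counts, and would then check whether the top-ranked associations are supported by clinical and genetic data, as described above.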
(13:41) So this is good, and we can clarify it conceptually. It seems as though you have to define a problem, and maybe define the relevant terms and phrases to look for in the academic literature. And then we also have an understanding of what the chemical traces of this condition might look like in a person or in a drug, and we see what we can match and cluster for within the drug side of things and the patient side of things.
So in my mind I’m seeing a triangle of research papers, patient data, and maybe pharmacological data, and once we have our terms defined, we can start to see where the overlaps are, or whether there are any large gaps, with humans doing the thinking and the priming ahead of time. Am I on or off in terms of what’s being lined up and run here?
Murali Aravamudan: You are right on most of the process. Humans have to initiate the process. The machine is not suddenly going to be interested in curing a neurological disease called ‘multiple sclerosis.’ So we have to start it. But once we start, what happens is that the process tends to be very iterative.
The machine will give you some surprising insights. In a Mayo Clinic setting, for example, we actually share some of the short-listed findings with physician-scientists, and they refine them by adding new sets of terms, and so on. What comes out of this process between machine and people is a final result like, “Here are the final four drugs which have the capability to treat a particular disease with certain characteristics.”
(16:05) Definitely. Oftentimes, when people are selling an AI tool that other companies have to use, they have to, consciously or subconsciously, explain it in a way that makes it seem very, very simple to use. I think in your case, fortunately, licensing and working with big pharma companies would be more of the game, and from what I gather, you’re not selling software that does this; rather, you guys are actually doing the crunching yourselves.
There’s an initial hypothesis, tweaks and testing among these sets of data, additional adjusting between these sets of data, and what the machine does is bring forth what you asked it to bring: finding the overlaps and underlying connections and correlations in data that would be far too much for a human to manually label in a spreadsheet, and doing it very quickly.
Murali Aravamudan: Absolutely. In fact, you used a very interesting word. One of the reasons why we have been able to do this in the last few years and not before is because we use a technique called ‘unsupervised training.’ Millions and millions of these documents are consumed by the machine without labels.
We don’t have labels here, so that unsupervised training is what makes it feasible for us to tell what the incipient connections in the data are. What does the data teach us? We can only give the machines the initial instructions, but then humans have to help.
(18:41) What you’re talking about is, instead of telling someone to go in and come up with some new manual human label for any patient data that meets certain criteria, this is a system that can cluster things according to related patterns without a person tagging them. The machine figures out what it’s looking for and is able to coax out and find those patterns itself without a person having to name and identify things.
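As a rough illustration of grouping unlabeled text without any human tags (again, a toy sketch, not Qrativ’s method; the snippets, similarity measure, and threshold are invented), here is a minimal label-free clustering pass over a few fabricated clinical-note snippets:

```python
# Toy label-free clustering: group unlabeled snippets by word overlap
# (Jaccard similarity), with no human-assigned tags. Real systems use
# far richer representations; these snippets are fabricated examples.

def words(text):
    """Bag-of-words set for a snippet, with trivial normalization."""
    return set(text.lower().replace(",", "").replace(".", "").split())

def jaccard(a, b):
    """Overlap between two word sets: |intersection| / |union|."""
    return len(a & b) / len(a | b)

snippets = [
    "tumor biopsy shows EGFR mutation in lung tissue",
    "lung tumor sample carries an EGFR mutation",
    "patient reports joint pain and fatigue",
    "fatigue and joint pain noted at follow-up",
]

# Greedy single-pass clustering: attach each snippet to the first
# cluster whose seed snippet is similar enough, else start a new one.
clusters = []          # each cluster is a list of snippet indices
THRESHOLD = 0.25       # arbitrary cutoff chosen for this toy data
for i, snip in enumerate(snippets):
    for cluster in clusters:
        seed = words(snippets[cluster[0]])
        if jaccard(words(snip), seed) >= THRESHOLD:
            cluster.append(i)
            break
    else:
        clusters.append([i])

print(clusters)  # the two EGFR/lung snippets group together, as do the two symptom snippets
```

The point of the sketch is only that no labels are supplied anywhere: the grouping emerges from the data itself, which is the property the interview attributes to unsupervised training.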
The last thing that I want to touch on is what you’re most excited about in terms of future promise. We talked about what is possible today; clearly, a lot of this is new. At the organization that you’re with at the time of this recording, there’s still a lot to be done, but clearly it’s a very exciting field, and you must be eager about the future. In terms of the big, low-hanging fruit in the next half-decade or so, what are you most excited about with this kind of technology?
Murali Aravamudan: So they say, “The proof is in the pudding.” We really have to take these insights from strong hypotheses to medicines that work in human beings. What I’m excited about is the possibility of advancing a few drugs into the clinic and being really helpful to human beings in that particular fashion.
There is an underlying goal of improving unsupervised training, and that is a very hard problem, so we have five to 10 years of work to do there: when we do this iterative process, can we limit the human beings’ involvement as much as possible? We are not there yet. Those are the kinds of improvements we need to refine the technology for. It’s about whether we can really help bring more drugs to the people who really need them.
(21:25) At the end of the day, that’s the low-hanging utilitarian benefit, if we could be so bold as to hypothesize that. And it’s also critical for the business: if no one at the end of the game here is benefitting, what the heck are we doing? How are we going to keep the gears turning? Clearly, curing challenging diseases is very important work. Moving forward on unsupervised learning, are there domains within medicine or pharma that you are more excited about? Are there kinds of conditions that tend to be challenging when humans try to manually find patterns, conditions you think might be more likely candidates than others, or areas where you’ve already seen interest? Or is it really all the same blank canvas right now?
Murali Aravamudan: I can give you some rough idea of that. Oncology is the place where we have the most data to date, because of various government-funded efforts in which over 10,000 people’s tumors have been genetically sequenced, and so on. But one of the key things that we say is that the notion of precision medicine, of a drug for a particular genetic cohort, need not be constrained to oncology.
We view the immune system as an integral part of many disease conditions. Many immune system diseases turn out to have signals that emerge from oncology data sets.
So we are very excited about applying those kinds of immune system signals to bring precision to neurology or ophthalmology – precision neurology or precision ophthalmology. We can think of various such conditions, all of which can benefit from precision medicine, not just oncology.
(23:43) It sounds like, in terms of the breadth of data that’s been collected, there have been government initiatives really aimed at knuckling down cancers, which clearly are right up there in terms of mortality for Americans and for the First World in general. It sounds like there are ardent efforts to collect information, particularly genetic information, but that what we see as tracers and ties to the genetic footprint and to the immune system in general can potentially be extrapolated. So the insights might have been unearthed for the sake of cancer, but there are ways to tug, pull, and find ties and meaning in that information that might spill out into other areas and create some benefit.
Murali Aravamudan: That’s absolutely correct.
Header image credit: CALmatters