Building Trust in Data Science in Life Sciences – Bikalpa Neupane of Takeda Pharmaceuticals

Sharon Moran

Sharon is a former Senior Functional Analyst at a major global consulting firm. She now focuses on the data pre-processing stage of the machine learning pipeline for LLMs. She also has prior experience as a machine learning engineer customizing OCR models for a learning platform in the EdTech space.


AI and data science have seen increased use in life sciences, particularly in the pharmaceutical industry. However, despite the multitude of AI use cases, pharmaceutical companies tend not to embrace every emerging AI trend eagerly. 

Pharmaceutical companies must maintain compliance with regulatory requirements. Those requirements are continually evolving, and many pre-date the emergence of AI in the industry. Additionally, the FDA has proposed a regulatory framework for the use of AI in the medical industry. 

The issue of trust is of paramount importance for pharmaceutical companies, particularly when it comes to drug discovery. A significant contributing factor is cost: developing a new drug, from its initial idea to its eventual launch and availability for patient use, is an expensive endeavor, in some cases exceeding $1 billion. 

How do pharmaceutical companies build trust? What does that trust look like specifically in the context of drug discovery?

Emerj Senior Editor Matthew DeMello recently met with Bikalpa Neupane of Takeda on the ‘AI in Business Podcast’ for an in-depth look at the challenge of trust in life sciences and the promise that new generative AI use cases offer to address that concern. 

This article will focus on two critical takeaways from DeMello’s conversation with Neupane:

  • Enabling data from diverse sources: Addressing challenges related to ingesting data from disparate sources to make it available for data science and analytics activities.  
  • Addressing the issue of trust within data science practices: Equipping an AI data engine with the needed input systems to form trustable insights.

Listen to the full episode below:

Guest: Bikalpa Neupane, Head of AI and NLP at Takeda 

Expertise: NLP, AI, predictive analytics, leading enterprise-wide Generative AI strategy 

Brief Recognition: Previously, Neupane worked in product development and research at IBM Watson.

Enabling Data from Diverse Sources

At the beginning of Neupane’s appearance on the podcast, he describes what the biggest challenges in life sciences look like from a data and AI perspective. Life sciences, he explains, is a data-rich vertical in which both opportunities and challenges are abundant. 

He goes on to clarify, “I do see the challenges in data science at multiple layers,” and proceeds to unpack that statement across the technical layer and the business layers, including the corresponding operational aspects. Among the challenges he cites are the need for robust data foundations, infrastructure, data governance, and data quality audits, along with reproducibility and scaling.

Neupane explains that a barrier to enabling AI ambitions lies in the complexity of the data itself and also the diversity of data in the general healthcare sector. However, the challenges are more extensive than that. Neupane further explains, “The biggest challenge to me that I’ve seen across the company is how we enable the data from diverse sources and make it available for downstream data science and analytics activities.”

Neupane explains that Takeda is focused on how they can bring together data from diverse sources in one place to accomplish data science activities. He offers, “I guess all of us are pretty aware that one of the biggest challenges the industry is facing today is having disconnected data. In life sciences, though, the problem goes further in terms of data acquisition and data aggregation.” 

Neupane further elaborates that the issue becomes even more complex when looking at scientific drug discovery. He also notes that additional challenges occur later in the process, including running analysis, finding correlations, and developing data products.

Neupane then goes on to explain why drug discovery is a highly complex activity, emphasizing the needed preparation in the following statement: “It is important from the very beginning to spend some time understanding the data landscape.” 

He also emphasizes the need to look at the entire picture: clinical data, experimental data, patient data, and lab data. Additionally, he explains that they need to look at data generated both pre- and post-drug discovery, including:

  • FDA data
  • Operational data
  • Commercial data
  • Social media data

The volume of data generated from such a large number of sources – he mentions personal devices and the Internet of Things – adds up quickly. Neupane offers, “Today, approximately 30% of the global data is being generated by the healthcare industry, and it is expected to grow, I would say, at the compound annual growth rate of 36%.”
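To put that 36% figure in perspective, the short Python sketch below projects how a compound annual growth rate compounds data volume over time. The baseline volume of 1.0 is a hypothetical placeholder, not a figure from the interview:

```python
# Illustrative sketch: compound annual growth rate (CAGR) projection.
# The baseline volume (1.0, arbitrary units) is hypothetical; only the
# 36% rate comes from the interview.

def project_volume(initial: float, cagr: float, years: int) -> float:
    """Return the volume after `years` of compound growth at `cagr`."""
    return initial * (1 + cagr) ** years

initial = 1.0   # hypothetical baseline volume
cagr = 0.36     # 36% CAGR cited by Neupane

for year in (1, 5, 10):
    print(f"Year {year:2d}: {project_volume(initial, cagr, year):.1f}x baseline")
```

At that rate, the volume roughly quintuples in five years and grows more than twentyfold in ten, which helps explain why data acquisition and aggregation loom so large in the discussion.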

Addressing the Issue of Trust within Data Science Practices

Neupane says he is focused on determining which input systems his company could fit into the AI data engine to form trustable insights. This problem isn’t unique to Takeda, though. 

Neupane cautions that anyone bringing lifesaving medicines to patients, or anyone in healthcare or well-being, needs to be thinking about this. 

Neupane elaborates, “It is our responsibility to become a trust-centered organization.”

Clarifying he means trust as a very technical term, Neupane further expounds, “The bigger question is how do we enable trust within our data science practices, making sure we try as much as possible to mitigate issues around data, biases, discriminations, ethics, which is what I mean by trust.”

He explains that the issue of trust is company-wide and external. “Now, the bigger question is how do I enable that trust within the life science community?” he asks. “How do I get the scientists to work at the same level as the technologist when it comes to data and AI?” 

A second consideration that he mentions is trust among stakeholders or external consumers. He also mentions that as a data science practice, you need to think not only about the current patient that you’re making a drug for but also about future patients.

“It’s not only the patients for today but also the patients for the future,” he tells the Emerj executive podcast audience. That future orientation, Neupane says, is how he brings data science into the equation, in part through predictions and forecasting.

He then turns to the subject of going to market, explaining how it takes ten or more years to develop an innovative drug and bring it to market. Neupane mentions that pharma and life sciences tend to put trust at the core of everything, mainly because companies want their data initiatives to remain compliant with GDPR and other regulations.

Furthermore, he says that there is a significant movement in the data space to manage data as a product rather than a service.

He describes how, in much of life sciences and pharmaceuticals, deployment is governance-driven when you are preparing to pilot and move into production. Organizations have guardrails in place today. As an example, he explains that at his company, for large language models in generative AI, team members are encouraged to innovate, but all within the parameters of those guardrails.

Neupane also explains developments happening within the R&D lifecycle and on the clinical side of the industry at large. He describes target discovery for drugs and the fact that there are LLMs trained on DNA sequences. He also notes the rise of AI-based drug discovery companies across his industry, including Recursion Pharmaceuticals and Insilico Medicine, both of which have models that are already being implemented. 

He explains at length the level of awareness his team has for things happening in this space, declaring, “There’s a whole movement going on.”

Neupane then describes use cases for LLMs in life sciences. He explains that with the preclinical and post-clinical aspects of drug discovery, he’s seen both his colleagues and other companies use LLMs to synthesize literature reviews. 

The use cases he mentions involve looking at publications and data from non-clinical studies for things like investigational new drug application filings and processing tons of unstructured data using natural language processing (NLP). He further explains that, even within the clinical trials, they consider social media listening and patient listening to inform clinical trial design and enhance patient recruitment strategy.

Neupane then opines that it takes a lot of work to recruit patients for clinical trials, which is why models are being developed to identify patients who could benefit from a drug and to help caretakers understand a patient’s inclusion and exclusion criteria. These models also help ensure patient safety once in the clinical trial.

He concludes by explaining the more accessible use cases that companies are already working on: operational use cases involving LLMs, AI, and data science. These systems ingest massive amounts of data from multiple sources, including SharePoint data, unstructured data, chatbot data, JIRA issues, and ServiceNow tickets. They aim to determine how a company can improve, become faster, operate more cheaply, be more efficient and productive, and enhance the user experience.
