Driving Patient Experiences Through Data Science-Driven Approaches to Infrastructure – with Xiong Liu of Novartis

Riya Pahuja

Riya covers B2B applications of machine learning for Emerj across North America and the EU. She previously worked with the Times of India Group as a journalist covering data analytics and AI. She resides in Toronto.


Explainable AI models are essential in pharmaceutical R&D because they provide transparency and understanding of how AI-driven predictions are made. In drug discovery and development, stakeholders, including researchers, regulatory bodies, and healthcare professionals, need to trust and understand AI models’ outputs to make informed decisions. Without explainability, AI models can be seen as “black boxes,” leading to skepticism and reluctance to adopt these technologies in critical decision-making processes. 

This lack of transparency can hinder the approval of new drugs and slow down innovation, posing significant business problems by increasing the time and cost associated with bringing new treatments to market. As noted by many published sources, from Nationwide Children’s Hospital to Eli Lilly, it takes an average of ten years and hundreds of millions of dollars to get a new medication approved by the FDA.

Knowledge graphs can address these challenges by enhancing the explainability of AI models in pharmaceutical R&D. They provide a structured representation of complex biomedical data, linking entities such as genes, proteins, and diseases with transparent, interpretable relationships.

Emerj Senior Editor Matthew DeMello recently spoke with Dr. Xiong Liu from Novartis on the ‘AI in Business’ podcast to discuss the integration of AI, advanced data management techniques, and knowledge graphs in pharmaceutical R&D. They discussed challenges and strategies related to the technology stack, data quality, algorithm performance, and infrastructure requirements. 

In the following analysis of their conversation, we examine two key insights:

  • Driving model explainability by addressing bias: Enhancing model accuracy and user trust by ensuring diverse data representation to avoid bias, and by identifying the specific features that contribute to a prediction so that model outputs can be explained.
  • Utilizing knowledge graphs for enhanced predictions: Capturing and querying relationships between entities like genes and diseases with knowledge graphs to improve prediction capabilities.


Guest: Dr. Xiong Liu, Director of Data Science and AI at Novartis

Expertise: Technology innovation, partnerships, and strategy.

Brief Recognition: Dr. Xiong Liu has ten years of experience in pharma R&D (diabetes, neuroscience, immunology, and oncology) and twenty years of experience in data mining and machine learning. He has led data and AI programs to accelerate drug development, from target discovery to clinical trials and post-marketing research.

Driving Model Explainability by Addressing Bias

Xiong begins the podcast by discussing the integration of AI in pharmaceutical R&D, highlighting the following points:

  • Technology Stack and Business Applications: The technology stack in pharmaceutical R&D includes data, algorithms, and platforms. Each layer presents unique challenges that must be addressed to support drug development stages, from early discovery to clinical trials and real-world evidence.
  • Data Challenges: Data comes from various platforms, leading to inconsistencies (batch effects). Harmonizing these datasets and preprocessing them for machine learning is a significant challenge.
  • Algorithm Performance: AI algorithms don’t always perform perfectly. Extensive training, testing, and evaluation are required to determine the best algorithms for specific use cases, ensuring practical, real-life applications.
  • Infrastructure Requirements: Robust infrastructure is needed for data storage, GPU computing, model training, testing, evaluation, and deployment. Existing platforms in AI and health tech can help, but it’s essential to leverage their benefits while addressing specific challenges.

He discusses the complexities and challenges related to data in pharmaceutical R&D, focusing on five key issues:

  • Data Quality and Preprocessing: Data quality can be inconsistent due to the various platforms and technologies used to generate it. An example is single-cell RNA sequencing, where different platforms create batch effects, necessitating quality control and preprocessing to remove artifacts before applying machine learning algorithms; a minimal correction sketch follows this list.
  • Data Sharing: Data sharing faces challenges but can be facilitated by principles like FAIR (Findable, Accessible, Interoperable, and Reusable). Implementing these principles on cloud computing platforms can enhance data sharing across organizations.
  • Ethics in Data Handling: Ethical considerations include ensuring patient data is de-identified and anonymized before modeling. Standard algorithms are available to achieve this, and it’s crucial to adhere to ethical standards.
  • Bias in Modeling: Models may exhibit bias if the data represents only some patient populations, leading to inaccurate predictions for underrepresented groups. Ensuring diverse data representation is essential to characterize features accurately across populations.
  • Algorithm Accuracy and Explainability: Deep learning, particularly with transformer models, has improved precision and recall in predictions. However, these models often lack explainability, which is crucial for user trust and understanding. For example, in predicting patient outcomes like hospital readmission, it is essential to identify the specific features (keywords, patterns) that contribute to the prediction in order to address explainability concerns.
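
To ground the batch-effect point above, here is a minimal, hypothetical sketch: a toy expression matrix from two simulated platforms is corrected by per-batch mean-centering. Real single-cell pipelines use dedicated methods such as ComBat or Harmony; everything here (the matrix, the batch labels) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: 6 cells x 4 genes, from two hypothetical platforms.
expr = rng.normal(loc=5.0, scale=1.0, size=(6, 4))
batch = np.array([0, 0, 0, 1, 1, 1])  # platform label per cell (invented)
expr[batch == 1] += 2.0               # simulate a platform-specific shift

def center_batches(x, labels):
    """Remove each batch's mean, then restore the global mean."""
    out = x.copy()
    for b in np.unique(labels):
        out[labels == b] -= x[labels == b].mean(axis=0)
    return out + x.mean(axis=0)

corrected = center_batches(expr, batch)
# Per-batch gene means now coincide at the global mean.
print(corrected[batch == 0].mean(axis=0))
print(corrected[batch == 1].mean(axis=0))
```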

“So, for example, if we think about explainability when we make patient outcome predictions using the clinical notes in the EHR systems, we can say, ‘Okay, now the algorithms can accurately predict the outcomes.’ For example, hospital readmission. Now, what are the specifics? What are the features of the data? So then, we can find that specific keyword patterns are associated with the high occurrence of hospital readmission. For example, if there are serious symptoms, and if they do not have those preventive surgeries, then there’s a likelihood of readmission. So those kinds of explainability issues we have to address as well.”

– Dr. Xiong Liu, Director of Data Science and AI at Novartis
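
The keyword-level explainability Xiong describes above can be prototyped with a transparent model. The sketch below fits a TF-IDF plus logistic-regression classifier on a handful of invented clinical notes and ranks the terms that push a readmission prediction upward; it illustrates the general idea, not the actual models used at Novartis.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented clinical notes and readmission labels, for illustration only.
notes = [
    "serious symptoms, no preventive surgery performed",
    "mild symptoms, preventive surgery completed",
    "serious symptoms persist after discharge",
    "routine follow-up, preventive surgery completed",
]
readmitted = np.array([1, 0, 1, 0])

vec = TfidfVectorizer()
X = vec.fit_transform(notes)
clf = LogisticRegression().fit(X, readmitted)

# Rank terms by learned weight: positive weights raise predicted risk.
terms = vec.get_feature_names_out()
for i in np.argsort(clf.coef_[0])[::-1][:5]:
    print(f"{terms[i]:>12}  weight={clf.coef_[0][i]:+.2f}")
```

Here, positively weighted terms such as "serious" play the role of the keyword patterns Xiong mentions, while terms associated with preventive surgery pull the prediction toward no readmission.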

The FAIR data principles are a set of guidelines aimed at making data more accessible and reusable in the field of data science. FAIR stands for Findable, Accessible, Interoperable, and Reusable. It means:

  • Findable: Data needs sufficient metadata, a unique identifier, and must be indexed in a searchable resource.
  • Accessible: Metadata and data should be machine-readable and stored in a trusted repository.
  • Interoperable: Data should have a standard structure, with metadata using recognized terminologies.
  • Reusable: Data must have clear usage licenses, provenance, and meet relevant community standards.
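
As a rough, hypothetical illustration of what those principles look like in a metadata record, the sketch below assembles one in Python; every field name and value is an assumption following common conventions (persistent identifiers, standard formats, explicit licenses) rather than any single mandated FAIR schema.

```python
import json

# Hypothetical FAIR-style metadata record; all values are invented.
record = {
    "identifier": "doi:10.0000/example-dataset",   # Findable: persistent ID
    "title": "Example single-cell RNA-seq dataset",
    "keywords": ["scRNA-seq", "oncology"],         # Findable: rich metadata
    "access_url": "https://repository.example.org/example-dataset",  # Accessible
    "format": "text/csv",                          # Interoperable: standard format
    "vocabulary": "recognized ontology terms",     # Interoperable
    "license": "CC-BY-4.0",                        # Reusable: clear usage license
    "provenance": "generated by lab X, pipeline v1.2",  # Reusable
}
print(json.dumps(record, indent=2))
```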

Utilizing Knowledge Graphs for Enhanced Predictions

Xiong explains how knowledge graphs are beneficial for organizing and utilizing data in pharmaceutical R&D. Firstly, knowledge graphs effectively capture relationships between data entities, such as gene regulatory networks, protein-protein interactions, and gene-disease-drug relationships. They enable users to query entities and relationships of interest quickly, supported by associated databases and query technologies.

Additionally, knowledge graphs enhance prediction capabilities through representation learning, in which different entities (e.g., genes, diseases, cells, patients) are represented in a hidden space. This method, similar to dimensionality reduction, captures underlying information and creates embeddings: new data structures that significantly improve prediction capabilities.
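
As a toy version of this representation-learning step, the sketch below builds a small gene-disease-drug graph and factors its adjacency matrix into two-dimensional node embeddings with truncated SVD. The entities are invented, and SVD stands in for the dimensionality-reduction analogy above; production pipelines typically use dedicated graph-embedding methods such as node2vec or TransE.

```python
import networkx as nx
from sklearn.decomposition import TruncatedSVD

# Invented gene-disease-drug relationships, for illustration only.
g = nx.Graph()
g.add_edges_from([
    ("GeneA", "Disease1"), ("GeneA", "GeneB"),
    ("GeneB", "Disease1"), ("Drug1", "GeneA"),
    ("Drug1", "Disease2"), ("GeneC", "Disease2"),
])

adj = nx.to_numpy_array(g)                   # node-by-node adjacency matrix
svd = TruncatedSVD(n_components=2, random_state=0)
emb = svd.fit_transform(adj)                 # one 2-d embedding per node

# Nodes with similar neighborhoods get similar embeddings, which is
# what downstream predictors (e.g., gene function models) exploit.
for node, vector in zip(g.nodes, emb):
    print(f"{node:>8}  {vector.round(2)}")
```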

These applications have practical benefits. For example, in gene function prediction, knowledge graphs can increase accuracy by learning embeddings from diverse data. Similarly, learning patient representations from knowledge graphs built from EHR data can enhance the prediction of patient outcomes, outlooks, or prognoses.

Xiong discusses the collaboration between tech companies and pharmaceutical companies, emphasizing the role of leading cloud computing infrastructures like Azure and AWS. He notes that while these infrastructures are widely used, the challenge lies in translating them into decision-making and predictive power within the pharma and healthcare sectors. The process is influenced by business considerations, such as goals and costs, which shape the development of initial platforms, data models, and use cases.

Xiong highlights the importance of agility, which involves understanding customer and stakeholder requirements. Understanding these factors helps to determine the scope of data and the selection of algorithms. While many open-source algorithms are available, choosing suitable initial use cases is challenging. Once use cases are selected, it becomes clearer how to collect data, set up models, and determine the necessary platforms for running these models.

In early exploration within life sciences, starting with smaller, manageable projects is common. For instance, high-performance computing (HPC) might be used initially, and depending on the needs, GPUs may or may not be required. By achieving initial results and communicating them to stakeholders, support and interest can be garnered, paving the way for scaling up computing power, use cases, and applications across different disease areas.

Xiong discusses the technical aspects of storing knowledge graphs, explaining various methods and tools. He begins by mentioning simple storage methods, such as using text or tabular formats. These methods involve capturing entities and their attributes in tables and mapping the relationships between entities. While the approach can be helpful for computational purposes, it is not ideal for efficient storage and retrieval.
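
A minimal sketch of that tabular approach: relationships stored as (subject, relation, object) rows, which are easy to compute over but force a full scan for even a one-hop lookup. The triples are invented.

```python
import csv
import io

# Invented triples in the simple tabular format described above.
triples_csv = """subject,relation,object
GeneA,associated_with,Disease1
Drug1,targets,GeneA
Drug1,treats,Disease2
"""

rows = list(csv.DictReader(io.StringIO(triples_csv)))

# One-hop lookup: everything linked to GeneA requires scanning all rows.
neighbors = [r for r in rows if "GeneA" in (r["subject"], r["object"])]
print(neighbors)
```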

To address these limitations, Xiong highlights tools like Neo4j and MongoDB. Neo4j is an open-source graph database that stores data natively as nodes and relationships, allowing for complex queries and retrieval of subgraphs. MongoDB, a NoSQL document database, addresses scalability and can be combined with other technologies to enhance data management.
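
For contrast, a hypothetical sketch of the graph-native route: a Cypher query issued through Neo4j's official Python driver to pull a drug-gene-disease subgraph directly. The connection URI, credentials, and schema (labels like Drug, relationship types like TARGETS) are all assumptions, and a running Neo4j instance would be required.

```python
from neo4j import GraphDatabase

# Assumed local instance and credentials; adjust for a real deployment.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

# Hypothetical schema: (Drug)-[:TARGETS]->(Gene)-[:ASSOCIATED_WITH]->(Disease).
query = """
MATCH (d:Drug)-[:TARGETS]->(g:Gene)-[:ASSOCIATED_WITH]->(dis:Disease)
WHERE dis.name = $disease
RETURN d.name AS drug, g.name AS gene
"""

with driver.session() as session:
    for record in session.run(query, disease="Disease1"):
        print(record["drug"], "->", record["gene"])

driver.close()
```

Retrieving the same subgraph from the tabular format above would take repeated joins over the triple table; here the traversal is a single declarative query.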

Xiong also notes that many vendors are working on customizing and combining different technologies to better manage large-scale healthcare data. These efforts aim to leverage the strengths of various tools to create more efficient and practical solutions for handling knowledge graphs. 
