Drug development carries one of the highest financial risk profiles in the life sciences industry. Approximately 90% of drug candidates fail, according to the American Society for Biochemistry and Molecular Biology, with a significant share of those failures stemming from regulatory hurdles and a lack of clinical efficacy.
Laws like HIPAA in the U.S. and GDPR in the EU impose strict privacy protections on patient data, limiting how it can be accessed and analyzed. The expansion of isolated sources of data from labs, providers, and records further compounds the difficulty of identifying successful drug candidates.
Without access to comprehensive patient data, commercial teams struggle to determine which patients may benefit from new drugs. They consequently lack a clear understanding of market needs and patient behavior, which creates blind spots and delays market entry.
Despite these hurdles, advancements in AI are helping companies integrate disparate patient datasets securely and navigate regulatory complexities.
Emerj Senior Editor Matthew DeMello recently spoke with Jane Chen, Director of Commercial Analytics for Women’s Cancer at Novartis, on the ‘AI in Business’ podcast to discuss the promise and perils of AI adoption in healthcare. Chen touches on a range of privacy obstacles unique to the healthcare industry while also tracing how AI can streamline patient identification and segmentation. Novartis employs generative chemistry to identify novel compounds and AI to identify patients for drug development, and it has committed to ethical uses of AI and privacy protection. The company often develops solutions for rare diseases, an area where it claims these tools will facilitate earlier diagnoses and targeted intervention.
In the following analysis of their conversation, we examine two actionable insights for healthcare leaders:
- Aggregating siloed patient data amid privacy barriers: Adopting AI analytics to bring fragmented data into a single view without compromising patient confidentiality.
- Mitigating bias in patient datasets: Using machine learning and generative AI (GenAI) to detect and overcome demographic biases in patient datasets, creating more targeted drug candidates.
Guest: Jane Chen, Director of Commercial Analytics for Women’s Cancer, Novartis
Expertise: Commercial Analytics, Patient Segmentation, and Healthcare Data Privacy
Brief Recognition: Jane Chen specializes in addressing challenges related to identifying and segmenting patient groups with rare diseases — particularly in women’s health and cancer. She has extensive experience navigating privacy concerns in healthcare data. She has also been instrumental in integrating innovative technologies, such as GenAI, to enhance decision-making and accelerate progress in oncology care.
Aggregating Siloed Patient Data Amid Privacy Barriers
Data necessary for drug development is inherently fragmented, with information scattered across labs, healthcare providers, and patient records.
Chen explains that no single dataset provides a complete view of patients. For example, lab datasets may include disease-specific markers but lack coverage across all patient populations. As a result, data comes from various platforms, leading to inconsistencies and batch effects.
The fragmentation is further complicated by strict privacy regulations like HIPAA and CMIA, which limit access to granular patient data. Critical privacy limitations like these set healthcare apart from other industries — like finance — where specific consumer data is readily available.
Privacy restrictions like HIPAA present obstacles on multiple levels, Chen notes:
- Individual patient identifiers (e.g., medical history) are concealed in patient data, limiting the data’s utility in targeted drug development
- Data de-identification involves stripping variables like dates or geographic information — critical for analyzing and controlling for a drug’s effects on specific populations
- Efforts to gain a full view of the patient population are frequently avoided, even if possible, since researchers and providers risk accidentally violating HIPAA
Together, these factors contribute to the difficulty of creating targeted drugs for niche diseases.
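The de-identification trade-off described above can be made concrete with a small sketch. It is loosely modeled on the kind of coarsening that HIPAA's Safe Harbor method calls for (dates reduced to a year, ZIP codes truncated to their first three digits). The field names and the sample record are hypothetical, and this is an illustration of the idea, not a compliance recipe.

```python
from datetime import date


def deidentify(record):
    """Return a copy with direct identifiers dropped and quasi-identifiers coarsened."""
    return {
        # Hypothetical field names; a real schema will differ.
        "diagnosis": record["diagnosis"],
        "diagnosis_year": record["diagnosis_date"].year,  # full date -> year only
        "zip3": record["zip_code"][:3],                   # 5-digit ZIP -> first 3 digits
        # "name" is intentionally not copied over.
    }


# Made-up example record, not real patient data.
raw = {
    "name": "Jane Doe",
    "diagnosis": "HR+/HER2- breast cancer",
    "diagnosis_date": date(2023, 3, 14),
    "zip_code": "07936",
}

clean = deidentify(raw)
print(clean)
```

Once the date is reduced to a year, an analyst can no longer measure, for instance, the number of days between diagnosis and treatment. That is the kind of lost analytical detail the list above points to.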
According to Chen, AI analytics show promise in stringing together disparate data sources and layering them atop each other. Advanced AI models at Novartis are trained to bring lab results, healthcare provider data, and patient records into a single view to maximize insights.
By layering anonymized records, AI capabilities can generate actionable insights without compromising patient confidentiality. Chen further notes that AI must be deployed within secure environments to ensure compliance with HIPAA.
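One common way to layer anonymized records is to replace each patient identifier with a keyed hash (a token) and join the sources on that token. The sketch below assumes this approach purely for illustration; the interview does not describe Novartis's actual pipeline. The key, field names, and records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held by a trusted party; never hard-code a key in practice.
SECRET_KEY = b"example-key-held-by-a-trusted-party"


def token(patient_id: str) -> str:
    """Replace an identifier with a keyed hash so sources can be joined without exposing it."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()


# Each source is keyed by token, never by the raw identifier.
lab_results = {token("P001"): {"her2_status": "negative"}}
provider_visits = {
    token("P001"): {"visits_last_year": 4},
    token("P002"): {"visits_last_year": 1},
}
patient_records = {token("P002"): {"age_band": "50-59"}}


def single_view(*sources):
    """Layer the sources on top of each other, one merged row per token."""
    merged = {}
    for source in sources:
        for tok, fields in source.items():
            merged.setdefault(tok, {}).update(fields)
    return merged


view = single_view(lab_results, provider_visits, patient_records)
for tok, fields in view.items():
    print(tok[:8], fields)
```

Patients who appear in only some sources still get a row, just with fewer fields, which makes the coverage gaps between datasets visible rather than hidden.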
She points out that several companies are exploring private server deployments and setting business rules that comply with privacy regulations, facilitating secure data sharing between research groups.
Chen also emphasizes that recent advances in computing power have made it easier to process large, siloed datasets quickly and iteratively. This capability allows Novartis to continuously refine its models in a matter of days rather than months, improving accuracy despite limitations imposed by privacy regulations.
By harmonizing the disparate data sources, AI analytics help fill the information gaps caused by HIPAA blinders. This approach enables researchers to identify more precise drug candidates that are tailored to the populations they serve.
Mitigating Bias in Patient Datasets
Bias in healthcare datasets presents a significant challenge for drug development, particularly for identifying underrepresented patient populations and ensuring equitable access to therapies.
Beyond data shortfalls from privacy restrictions, Chen found that biases arise from:
- Geographic and Demographic Disparities: Lab datasets may disproportionately reflect urban populations or specific healthcare networks, creating blind spots in who can benefit from targeted therapies.
- Incomplete Data Overlap: Due to differences in collection methods and missing variables (such as diagnostic measures), patients who do not fit neatly into existing frameworks are often excluded.
These gaps make it difficult to build comprehensive models that account for the full spectrum of patient needs.
“Labs might have identifiers for specific types of breast cancer versus others. But this is a smaller subset and the lab doesn’t cover all patients across the country. They may have biases based on location or the types of patients that visit. So, it’s hard to find all possible patients that might be able to benefit from my therapy.”
– Jane Chen, Director of Commercial Analytics for Women’s Cancer at Novartis
Chen then explains that AI has the potential to mitigate these biases by integrating and analyzing diverse datasets to uncover insights that traditional methods might overlook.
For example, machine learning models can identify underrepresented patient groups by detecting trends across multiple fragmented sources — such as lab results, electronic health records (EHRs), and population-level data. These models can also adjust for gaps in representation, improving the inclusivity and accuracy of predictions.
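A simple way to picture "adjusting for gaps in representation" is post-stratification weighting: compare each group's share of the sample with its share of a reference population, then up-weight the groups that are underrepresented. The sketch below uses that technique as an assumption, since the interview does not name a specific method, and every number in it is invented.

```python
# Hypothetical counts: who appears in a lab dataset, by setting.
sample_counts = {"urban": 800, "suburban": 150, "rural": 50}

# Hypothetical shares of the same groups in the reference population.
population_share = {"urban": 0.55, "suburban": 0.25, "rural": 0.20}

total = sum(sample_counts.values())
sample_share = {g: n / total for g, n in sample_counts.items()}

# A ratio below 1 means the group is underrepresented in the sample.
representation = {g: sample_share[g] / population_share[g] for g in sample_counts}

# Post-stratification weight: records from underrepresented groups count for more.
weights = {g: population_share[g] / sample_share[g] for g in sample_counts}

for g in sample_counts:
    print(f"{g:9s} representation={representation[g]:.2f} weight={weights[g]:.2f}")
```

With these made-up figures, rural patients make up 5% of the sample but 20% of the population, so each rural record is weighted four times as heavily when the model is trained or evaluated.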
Predictive analytics can identify potential disease cases among populations that are often overlooked due to gaps in traditional data sources. This capability is particularly valuable for rare diseases, where small patient populations require exact targeting.
GenAI could also help researchers simulate missing data points or predict outcomes for underrepresented groups based on similar cases, according to Chen.
Deep learning, particularly with transformer models, has improved precision and recall in predictions. However, one limitation is that these models often lack explainability, which is crucial both for trusting their accuracy and for user understanding. For example, when predicting patient outcomes like hospital readmission, it is essential to identify the specific features (keywords, patterns) that contribute to the prediction.
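To show what "identifying the features that contribute to a prediction" looks like, the sketch below uses a deliberately simple logistic model, where each feature's contribution is just its coefficient times its value. This is a stand-in: transformer models need dedicated attribution methods to produce a comparable breakdown. The coefficients and the patient are hypothetical and not fitted to any real data.

```python
import math

# Hypothetical coefficients for a readmission model; not fitted to real data.
coefficients = {
    "prior_admissions": 0.6,
    "lives_alone": 0.4,
    "age_over_75": 0.3,
    "followup_booked": -0.8,
}
intercept = -2.0


def explain(patient):
    """Return the predicted probability and features ranked by absolute contribution."""
    contributions = {name: coefficients[name] * patient[name] for name in coefficients}
    logit = intercept + sum(contributions.values())
    probability = 1 / (1 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return probability, ranked


# Made-up patient: three prior admissions, lives alone, follow-up visit booked.
patient = {"prior_admissions": 3, "lives_alone": 1, "age_over_75": 0, "followup_booked": 1}

probability, ranked = explain(patient)
print(f"readmission probability: {probability:.2f}")
for name, value in ranked:
    print(f"{name:18s} {value:+.2f}")
```

Here the prior admissions push the risk up the most, while the booked follow-up pulls it down. A clinician or commercial analyst can read that ranking directly, which is the property the more accurate deep models tend to give up.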
Naturally, correcting bias requires a gradual approach, Chen indicates. Improvements in computing power can now facilitate more frequent updates and refinements, allowing algorithms to adapt to new data and progressively reduce bias over time.
Chen tells Emerj’s executive audience that, by filling crucial gaps in healthcare data, AI is poised to deliver the following capabilities for commercial teams at life sciences enterprises:
- Enable More Effective Engagement Strategies: With enhanced prediction of patients in underrepresented areas, engagement teams can target more potential patients and healthcare providers
- Create More Targeted Medicine: By leveraging AI-driven insights from genetic profiles and medical histories, healthcare businesses can develop drugs more tailored to diverse patient needs — reducing the incidence of adverse reactions and enhancing patient outcomes
- Accelerate Drug Development Timelines: Addressing bias gaps enables AI to improve patient recruitment for clinical trials — particularly for rare diseases — leading to faster trial completion and quicker time-to-market for new therapies