Artificial Intelligence at AstraZeneca

Matthew DeMello

Matthew is Senior Editor at Emerj, focused on enterprise AI use-cases and trends. He previously served as podcast producer with CrossBorder Solutions, a venture-backed AI-enabled tax solutions firm. Before that, he spent three years at the World Policy Institute as a news editor and podcast producer.

AstraZeneca is a global biopharmaceutical company that researches, develops, manufactures, and markets prescription drugs and vaccines. Its key therapeutic areas include oncology, cardiovascular, renal, metabolism, respiratory, and immunology. In 2022, the company reported revenue of $42.67 billion and a profit of $4.08 billion. The company has a significant global presence, employing around 89,900 people across more than 60 countries as of 2023.

The pharma company has invested more than USD 250 million in AI research and in developing an antibody for cancer. It also claims to have data and AI embedded across its research and development operations.

In this article, we will examine two use cases showing how AI initiatives currently support AstraZeneca’s business goals:

  • Overcoming data integration challenges: Leveraging NLP to process and analyze a vast library of scientific literature and data sources, thus facilitating the integration of disjointed data and helping build scalable and performant data pipelines.
  • Streamlining machine learning model deployment: Using fully managed machine learning services to build, train, and deploy machine learning models efficiently. 

Overcoming Data Integration Challenges with AI

On average, it takes 10 to 15 years to research a drug and complete all three phases of clinical trials. Even then, roughly 90 percent of drug candidates fail to meet their intended goals. Given this substantial investment and low success rate, scientists and researchers need to assess data sets and iterate quickly.

At AstraZeneca, researchers found they could not make timely decisions even with extensive information at their fingertips. They faced several challenges:

  • Disjointed Data Sources: Scientists struggled with data scattered across various internal and external sources, making it difficult to access and integrate the information needed for drug discovery and clinical trials.
  • Infrastructure Complexity: The need for a flexible yet low-maintenance infrastructure was critical. Existing systems required constant upkeep, which diverted resources from the core scientific tasks.
  • Scaling Data Science Efforts: Existing tools and workflows, particularly open-source Python notebooks, couldn’t scale effectively to support the extensive data science efforts required.

Additionally, the necessity to ingest, parse, and analyze millions of data points from hundreds of sources, including scientific literature and public databases, was a significant technical challenge. Moreover, ensuring that the data pipelines and machine learning models could handle the vast and growing volume of data while maintaining performance was critical for the company.

As a solution to these challenges, AstraZeneca adopted Databricks, leveraging its fully managed platform with the aim of simplifying cluster management and maintaining analytics resources at scale.

According to use case documentation from Databricks, AstraZeneca used the Databricks platform to build scalable and performant data pipelines, utilizing NLP to process and analyze a vast library of scientific literature and data sources.
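AstraZeneca's actual Databricks pipelines are not public, but the core idea of an NLP pipeline over scientific literature can be sketched with a minimal, hypothetical example. The function names (`extract_entities`, `run_pipeline`) and the sample target terms are illustrative assumptions, not the company's code:

```python
# Minimal sketch of an NLP pipeline over scientific abstracts.
# All names and terms here are illustrative, not AstraZeneca's.
import re
from collections import Counter

TARGET_TERMS = {"egfr", "kras", "braf", "pd-l1"}  # example gene/protein targets

def tokenize(text: str) -> list[str]:
    """Lowercase an abstract and split it into word tokens."""
    return re.findall(r"[a-z0-9\-]+", text.lower())

def extract_entities(abstract: str) -> Counter:
    """Count mentions of known target terms in one abstract."""
    return Counter(t for t in tokenize(abstract) if t in TARGET_TERMS)

def run_pipeline(abstracts: list[str]) -> Counter:
    """Aggregate target mentions across a corpus of abstracts."""
    totals = Counter()
    for abstract in abstracts:
        totals.update(extract_entities(abstract))
    return totals

corpus = [
    "EGFR mutations predict response to inhibitors in NSCLC.",
    "KRAS and EGFR signalling converge on MAPK pathways.",
]
print(run_pipeline(corpus))  # Counter({'egfr': 2, 'kras': 1})
```

In a production setting like the one described, each step would run as a distributed Spark job over millions of documents rather than a local loop, but the shape of the pipeline (ingest, parse, extract, aggregate) is the same.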

Screenshot from Databricks Video (Source: Databricks)

Below is a five-minute demo video of the Databricks Data Intelligence Platform:

This platform also empowered data scientists to:

  • Build and train models that provide ranking predictions and enhance decision-making capabilities, and
  • Construct a knowledge graph that powers a recommendation system, allowing scientists to generate novel target hypotheses for various diseases using all available data.
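The knowledge-graph-driven recommendation idea above can be illustrated with a toy, hypothetical sketch. The real system's schema, evidence sources, and scoring are not public; the disease-target edges and weights below are invented for illustration:

```python
# Toy sketch: ranking candidate drug targets from a small knowledge graph.
# Edges and evidence weights are invented for illustration only.
from collections import defaultdict

# Edges: (disease, target, evidence_weight) derived from literature/databases.
edges = [
    ("asthma", "IL5", 0.9),
    ("asthma", "TSLP", 0.7),
    ("asthma", "IL5", 0.4),   # a second evidence source for the same pair
    ("copd", "TSLP", 0.5),
]

def rank_targets(disease: str, edges) -> list[tuple[str, float]]:
    """Sum evidence weights per target for a disease and rank descending."""
    scores = defaultdict(float)
    for d, target, weight in edges:
        if d == disease:
            scores[target] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(rank_targets("asthma", edges))  # IL5 ranks first with combined evidence
```

A production system would replace the summed weights with learned ranking models over a much richer graph, but the pattern of aggregating evidence per disease-target pair to surface hypotheses is the same.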

The case study claims that after adopting the Databricks platform, AstraZeneca was able to:

  • Improve operational efficiency
  • Increase data science team productivity
  • Achieve faster time to insight

While the case study did not share quantifiable results for AstraZeneca, Databricks reported the following numbers for some of its other customers:

  • 14 databases replaced by one Delta Lake
  • Six seconds to perform a complex analytics task that previously took six hours
  • 20% faster performance after unifying data through the Databricks Data Intelligence Platform

Streamlining Machine Learning Model Deployment

While managing vast amounts of data, companies often miss the opportunity to gain meaningful insights from it, and AstraZeneca was no exception. Additionally, its machine learning development process was heavily manual, demanding extra effort from data scientists. Moreover, the prior system was not extensible, flexible, or scalable enough to meet the needs of AstraZeneca’s commercial data analysis.

In addition to this, the company also observed the following challenges:

  • Inefficient Development Process: The company lacked an efficient process for creating and deploying machine learning models into production, which slowed data analysis and insight generation.
  • Slow Insight Generation: The previous machine learning solution was too slow; setting up an environment for data scientists took over a month, delaying insight delivery.
  • Lack of Cohesion Among ML Tools: The existing technology stack had no cohesive way to integrate various ML tools, making it challenging to create a seamless environment for data scientists.

AstraZeneca needed an efficient development process to create and deploy machine learning (ML) models into production, enabling rapid data analysis at scale and generating business insights. It would enhance research and development, accelerate the commercialization of new therapeutics, and ultimately speed up the delivery of life-changing medicines to patients.

To solve these issues for the researchers and the data science team, AstraZeneca adopted Amazon SageMaker with the aim of streamlining the preparation, building, training, and deployment of machine learning models.

Amazon SageMaker helps companies build, train, and deploy ML models using tools like notebooks, debuggers, profilers, pipelines, and MLOps. It supports governance requirements with simplified access control and transparency across ML projects.

Additionally, companies can access pre-trained models via SageMaker.

Screenshot from Amazon SageMaker (Source: AWS)

Below is an eight-minute video demo of Amazon SageMaker:

Here is a six-point workflow for using Amazon SageMaker, based on the above video:

  • Data Preparation: Loading and preparing data for model training using SageMaker Data Wrangler or custom scripts.
  • Model Training: Using SageMaker’s built-in algorithms or custom code to train machine learning models on the prepared data.
  • Model Tracking: Logging all steps of the model training workflow, creating an auditable trail to reproduce models and troubleshoot issues.
  • Model Registry: Centrally tracking different versions of trained models, their metadata, and performance metrics to select the right model for deployment.
  • Model Deployment: Deploying the selected model for inference using SageMaker’s built-in hosting capabilities or custom deployment options.
  • Model Monitoring: Monitoring the deployed model’s performance, data drift, bias, and other metrics using SageMaker Model Monitor.
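The six steps above can be sketched as a self-contained, SageMaker-free analogue using only the Python standard library, so the moving parts are visible. This is a deliberately simplified stand-in; in practice each step maps to a managed SageMaker feature (Data Wrangler, training jobs, the Model Registry, hosting endpoints, Model Monitor), and the dataset, "model," and drift metric here are illustrative assumptions:

```python
# Hypothetical local analogue of the six-step SageMaker workflow.
import statistics

# 1. Data preparation: clean a toy dataset of (feature, label) pairs.
raw = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (None, 1)]
data = [(x, y) for x, y in raw if x is not None]

# 2. Model training: fit a trivial one-feature threshold "model".
threshold = statistics.mean(x for x, _ in data)

# 3. Model tracking: log the training run so it can be reproduced.
run_log = {"n_samples": len(data), "threshold": threshold}

# 4. Model registry: version the trained model with its metadata.
registry = {"v1": {"threshold": threshold, "metrics": run_log}}

# 5. Model deployment: expose the registered model for inference.
def predict(x: float, version: str = "v1") -> int:
    return int(x >= registry[version]["threshold"])

# 6. Model monitoring: compare live inputs against the training mean (drift).
live_inputs = [0.7, 0.8, 0.95]
drift = abs(statistics.mean(live_inputs) - threshold)

print(predict(0.8), round(drift, 3))
```

The point of the sketch is the shape of the lifecycle, not the model: each numbered step produces an artifact (clean data, fitted parameters, a run log, a registry entry, an endpoint, a drift metric) that the next step consumes, which is what makes the end-to-end workflow auditable and repeatable.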

Since SageMaker provides all of the above tools in one platform, it is easier for data scientists to access information.

AWS claims that AstraZeneca observed the following business results from adopting this tech stack:

  • Increased Speed to Insights: The time to generate insights decreased from over six months to less than 2.5 months, a 150% improvement.
  • Improved Efficiency: Automating the ML development process within Amazon SageMaker Studio reduced the manual workload, allowing data scientists to focus on valuable tasks.
  • Scalability and Repeatability: The solution’s infrastructure as code made it simple to repeat and share across internal and external partners, enhancing collaboration and scalability.