
AI Data Strategies for Life Sciences, Agriculture, and Materials Science – with Daniel Ferrante of Deloitte

This interview analysis is sponsored by Deloitte and was written, edited, and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.

R&D teams across life sciences, agriculture, and materials science are under increasing pressure to deliver innovation — but often face a fundamental obstacle: asking questions their data isn’t prepared to answer. 

According to a recent IDC survey conducted in partnership with Qlik, while 89% of organizations have updated their data strategies to adopt generative AI (GenAI), only 26% have actually deployed solutions at scale. Just 12% report having infrastructure robust enough to support autonomous decision-making. The disconnect between research ambition and data readiness remains a significant bottleneck.

The problem isn’t limited to infrastructure. Throughout the health sector, the so-called “10/90 gap” — where less than 10% of research funding targets diseases responsible for 90% of the global burden — reflects a widely cited and much deeper misalignment between research priorities and actionable data.

In both public and private domains, organizations struggle to define their hypotheses, interpret their findings, or connect siloed datasets across the R&D lifecycle. To address these challenges, Dr. Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte, joined Emerj CEO Daniel Faggella on the ‘AI in Business’ podcast to discuss a new approach to scientific data: contextualizing it through domain-specific large language models (LLMs) and representation learning.

Drawing from his work on Deloitte’s Atlas AI framework, Ferrante outlines strategies for embedding internal data into domain knowledge, avoiding the pitfalls of rigid ontologies, and generating hypotheses through exploratory mapping rather than assumption-driven analysis.

This article examines three critical insights from Ferrante’s interview that provide actionable strategies for R&D and innovation leaders:

  • Building data context before building AI models: Mapping scientific data onto learned representations from domain-specific LLMs enables organizations to identify where their information aligns with established knowledge and where key gaps may remain.
  • Ontologies as a starting point, not a constraint: Teams should use rigid committee-built ontologies as flexible labeling tools and layer them into broader geometric and statistical models — allowing LLMs to bridge different naming systems and domains without being trapped by inconsistent labels.
  • Start with exploratory cartography, not assumptions: Before jumping to analysis, teams should enable smarter hypothesis formation by using LLMs to map internal data against known domain models (e.g., chemistry or genomics) in order to visualize patterns, identify clusters, and surface latent structure.

Listen to the full episode below:

Guest: Daniel Ferrante, Partner, AI Leader in R&D and Data Strategy at Deloitte

Expertise: AI strategy, Data Monetization, IP-driven POCs

Brief Recognition: Dr. Daniel Ferrante is a Partner and AI Leader in R&D and Data Strategy at Deloitte. Before joining Deloitte, he co-founded SLF Scientific and served as its Chief Science & Data Officer. Dr. Ferrante received his Master’s and PhD in Theoretical Physics from Brown University.

Building Data Context Before Building AI Models

According to Dr. Daniel Ferrante, one of the most common missteps in enterprise AI adoption is assuming that internal data is ready for immediate model training. On the podcast, he explains that research and business leaders often ask questions their data cannot answer without the necessary contextual framework.

“When you ask a business or R&D question, you often don’t know if the data you have can answer it,” he notes. 

Ferrante underscores that Deloitte’s Atlas AI approach starts not with modeling but with representation — embedding internal datasets into domain-specific LLM landscapes (e.g., chemistry, genomics):

“Because so much of R&D involves experimental unknowns, we should start by leveraging what LLMs have already learned in specific scientific domains. By mapping our internal data against that learned landscape, we can understand its position within broader domain knowledge. Once we begin asking questions of our data in that context, we can start identifying meaningful patterns — giving us the biological, chemical, or scientific perspective we didn’t have before.”

– Dr. Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte

These learned spaces, he says, often resemble terrains of valleys and ridges, with each region corresponding to a distinct property such as solubility or toxicity.

This strategy, Ferrante argues, allows organizations to position their data within a broader scientific context before attempting hypothesis testing or downstream analytics. 

“We’re not just generating data — we’re generating labels,” he states, emphasizing that mapping comes first, not assumptions. For leaders, this means that AI serves as a lens for understanding what is possible with current data rather than rushing into ill-posed modeling tasks.
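The mapping step Ferrante describes can be sketched in miniature. The snippet below is a toy illustration, not Deloitte's Atlas AI implementation: the fixed vectors stand in for embeddings produced by a hypothetical domain-specific encoder (e.g., a chemistry- or genomics-tuned model), and the `coverage` function and its threshold are assumptions for illustration. The idea is simply to ask, for each internal record, how close it sits to anything in the learned landscape — points far from all reference embeddings flag potential gaps.

```python
import numpy as np

# Toy stand-ins for embeddings from a domain-specific encoder.
reference = np.array([      # well-characterized public-domain data points
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])
internal = np.array([       # the organization's own records
    [0.95, 0.05],           # sits close to known domain knowledge
    [-1.0, -1.0],           # lies far from anything the model has seen
])

def coverage(internal, reference, threshold=0.8):
    """Cosine similarity of each internal point to its nearest reference
    point; values below `threshold` mark potential knowledge gaps."""
    a = internal / np.linalg.norm(internal, axis=1, keepdims=True)
    b = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    sims = a @ b.T                 # pairwise cosine similarities
    nearest = sims.max(axis=1)     # best match per internal point
    return nearest, nearest < threshold

nearest, is_gap = coverage(internal, reference)
print(is_gap)  # the second internal point falls outside the landscape
```

In practice the embeddings would come from a trained model rather than hand-written vectors, but the workflow is the same: position the data first, then decide which questions it can support.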

Ontologies as a Starting Point, Not a Constraint

Dr. Ferrante stresses in the interview that ontologies — while valuable for categorization — should not become architectural constraints that lock research teams into brittle naming conventions. He critiques what he calls “Frankenstein ontologies” assembled by committees, where companies often attempt to enforce a single vocabulary across diverse domains or research traditions.

Instead, Ferrante describes Atlas AI’s approach of embedding ontologies as soft labels within a broader statistical framework. Using soft labels allows for semantic similarity and flexible mapping, even when teams use different taxonomies:

“Ontologies are just structured labels — they’re useful, and we should use them wherever possible. But we shouldn’t trap ourselves within them.

In physics, we often make a problem larger in order to make it solvable, and the same principle applies here. Rather than trying to force-fit every data point into a rigid ontology, why not leverage language models that may have already learned the structure we’re trying to formalize? Why lock ourselves into Frankenstein ontologies when there’s a more flexible and scalable alternative?”

– Dr. Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte

He argues that this method provides a more robust way to unify internal and external data without requiring standardization upfront. In Ferrante’s view, the role of ontologies should shift — from rigid taxonomies to tools for enhancing representation learning. 
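The soft-label idea can be illustrated with a minimal sketch. The labels, vectors, and `best_match` helper below are all hypothetical — the vectors stand in for embeddings from a domain-tuned language model — but they show the mechanism: two teams using different taxonomies are reconciled by embedding similarity rather than exact string matching, so no upfront standardization is required.

```python
import numpy as np

# Hypothetical embeddings for ontology labels from two different teams;
# values are illustrative, not drawn from any real ontology.
team_a = {"hepatotoxicity": [0.9, 0.1, 0.0]}
team_b = {
    "liver toxicity": [0.88, 0.12, 0.02],  # same concept, different name
    "cardiotoxicity": [0.10, 0.90, 0.05],
}

def best_match(query_vec, vocab):
    """Soft-label lookup: return the vocab term whose embedding is most
    cosine-similar to the query, instead of requiring an exact match."""
    v = np.asarray(query_vec, dtype=float)
    v = v / np.linalg.norm(v)
    scores = {}
    for term, emb in vocab.items():
        e = np.asarray(emb, dtype=float)
        scores[term] = float(v @ (e / np.linalg.norm(e)))
    return max(scores, key=scores.get)

match = best_match(team_a["hepatotoxicity"], team_b)
print(match)  # "liver toxicity" — bridged without a shared vocabulary
```

The rigid alternative — forcing both teams onto one canonical label set — is exactly the "Frankenstein ontology" failure mode Ferrante warns against.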

Start with Exploratory Cartography, Not Assumptions

Throughout the conversation, Dr. Ferrante challenges conventional AI workflows that begin with hypothesis formation or confirmatory analytics. Instead, he advocates for “cartography” — a term he uses to describe the exploratory process of mapping an organization’s internal data within a learned domain landscape.

In the interview, Ferrante explains how researchers using Atlas AI might project oncology drugs into a chemical representation space derived from publicly available databases, such as PubChem. In doing so, teams can detect whether certain compounds cluster by toxicity, binding affinity, or another property — even before building formal models. “It’s not about labeling more data,” Ferrante says. “It’s about seeing where that data sits in a landscape of known science.”
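A crude version of this "cartography" step might look like the following sketch. The descriptor vectors are toy stand-ins for compound embeddings from a chemical representation model (the kind that might be derived from public corpora such as PubChem), and the two-cluster split is contrived — the point is only that projecting and eyeballing the map precedes any formal modeling.

```python
import numpy as np

# Toy stand-ins for compound embeddings in a chemical representation space.
compounds = np.array([
    [1.0, 0.9, 1.1],   # cluster A
    [1.1, 1.0, 0.9],
    [5.0, 5.1, 4.9],   # cluster B -- far from A in representation space
    [4.9, 5.0, 5.1],
])

# Cartography: project to 2-D with PCA (via SVD) to see where each
# compound sits before committing to any hypothesis or model.
centered = compounds - compounds.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T   # 2-D map coordinates

# Crude cluster check: split on the first principal coordinate.
labels = (coords[:, 0] > 0).astype(int)
print(labels)  # the two groups separate cleanly along the first axis
```

Real pipelines would use richer projections (e.g., UMAP) and proper clustering, but even this minimal map would reveal whether compounds group by a shared property before any model is built.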

Ferrante argues that this exploratory practice leads to better hypothesis generation and prevents wasted effort on questions the data can’t meaningfully address. For R&D leaders, the key takeaway is a shift in mindset: use LLMs not only for automation but also for revealing patterns that guide smarter experimentation and more informed investments in modeling.
