This interview analysis is sponsored by Deloitte and was written, edited, and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.
R&D teams across life sciences, agriculture, and materials science are under increasing pressure to deliver innovation — but often face a fundamental obstacle: asking questions their data isn’t prepared to answer.
According to a recent IDC survey conducted in partnership with Qlik, 89% of organizations have updated their data strategies to adopt generative AI (GenAI), yet only 26% have actually deployed solutions at scale, and just 12% report infrastructure robust enough to support autonomous decision-making. The disconnect between research ambition and data readiness remains a significant bottleneck.
The problem isn’t limited to infrastructure. Across the health sector, the so-called “10/90 gap” — where less than 10% of research funding targets diseases responsible for 90% of the global disease burden — reflects a widely cited and far deeper misalignment between research priorities and actionable data.
In both public and private domains, organizations struggle to define their hypotheses, interpret their findings, or connect siloed datasets across the R&D lifecycle. To address these challenges, Dr. Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte, joined Emerj CEO Daniel Faggella on the ‘AI in Business’ podcast to discuss a new approach to scientific data: contextualizing it through domain-specific large language models (LLMs) and representation learning.
Drawing from his work on Deloitte’s Atlas AI framework, Ferrante outlines strategies for embedding internal data into domain knowledge, avoiding the pitfalls of rigid ontologies, and generating hypotheses through exploratory mapping rather than assumption-driven analysis.
This article examines three critical insights from Ferrante’s interview that provide actionable strategies for R&D and innovation leaders:
- Building data context before building AI models: Mapping scientific data onto learned representations from domain-specific LLMs enables organizations to identify where their information aligns with established knowledge and where key gaps may remain.
- Ontologies as a starting point, not a constraint: Teams should treat ontologies as flexible labeling tools rather than rigid, committee-built constraints — layering them into broader geometric and statistical models so that LLMs can bridge different naming systems and domains without being trapped by inconsistent labels.
- Start with exploratory cartography, not assumptions: Before jumping to analysis, teams should enable smarter hypothesis formation by using LLMs to map internal data against known domain models (e.g., chemistry or genomics) that visualize patterns, identify clusters, and discover latent structures.
Guest: Daniel Ferrante, Partner, AI Leader in R&D and Data Strategy at Deloitte
Expertise: AI strategy, Data Monetization, IP-driven POCs
Brief Recognition: Dr. Daniel Ferrante is a Partner and AI Leader in R&D and Data Strategy at Deloitte. Before joining Deloitte, he co-founded SLF Scientific, where he served as Chief Science & Data Officer. Dr. Ferrante received his Master’s and PhD in Theoretical Physics from Brown University.
Building Data Context Before Building AI Models
According to Dr. Daniel Ferrante, one of the most common missteps in enterprise AI adoption is assuming that internal data is ready for immediate model training. On the podcast, he explains that research and business leaders often ask questions their data cannot answer without the necessary contextual framework.
“When you ask a business or R&D question, you often don’t know if the data you have can answer it,” he notes.
Ferrante underscores that Deloitte’s Atlas AI approach starts not with modeling but with representation — embedding internal datasets into domain-specific LLM landscapes (e.g., chemistry, genomics):
“Because so much of R&D involves experimental unknowns, we should start by leveraging what LLMs have already learned in specific scientific domains. By mapping our internal data against that learned landscape, we can understand its position within broader domain knowledge. Once we begin asking questions of our data in that context, we can start identifying meaningful patterns — giving us the biological, chemical, or scientific perspective we didn’t have before.”
– Dr. Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte
These learned spaces, he says, often resemble terrains of valleys and ridges, with each feature representing a distinct property such as solubility or toxicity.
This strategy, Ferrante argues, allows organizations to position their data within a broader scientific context before attempting hypothesis testing or downstream analytics.
“We’re not just generating data — we’re generating labels,” he states, emphasizing that mapping comes first, not assumptions. For leaders, this means that AI serves as a lens for understanding what is possible with current data rather than rushing into ill-posed modeling tasks.
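Ferrante does not describe Atlas AI’s internals on the podcast, but the positioning idea can be illustrated with a minimal, hypothetical sketch: assume every internal record and every piece of established domain knowledge has an embedding from a domain-specific LLM (mocked here with random vectors), then measure how far each internal record sits from known science to flag potential gaps.

```python
# Hedged sketch only: Atlas AI's implementation is not public, so this mocks
# the idea of "positioning" internal data within a learned domain landscape.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for embeddings from a domain-specific LLM (e.g., a chemistry
# model). In practice these would come from a real encoder, not random draws.
reference = rng.normal(size=(100, 8))  # well-characterized domain knowledge
internal = rng.normal(size=(10, 8))    # an organization's own datasets

def nearest_reference_distance(points, refs):
    """Distance from each internal point to its closest reference point."""
    d = np.linalg.norm(refs[None, :, :] - points[:, None, :], axis=-1)
    return d.min(axis=1)

dist = nearest_reference_distance(internal, reference)

# Records far from any reference suggest gaps: regions the internal data
# occupies that established domain knowledge does not cover well.
gap_threshold = np.percentile(dist, 80)
gaps = np.where(dist > gap_threshold)[0]
print(f"{len(gaps)} of {len(internal)} internal records sit far from known science")
```

The point of the sketch is the ordering Ferrante describes: the distance map comes first, and only the questions the data can plausibly answer proceed to modeling.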
Ontologies as a Starting Point, Not a Constraint
Dr. Ferrante stresses in the interview that ontologies — while valuable for categorization — should not become architectural constraints that lock research teams into brittle naming conventions. He critiques what he calls “Frankenstein ontologies” assembled by committees, where companies often attempt to enforce a single vocabulary across diverse domains or research traditions.
Instead, Ferrante describes Atlas AI’s approach of embedding ontologies as soft labels within a broader statistical framework. Using soft labels allows for semantic similarity and flexible mapping, even when teams use different taxonomies:
“Ontologies are just structured labels — they’re useful, and we should use them wherever possible. But we shouldn’t trap ourselves within them.
In physics, we often make a problem larger in order to make it solvable, and the same principle applies here. Rather than trying to force-fit every data point into a rigid ontology, why not leverage language models that may have already learned the structure we’re trying to formalize? Why lock ourselves into Frankenstein ontologies when there’s a more flexible and scalable alternative?”
– Dr. Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte
He argues that this method provides a more robust way to unify internal and external data without requiring standardization upfront. In Ferrante’s view, the role of ontologies should shift — from rigid taxonomies to tools for enhancing representation learning.
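As one way to picture the soft-label idea (an illustration, not Deloitte’s actual implementation), the sketch below embeds terms from two hypothetical team vocabularies and matches them by cosine similarity rather than exact string equality; the vectors are hand-picked stand-ins for real LLM embeddings.

```python
# Hedged sketch of "ontologies as soft labels": match differently named
# taxonomies by embedding similarity instead of enforcing one vocabulary.
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Two teams' terms for the same concepts (hypothetical toy embeddings).
team_a_terms = {"hepatotoxicity": np.array([0.9, 0.1, 0.0]),
                "solubility":     np.array([0.1, 0.9, 0.1])}
team_b_terms = {"liver toxicity":     np.array([0.88, 0.15, 0.05]),
                "aqueous solubility": np.array([0.12, 0.85, 0.20])}

# Bridge the taxonomies: each Team A label maps to its most similar
# Team B label, with no shared naming convention required upfront.
mapping = {name_a: max(team_b_terms, key=lambda n: cosine(vec_a, team_b_terms[n]))
           for name_a, vec_a in team_a_terms.items()}
print(mapping)
```

Because the match is a similarity score rather than a hard lookup, inconsistent labels degrade gracefully instead of breaking the pipeline — which is the flexibility Ferrante contrasts with “Frankenstein ontologies.”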
Start with Exploratory Cartography, Not Assumptions
Throughout the conversation, Dr. Ferrante challenges conventional AI workflows that begin with hypothesis formation or confirmatory analytics. Instead, he advocates for “cartography” — a term he uses to describe the exploratory process of mapping an organization’s internal data within a learned domain landscape.
In the interview, Ferrante explains how researchers using Atlas AI might project oncology drugs into a chemical representation space derived from publicly available databases, such as PubChem. In doing so, teams can detect whether certain compounds cluster by toxicity, binding affinity, or another property — even before building formal models. “It’s not about labeling more data,” Ferrante says. “It’s about seeing where that data sits in a landscape of known science.”
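Since Atlas AI itself is not public, the workflow can only be mocked, but the cartography step Ferrante describes might look like the sketch below: compounds get coordinates in a stand-in “chemical representation space” (random vectors rather than real PubChem-derived embeddings), and a minimal k-means pass checks whether they fall into clusters worth investigating before any formal modeling.

```python
# Hedged sketch of "cartography before analysis": look for clusters in an
# embedding space before building formal models. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical groups of compounds (e.g., differing toxicity profiles),
# placed in a mocked 2-D chemical representation space.
group_1 = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
group_2 = rng.normal(loc=3.0, scale=0.3, size=(20, 2))
compounds = np.vstack([group_1, group_2])

def kmeans(points, k=2, iters=20, seed=0):
    """Minimal k-means: returns a cluster label for each point."""
    r = np.random.default_rng(seed)
    centers = points[r.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(points[:, None] - centers[None], axis=-1).argmin(axis=1)
        # Recompute each center as the mean of its points (keep old center
        # if a cluster happens to be empty).
        centers = np.array([points[labels == i].mean(axis=0)
                            if np.any(labels == i) else centers[i]
                            for i in range(k)])
    return labels

labels = kmeans(compounds)
# Well-separated groups landing in distinct clusters hint at latent structure
# (e.g., a toxicity-linked cluster) worth turning into a hypothesis.
print("cluster sizes:", np.bincount(labels))
```

The output of a pass like this is not a model but a map: it tells the team which clusters exist and therefore which hypotheses are worth formalizing.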
Ferrante argues that this exploratory practice leads to better hypothesis generation and prevents wasted effort on questions the data can’t meaningfully address. For R&D leaders, the key takeaway is a shift in mindset: use LLMs not only for automation but also for revealing patterns that guide smarter experimentation and more informed investments in modeling.