Overcoming Cultural and Technological Hurdles for AI Integration in Life Sciences – with Daniel Ferrante of Deloitte

Riya Pahuja

Riya covers B2B applications of machine learning for Emerj - across North America and the EU. She has previously worked with the Times of India Group, and as a journalist covering data analytics and AI. She resides in Toronto.

Far beyond surface-level ‘chatbot’ software and other customer-facing support systems, the same generative AI (GenAI) capabilities that are reshaping language workflows in front-office tasks across financial and legal services are also changing how research teams in the life sciences target solutions for rare diseases and novel treatments.

The same language models used for document processing and examination can be augmented and trained to navigate complex solution spaces, integrate diverse data modalities, and unlock new scientific insights. In one example, researchers have used generative AI techniques to analyze whole genomes and identify disease-relevant mutations in non-coding regions of DNA.

In another prominent example, the AlphaFold project from DeepMind demonstrates how AI can rapidly predict protein structures, accelerating biology research that previously required extensive lab work.

Emerj CEO and Head of Research Daniel Faggella recently spoke with Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte, on the ‘AI in Business’ podcast to discuss how integrating multiple data modalities enhances understanding and identifies unique ‘pockets’ of information.

The following analysis of their conversation examines two key insights:

  • Enhancing analysis with data aggregation: Aggregating diverse data sets into a comprehensive knowledge graph enables researchers to uncover interconnected data points and ‘communities,’ enhancing insights and decision-making capabilities.
  • Integrating multiple data modalities: Integrating numerous data modalities sharpens understanding and identifies unique pockets of information. Combining diverse data types makes it easier to identify a model’s strengths and weaknesses, fostering a more comprehensive analysis.

Listen to the full episode below:

Guest: Daniel Ferrante, Partner, AI Leader in R&D and Data Strategy at Deloitte

Brief Recognition: Dr. Daniel Ferrante is a Partner and AI leader in R&D and Data Strategy at Deloitte. Before joining Deloitte, he co-founded SLF Scientific and led the company as Chief Science & Data Officer. Dr. Ferrante received his Master’s and Ph.D. in Theoretical Physics from Brown University.

Expertise: AI strategy, data monetization, IP-driven POCs

Enhancing Analysis with Data Aggregation

Daniel begins his podcast appearance by elaborating on the concept of ‘communities’ within data sets, likening the organization of data to the diverse groups or patterns that individuals belong to based on their behaviors or characteristics. He suggests that these communities represent distinct data segments that can be identified through analysis. For instance, individuals who consistently exhibit certain behaviors or preferences may belong to specific communities within the data.

Through these ‘communities,’ researchers can identify the patterns or behaviors that group members share, allowing them to draw insights about individuals within those groups. Daniel offers an analogy using credit card transactions: if one individual uses their credit card in places where another does not, the difference in behavior indicates that they belong to different communities. Those differences can be leveraged to extract relevant data and insights from diverse sources, facilitating more effective analysis.

He emphasizes the power of aggregating diverse data sets into actionable formats, likening it to techniques used in detecting money laundering.

By aggregating and organizing data, Daniel suggests researchers can construct a comprehensive knowledge graph that captures the interconnectedness of different data points and communities. Managing such a knowledge graph effectively can enhance researchers’ ability to derive meaningful insights and make informed decisions, akin to adding “sweet cream on top” to the analytical process.
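The credit card analogy can be made concrete with a small sketch. It is not the method Daniel describes on the podcast: the transaction data, the names, and the choice of connected components as a stand-in for community detection are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (cardholder, merchant) pairs.
transactions = [
    ("ana", "bookshop"), ("ana", "cafe"),
    ("ben", "cafe"), ("ben", "bookshop"),
    ("cy", "garage"), ("dee", "garage"), ("dee", "hardware"),
]

def build_graph(pairs):
    """Link two cardholders when they share at least one merchant."""
    by_merchant = defaultdict(set)
    for holder, merchant in pairs:
        by_merchant[merchant].add(holder)
    graph = defaultdict(set)
    for holder, _ in pairs:
        graph[holder]  # ensure every cardholder appears as a node
    for holders in by_merchant.values():
        for a in holders:
            graph[a] |= holders - {a}
    return graph

def communities(graph):
    """Return connected components, a crude stand-in for community detection."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

groups = communities(build_graph(transactions))
print(groups)
```

In this toy data, cardholders who never shop at the same merchants end up in separate groups. A production knowledge graph would likely carry many more node and edge types and use a dedicated community detection algorithm.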

Integrating Multiple Data Modalities

In answering a question about advanced methods of data clustering, Daniel explains a complex concept using an analogy from his PhD research, where he studied the solution space of equations. He compares solving equations to navigating through a multi-dimensional space defined by different parameters. By varying these parameters, researchers can explore the landscape of possible solutions.

He extends this analogy to AI, particularly in neural networks.

In neural networks, data inputs are transformed into embeddings, vectors representing the data’s position in an abstract space. By introducing parameters into the model, researchers can manipulate the embeddings and observe how the landscape of solutions changes – like tightening strings on a guitar to play higher notes.
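As a rough illustration of how a single parameter can reshape an embedding space, the sketch below uses hypothetical two-dimensional embeddings and a made-up weight `w` that stretches one coordinate. Real models learn many such parameters jointly; the sample names and the reweighting scheme are assumptions, not anything described in the interview.

```python
import math

# Hypothetical 2-D embeddings for three data samples.
embeddings = {
    "sample_a": (1.0, 0.0),
    "sample_b": (0.9, 0.5),
    "sample_c": (0.0, 1.0),
}

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def reweight(vec, w):
    """Scale the second coordinate by w, a hypothetical tunable parameter."""
    return (vec[0], w * vec[1])

def nearest(name, w):
    """Return the most similar other sample after reweighting by w (w > 0)."""
    query = reweight(embeddings[name], w)
    others = {k: reweight(v, w) for k, v in embeddings.items() if k != name}
    return max(others, key=lambda k: cosine(query, others[k]))

print(nearest("sample_b", 1.0))
print(nearest("sample_b", 4.0))
```

With `w = 1.0` the nearest neighbor of `sample_b` is `sample_a`; raising `w` to `4.0` pulls it toward `sample_c` instead, which is the kind of shift in the landscape the guitar-string image points at.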

Daniel suggests that understanding the stability and characteristics of these solution landscapes is crucial for interpreting AI models effectively. By probing different landscape regions, researchers can gain insights into what the model has learned and how it processes information. This approach can be especially valuable in fields like medicine, where AI models are used to analyze complex datasets and make predictions.
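One simple way to picture probing a landscape is to sweep a parameter across a range and note where the valleys sit. The sketch below does this for a hypothetical one-dimensional loss function; the function, the range, and the grid resolution are illustrative assumptions, and real model landscapes have far more dimensions.

```python
def loss(theta):
    """Hypothetical 1-D loss with two valleys, near theta = -1 and theta = 2."""
    return (theta + 1) ** 2 * (theta - 2) ** 2 + 0.5 * theta

def local_minima(f, lo, hi, steps):
    """Scan an evenly spaced grid and return points lower than both neighbors."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    return [
        round(xs[i], 2)
        for i in range(1, steps)
        if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]
    ]

valleys = local_minima(loss, -3.0, 4.0, 700)
print(valleys)
```

Scanning the whole range, rather than stopping at the first valley found, is what reveals that this toy landscape has two distinct low regions of different depth.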

Daniel notes that having a precise sense of scale in AI models will be crucial in segmenting the different “pockets” or communities of data effectively for the benefit of human patients:

“But if you want to understand what your AI model has learned, what your neural network has actually absorbed. You want to probe the whole landscape, you want to see the whole valley, whatever you try to do. That is powerful. So to the point that you made before, the way I think about it: It would be different pockets around [the customer data], and it will be collection of these such pockets, because you’re gonna have data of different types for different diseases, they may or may not overlap with other folks. But the distribution that you have, I would like to believe, would be singular to you and define who you are.”

– Daniel Ferrante, AI Leader in R&D and Data Strategy at Deloitte

With regard to the different “pockets” of data Daniel refers to throughout the interview, he emphasizes the importance of considering multiple data modalities in AI analysis, such as images, genomics, and proteomics. Each data modality represents a distinct solution space, similar to the stratification of patient behaviors discussed in the previous subsection. However, the real power lies in integrating these diverse solution spaces to create a comprehensive understanding of the data.

Daniel goes on to explain that, in a multimodal world where various data types intersect, the unique identifiers or pockets of information become even more distinct. By combining different modalities, researchers can sharpen their understanding of the data and better define the strengths and weaknesses of their models.
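The idea that pockets become more distinct as modalities are combined can be sketched with toy data. The patient records, feature values, and the use of plain concatenation to merge modalities are all assumptions for illustration; they are not drawn from the interview.

```python
# Hypothetical per-modality feature vectors for three patients.
patients = {
    "p1": {"imaging": (0.2, 0.8), "genomics": (1.0, 0.0)},
    "p2": {"imaging": (0.2, 0.8), "genomics": (0.0, 1.0)},
    "p3": {"imaging": (0.9, 0.1), "genomics": (1.0, 0.0)},
}

def combined(record, modalities):
    """Concatenate the chosen modalities into a single feature vector."""
    return tuple(x for m in modalities for x in record[m])

def distinct_profiles(modalities):
    """Count how many distinct vectors the chosen modalities yield."""
    return len({combined(r, modalities) for r in patients.values()})

print(distinct_profiles(["imaging"]))
print(distinct_profiles(["genomics"]))
print(distinct_profiles(["imaging", "genomics"]))
```

In this toy set, imaging alone cannot tell `p1` from `p2`, and genomics alone cannot tell `p1` from `p3`. Only the combined view gives each patient a profile of their own.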

Lastly, Daniel suggests adopting a mindset similar to the MakerGarage movement, where curiosity and experimentation are encouraged, even in complex fields like drug discovery. He acknowledges that, even in challenging research areas, there’s potential to explore and utilize innovative tools effectively, especially considering the vast work already done in the field.

Daniel emphasizes the importance of understanding how scientists approach data: not just noting data points, but going on to extract metadata and context from them. Ensuring data robustness, repeatability, and reproducibility is crucial. Additionally, capturing and aggregating metadata generated by scientists can provide valuable insights and opportunities for further analysis, enabling previously impossible tasks. This shift in approach, focusing on data hygiene and metadata aggregation, offers significant advantages in scientific research and innovation.
