Generative AI Adoption Challenges From a Data Perspective – with Ted Kwartler of DataRobot and Tsavo Knott of Pieces

This interview analysis is sponsored by Pieces and was written, edited and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.

Companies all over the world are fervently experimenting with generative AI, aiming to harness its transformative potential. However, this quest for innovation comes with risks and challenges. From ethical considerations surrounding data privacy to the complexities of integrating AI into existing workflows, companies grapple with ensuring responsible and seamless adoption.

Per KPMG Canada’s 2023 Generative AI Adoption Index, 70 percent of survey respondents said their companies intend to continue using generative AI regardless of the risks and controversies involved. Yet only 16 percent of the respondent companies have comprehensive AI policies outlining specific use cases, which means adopting generative AI may expose a company to both technical and ethical challenges.

Emerj CEO and Head of Research Daniel Faggella recently sat down with Ted Kwartler, Field CTO of Generative AI at DataRobot, and Tsavo Knott, Co-founder & CEO of Pieces, to discuss how best to address challenges in adopting generative AI.

Pieces is an AI-driven software company that builds a desktop application and a suite of integrations to help software developers work more efficiently. DataRobot is an AI lifecycle platform that allows an organization to build, operate and govern its AI landscape from a single, unified experience.

The following analysis of their shared perspectives examines three critical insights from their conversation:

Reducing the model production cost to prioritize speed: With lower production costs, companies change how they approach data science and may prioritize speed over accuracy.
Leveraging AI tools for developer workflows: Implementing automation, heuristics and machine learning to capture and leverage valuable organizational knowledge embedded in diverse sources, ensuring that insights are not lost and can be actively utilized.
Optimizing website format to interact with large language models: Transitioning from traditional robots.txt files to a structured JSON format so the website can interact with large language model agents and answer the questions they generate.

Guest: Ted Kwartler, Field CTO of Generative AI, DataRobot

Expertise: Data Mining, AI, Data Analytics

Brief Recognition: Along with serving as Field CTO at DataRobot, Ted is also a professor at the Global School in Empirical Research Methods (GSERM), helping Ph.D. candidates with empirical research methods using NLP and text mining approaches. He is also an adjunct professor at the Harvard Extension School. Before his time at DataRobot, Ted served as a Director of Data Science at Liberty Mutual, as well as in various advisory and board positions for tech companies, investment firms and the DC-based Bureau of Economic Analysis.

Guest: Tsavo Knott, Technical Co-founder & CEO of Pieces

Expertise: Coding, Software Development, Entrepreneurship, Interactive Media, Computer Science

Brief Recognition: Tsavo graduated from Miami University in 2018 with a Bachelor’s Degree in Game and Interactive Media Design as well as Computer Science. Before co-founding Pieces in 2020, he was a vice president and co-founder of Accent.ai, a language learning platform.

Reducing The Model Production Costs To Prioritize Speed

Ted begins by addressing the challenges in managing data science projects within organizations. He highlights the disruptive impact of personnel turnover: new data scientists taking over ongoing projects struggle to comprehend existing work, particularly when documentation is insufficient, and projects are delayed as a result.

Ted further connects project continuity issues with rising costs, as inefficiencies emerge during transitions. To tackle these challenges, he suggests adopting a structured approach by measuring variable costs per model and implementing lifecycle management to understand the time required to take a model from development to production.

By Ted’s estimate, it costs approximately $130,000 to put a model into production, a process that takes several months. However, he emphasizes that if organizations can significantly reduce these costs to around $35,000, it changes how they prioritize their data science efforts.

With lower costs, the calculation shifts, and organizations might be more willing to prioritize speed over absolute accuracy in their models. In other words, they might opt for a “close enough” model and get it in front of users faster rather than striving for perfection.
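To make the arithmetic behind that shift concrete, here is a minimal sketch using the two per-model figures Ted cites. The annual budget of $1,000,000 is a hypothetical number chosen purely for illustration and does not come from the interview.

```python
# Per-model production costs cited in the interview.
OLD_COST_PER_MODEL = 130_000
NEW_COST_PER_MODEL = 35_000

# Hypothetical annual budget, assumed only for this illustration.
ANNUAL_BUDGET = 1_000_000


def models_within_budget(budget: int, cost_per_model: int) -> int:
    """Return how many whole models a budget covers at a given unit cost."""
    return budget // cost_per_model


reduction = 1 - NEW_COST_PER_MODEL / OLD_COST_PER_MODEL
old_count = models_within_budget(ANNUAL_BUDGET, OLD_COST_PER_MODEL)
new_count = models_within_budget(ANNUAL_BUDGET, NEW_COST_PER_MODEL)

print(f"Cost reduction per model: {reduction:.0%}")
print(f"Models deployable at ${OLD_COST_PER_MODEL:,} each: {old_count}")
print(f"Models deployable at ${NEW_COST_PER_MODEL:,} each: {new_count}")
```

Under these assumptions, the same budget covers four times as many models, which is one way to see why a team might accept a “close enough” model in exchange for shipping sooner.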

Ted emphasizes that a large number of models, including those in production, cannot be managed through manual oversight. This is especially true in the example he cites of a client trying to scale to 100,000 models. He underscores the necessity of a technology-driven solution to ensure efficient and continuous management, as depending on manual reviews or interventions is impractical in the face of rapid model development and deployment.

In response, Tsavo outlines Pieces’ approach to addressing the complex challenge of managing the development workflow, particularly at the individual developer level.

He describes Pieces’ solution as a highly detailed and granular tool designed to streamline and centralize various aspects of the software development process. This tool acts as a hub that connects different programs, such as web browsers, integrated development environments (IDEs) and collaborative spaces where team members communicate.

“Our goal is to really capture the work in progress journey because so much of that, in passing, ‘Hey, I looked at this site, I evaluated this model, I talked to these people last.’ It doesn’t get captured in formal documentation or when you commit to GitHub or other programs.”

-Tsavo Knott, Technical Co-founder and CEO of Pieces

Leveraging AI Tools for Developer Workflows

Ted reflects on his experience at Liberty Mutual, where he was involved in machine learning research and studying a phenomenon called “dislocation,” which consists of measuring how prices change for customers year over year.

Ted acknowledges that he did document his work to create scripts, visualizations and presentations, but he questions whether the organization knew how to make practical use of that information. His broader point is that the knowledge generated during his research was lost within the large organization.

To address these issues, Ted proposes using automation to run processes in the background, ensuring that valuable information is not lost. He also mentions the potential for implementing heuristics and machine learning to identify relevant portions of workloads and make them more accessible and actionable.

Tsavo shares Ted’s perspective on capturing and utilizing organizational knowledge but goes further by broadening the scope of what needs to be captured. He mentions that, besides formal documentation processes, organizations must consider the wealth of information embedded in blogs, internal documents and chat conversations on platforms like Teams, GChat and Slack, in addition to the code that’s being written.

Ted provides an example from DataRobot’s experience: the company has maintained an archived Slack channel for data scientists throughout its 10-year history. He points out that there’s immense untapped knowledge contained in these chat conversations. He even suggests the possibility of creating an AI model to analyze and leverage this wealth of information, thereby harnessing the collective knowledge and expertise built up over a decade of data science discussions and questions.
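As a rough illustration of the kind of heuristic that could surface relevant messages from an archived chat channel, the sketch below ranks messages by how often they mention the terms in a question. It is not the approach DataRobot or Pieces uses; the function names, the sample messages and the term-overlap scoring are all assumptions made for this example, and a production system would more likely rely on a trained retrieval or language model.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_messages(query: str, messages: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k messages, ordered by how often they use query terms."""
    query_terms = set(tokenize(query))
    scored = []
    for message in messages:
        counts = Counter(tokenize(message))
        score = sum(counts[term] for term in query_terms)
        if score > 0:  # drop messages that share no terms with the query
            scored.append((score, message))
    # Stable sort: messages with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [message for _, message in scored[:top_k]]


# Hypothetical archive contents, invented for this example.
archive = [
    "How do I tune gradient boosting learning rate?",
    "Lunch at noon?",
    "Learning rate schedules for boosting models",
]
matches = rank_messages("boosting learning rate", archive)
for message in matches:
    print(message)
```

Even a simple heuristic like this filters out unrelated chatter, which hints at how much more a purpose-built model could recover from a decade of archived discussions.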

Optimizing Website Format to Interact With Large Language Models

Ted further highlights the significant productivity boost he has experienced as a developer using copilot tools, such as Pieces’ copilot platform, which leverage technologies like GPT (likely referring to OpenAI’s GPT-3) to assist in coding tasks.

He shares a personal example of how he could quickly and efficiently solve a coding problem (involving shell scripting) that would have taken him much longer without these tools. While these tools provide a remarkable advantage, Ted points out a downside: the process and the prompt used to obtain the solution often need to be better tracked by teams leveraging them.

Ted then focuses on the challenges of integrating new technology into legacy systems. He mentions that many developer tools and systems have yet to evolve in line with the technological advancements of the modern era. He gives an example of CRM systems resembling outdated software like Microsoft Access from the ’90s or early 2000s.

Ted concludes his point by acknowledging that these systems need to evolve and integrate new functionalities to keep pace with technological advancements.

Ted discusses a conversation he had with a software developer from a major search engine. The developer talked about a significant change in how website information is structured, specifically shifting from the traditional “robots.txt” file to a more structured JSON format optimized for interaction with large language model agents like GPT.
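The interview does not describe what such a JSON format would look like, so the following is only a sketch of the general idea: it reads the rules in a robots.txt file and re-expresses them as structured JSON that a program could consume. The function name `robots_txt_to_json` and the output layout (the `agents` and `sitemaps` keys) are hypothetical and do not correspond to any published standard.

```python
import json


def robots_txt_to_json(robots_txt: str) -> str:
    """Convert robots.txt rules into a hypothetical structured JSON layout."""
    rules = {}            # user-agent -> {"allow": [...], "disallow": [...]}
    sitemaps = []
    current_agents = []   # user-agents the current rule group applies to
    last_was_agent = False

    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # strip comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            # Consecutive User-agent lines share one rule group.
            if not last_was_agent:
                current_agents = []
            current_agents.append(value)
            rules.setdefault(value, {"allow": [], "disallow": []})
            last_was_agent = True
            continue

        last_was_agent = False
        if field in ("allow", "disallow") and value:
            for agent in current_agents:
                rules[agent][field].append(value)
        elif field == "sitemap":
            sitemaps.append(value)

    return json.dumps({"agents": rules, "sitemaps": sitemaps}, indent=2)


# Example input using a placeholder domain.
sample = """User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml
"""
print(robots_txt_to_json(sample))
```

A format aimed at language model agents would presumably carry richer information than crawl permissions, such as descriptions of the content or the actions a site supports, but those details are not given in the conversation.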

Kwartler then imagines a future scenario where users could make direct requests, such as: “I need an itinerary to Dublin for my parents and children in November.” Instead of searching for web pages and piecing together the information manually, a large language model agent could quickly generate a one-week itinerary.

What’s particularly interesting in this evolving landscape is the shift in how online traffic is managed. Ted mentions that a relative of his who works at Priceline, a travel booking website, has noted that a significant portion of internet traffic is transitioning from human-driven searches to interactions with ambitious AI agents that perform searches and make bookings on users’ behalf.

While anecdotal, the shift highlighted in the example means that websites and businesses must optimize their online presence not just for traditional search engines but also for these large language model agents.

Ted agrees that in the coming years, most internet traffic could be driven by these ambitious agents. How websites are structured, the information they provide and how they interact with these agents will be critical for businesses looking to maintain a robust online presence and serve users effectively in this evolving digital landscape.
