Deploying large language models (LLMs) in an enterprise setting requires management teams to adopt a strategic approach tailored to their organizations, one that accounts for several essential factors.
First among these factors is understanding LLMs’ capabilities well enough to leverage them in tasks such as natural language understanding, generation, and knowledge-intensive applications. Equally important is recognizing the limitations of LLMs, including challenges related to biases, data quality, interpretability, hallucinations, and ethical considerations.
A new study from Stanford University and the Institute for Human-Centered AI shows that, for all their innovation, LLMs in their current state of development still have significant limitations across different industries and workflows. Specifically, the report details how “state-of-the-art language models” hallucinate at least 75% of the time in “response to specific legal queries.” These conclusions suggest that, for now, LLMs cannot match the reasoning and expertise of human attorneys.
While many experts insist these dynamics will change soon, the inherent dilemma for business leaders across legacy institutions and industries remains. Emerj Senior Editor Matthew DeMello recently spoke with NLP Logix Data Science Solutions Architect Anton Kornienko and Modeling and Analytics Team Lead Ben Webster on the ‘AI in Business’ podcast to discuss other limitations of Large Language Models, as well as the importance of data quality and curation in their practical application across the enterprise to overcome these shortcomings.
Throughout their conversation, the duo emphasizes approaches that can help business leaders best adjust to the short-term challenges we’re currently seeing while best preparing their organizations for the long-term changes these technologies will undoubtedly deliver across the global economy in the coming years.
The following analysis examines three critical insights from their conversation:
- Recognizing the limitations of LLMs: Acknowledging and tracking LLMs’ limitations in specific workflows, particularly numerical tasks, and recognizing that traditional AI methods may be more suitable for particular applications, like predictive modeling.
- Prioritizing data quality and curation for informed decisions: Focusing on relevance, cleanliness, and duplication in assessing the quality of data before implementing LLMs to ensure data effectiveness throughout the organization.
- Customizing LLMs for specific needs: Engaging a skilled team to leverage the adaptability of LLMs in addressing specific questions or problems unique to a company’s domain.
Guest: Anton Kornienko, Data Science Solutions Architect, NLP Logix
Expertise: Large Language Models, Machine Learning, and Python
Brief Recognition: Anton is a Data Science Solutions Architect at NLP Logix. In this role, he is involved in platform design, infrastructure setup, model deployment, and data integration. He received his Bachelor’s in Electronics and Microelectronics from Moscow Power Engineering Institute.
Guest: Ben Webster, Modeling and Analytics Team Lead, NLP Logix
Expertise: Advanced Analytics, Predictive Modeling, and Sentiment Analysis
Brief Recognition: Ben has spent the last ten years at NLP Logix, first as a data scientist from 2013 to 2021 before being promoted to his current position as Modeling and Analytics Team Lead. In 2016, he earned his master’s degree in Mathematics and Statistics from the University of North Florida.
Recognizing the Limitations of LLMs
Anton begins by discussing the specific limitations of large language models (LLMs) like GPT-3, particularly in handling numerical tasks compared to traditional AI methods. He underscores that LLMs can appear confident and proficient in many day-to-day operations where they in fact consistently fall short because of their probabilistic nature. These domains may be better suited to ‘less advanced,’ or deterministic, AI technologies.
He gives the example of predictive modeling, such as forecasting the weather, as a task where these traditional, ‘first-generation’ AI methodologies excel compared to the more probabilistic LLMs. Ben Webster echoes Anton’s point by emphasizing that, contrary to a popular belief he encounters in consultations with NLP Logix’s clients, LLMs are not meant to replace existing AI solutions. He continues, pointing out that many current AI solutions don’t rely on LLM technology, and there’s often no need for them to do so. As he reassures the executive podcast audience: “You don’t need to try and reinvent the wheel when it already exists.”
Anton then discusses the practical applications of LLMs in the enterprise more generally. He suggests that deploying LLMs to build chatbots can benefit companies trying to organize more efficiently the large amounts of data they’ve already collected. He provides an example of a scenario where his team assisted a client in creating a chatbot that navigates through their documentation and codebase.
As Anton describes it, the chatbot helps the client’s operational staff by answering questions related to their code, such as identifying issues or checking for required fields.
In addition to Anton’s answer about the misconceptions surrounding LLMs, Ben takes a step back to discuss the practical applications of these models and data utilization within companies. He feels strongly that enterprises should evaluate the quality of their data first before beginning to implement tools like LLMs to search through internal data. He also encourages business leaders to question whether all their company data is precious or if only a portion contains helpful knowledge. Even while that assessment is taking place, the rest of the data might need to be revised or better maintained.
With regard to what such an assessment process should look like, Ben notes that:
“Because historically, as companies collected things like documents where large language models tend to do the best, they would just take any document that exists and throw it into a database. And now you need to ask questions like, ‘Oh, did we have 20 copies of the same exact document with just small changes? We have some original copies of templates that are really great. And then some that people have manipulated a bunch that are trash.'”
– Ben Webster, Modeling and Analytics Team Lead at NLP Logix
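The duplication check Ben describes can be approximated with very little code. The following is a minimal sketch, assuming documents are already available as plain text. The `find_near_duplicates` helper, the 0.9 similarity threshold, and the sample documents are all hypothetical and are not drawn from the conversation. It uses Python's standard `difflib` module to flag pairs of documents that differ only by small changes.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting changes are ignored."""
    return " ".join(text.lower().split())


def find_near_duplicates(docs: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return pairs of document IDs whose similarity ratio meets the threshold.

    The 0.9 default is an illustrative assumption, not a recommended value.
    """
    cleaned = {doc_id: normalize(text) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(cleaned.items(), 2):
        ratio = SequenceMatcher(None, text_a, text_b).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, round(ratio, 3)))
    return pairs


# Hypothetical documents: two copies of a template with a small edit, plus an unrelated memo.
documents = {
    "template_v1": "This agreement is made between the client and the vendor on the date below.",
    "template_v2": "This agreement is made between the Client and the Vendor on the date shown below.",
    "memo": "Quarterly sales figures will be reviewed at the next leadership meeting.",
}

duplicates = find_near_duplicates(documents)
for id_a, id_b, ratio in duplicates:
    print(f"{id_a} and {id_b} look like near-duplicates (similarity {ratio})")
```

Pairwise comparison like this is only practical for small collections. A larger document store would likely need hashing or embedding-based techniques, which are outside the scope of this sketch.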
Prioritizing Data Quality and Curation for Informed Decisions
Ben discusses additional considerations related to data privacy and relevance when utilizing LLMs or similar tools.
- Privacy Concerns: First, he raises privacy concerns about presenting data back to end-users without proper scrubbing. He questions how much data should be scrubbed, mentioning the removal of identifiable information such as names, places, and numbers. However, he also notes that this requires a balanced approach: excessive scrubbing may diminish the value of the data returned because essential context may be lost.
- Data Relevance: Ben also mentions that, while some may find value in receiving concise answers extracted from documents, others may prefer access to the entire document for verification purposes. This raises questions about data storage and retrieval methods and about the level of detail needed for data to be relevant and valuable.
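The scrubbing trade-off Ben raises can be illustrated with a short sketch. It assumes only two kinds of identifiers, email addresses and US-style phone numbers, matched with simple regular expressions. The `PATTERNS` table, the `scrub` function, and the sample record are hypothetical. Detecting names and places, which Ben also mentions, would typically need more than regular expressions and is not attempted here.

```python
import re

# Illustrative patterns only: real PII detection needs far broader coverage
# (names, addresses, account IDs), so treat these as simplifying assumptions.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scrub(text: str, fields: tuple[str, ...] = ("EMAIL", "PHONE")) -> str:
    """Replace each match of the selected patterns with a labeled placeholder."""
    for label in fields:
        text = PATTERNS[label].sub(f"[{label}]", text)
    return text


record = "Contact Jane at jane.doe@example.com or 904-555-0132 about the renewal."

full_scrub = scrub(record)                      # removes both identifiers
light_scrub = scrub(record, fields=("EMAIL",))  # keeps the phone number as context

print(full_scrub)
print(light_scrub)
```

Making the scrubbed fields a parameter reflects the balance described above: each team can decide how much context to give up in exchange for privacy. Note that the name in the sample record passes through untouched, which shows how quickly a pattern-based approach reaches its limits.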
Responding to Ben’s framework, Anton emphasizes the strength of LLMs, particularly in summarization. He suggests that LLMs excel at condensing large amounts of unlabeled data, such as documents, into concise summaries of one or two paragraphs.
These summaries provide a high-level overview of the content of the documents, enabling companies to gain insights into the data they possess even before they fully understand its specifics:
“And the great thing is that you can summarize this into one or two paragraphs, which is going to give you a high-, high-level overview about what these documents say. Then you can start applying traditional techniques, such as classification. And you can find where those documents belong and create a valuable collection of your specific domain knowledge for your company.”
– Anton Kornienko, Data Science Solutions Architect at NLP Logix
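The two-stage workflow Anton describes, summarizing first and then applying a traditional technique such as classification, might be wired together as in the sketch below. Everything here is an assumption made for illustration: `placeholder_summarize` merely keeps the first two sentences and stands in for a real LLM call, and the keyword table stands in for a trained classifier.

```python
from typing import Callable


def placeholder_summarize(text: str, max_sentences: int = 2) -> str:
    """Stand-in for an LLM summarization call: keeps only the first sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."


# Hypothetical categories and keywords; a real project would likely train a classifier.
CATEGORY_KEYWORDS = {
    "legal": {"agreement", "liability", "contract"},
    "finance": {"invoice", "payment", "budget"},
}


def classify(summary: str) -> str:
    """Assign the category whose keywords overlap most with the summary."""
    words = {w.strip(".,").lower() for w in summary.split()}
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def organize(docs: dict[str, str], summarize: Callable[[str], str]) -> dict[str, list[str]]:
    """Summarize each document, classify the summary, and group document IDs by category."""
    collection: dict[str, list[str]] = {}
    for doc_id, text in docs.items():
        collection.setdefault(classify(summarize(text)), []).append(doc_id)
    return collection


docs = {
    "doc1": "This contract limits liability for both parties. It runs for two years. Appendix A lists fees.",
    "doc2": "The invoice covers March services. Payment is due in 30 days. Late fees may apply.",
    "doc3": "Lunch will be served at noon. Bring a friend.",
}

collection = organize(docs, placeholder_summarize)
print(collection)
```

Passing the summarizer in as a function keeps the classification step independent of whichever model produces the summaries.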
Customizing LLMs for Specific Needs
Anton also adds that LLMs should be thought of as foundational models that can address many common questions and problems. Often, when data leaders discuss models as ‘foundational’ (rather than ‘bespoke’), they’re referring to models that may not have as big an impact on direct customer-facing workflows but operate as an information superhighway within the organization. He suggests that while LLMs can serve as solid foundations to these ends, more specialized (or ‘bespoke’) models may be built on top of them to cater to more specific needs and address more directly client-facing problems.
Ben further outlines a progression of options based on increasing complexity and customization needs. First, he suggests developing custom prompts for language models, which suffice for many applications. If the outputs of a custom-prompts approach don’t meet expectations, the next step is employing an existing model for data retrieval:
“I would not say that it’s overloading if you have 20 different models. But specifically in talking about LLMs, what I think right now is that: It’s not in demand when people think about retraining and training their own specific language models because the models that are already trained are pretty good and usable. So all you need is to have a good team who can help you adjust this model for your needs, your specific questions, and your specific problems.”
– Ben Webster, Modeling and Analytics Team Lead at NLP Logix
The process Ben refers to here is better known in data science circles as fine-tuning. He is quick to add that he recommends fine-tuning to life sciences leaders only as a last resort. He goes on to describe the process as more of an intensive academic procedure than an easily deployable enterprise capability because of its complexity and potential need for continual retraining.
He highlights the potential challenges and friction associated with fine-tuning, such as constantly revisiting the model for updates or changes in the underlying technology. Before business leaders jump into the ‘deep end’ of fine-tuning, Ben advocates a ‘less is more’ approach, where they optimize proposed solutions within a simple framework to achieve the desired outcomes with minimal disruption.
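The first two steps in the progression Ben outlines, a custom prompt followed by retrieval with an existing model, can be sketched without any model training. The example below is a simplified illustration under stated assumptions. The word-count cosine similarity stands in for whatever retrieval method a team actually uses, and the prompt wording, passages, and function names are hypothetical. The resulting prompt would then be sent to an existing, already-trained LLM, a step that is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Count lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[str], k: int = 1) -> list[str]:
    """Return the k passages most similar to the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: cosine(q, tokenize(p)), reverse=True)
    return ranked[:k]


# Hypothetical custom prompt: it restricts the model to the retrieved context.
PROMPT_TEMPLATE = (
    "Answer using only the context below. If the answer is not there, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(question: str, passages: list[str], k: int = 1) -> str:
    """Combine the retrieved passages and the question into one prompt string."""
    context = "\n".join(retrieve(question, passages, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Hypothetical internal documentation snippets.
passages = [
    "The customer_id field is required on every order record.",
    "Deployment runs nightly at 2 a.m. from the main branch.",
    "Refund requests must be approved by a team lead.",
]

prompt = build_prompt("Which field is required on an order record?", passages)
print(prompt)
```

Because both steps operate on text alone, they can be revised without touching the underlying model, which is consistent with the ‘less is more’ approach described above.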
Lastly, Anton introduces the ‘human in the loop’ concept as an essential workflow component when utilizing LLMs or similar technologies, and one that is vital to fine-tuning if business leaders find that step necessary. He explains that having a human involved in the process, particularly at the end of the pipeline, is crucial for ensuring the reliability and appropriateness of the generated output.
By ‘looping in’ greater human expertise to these developing systems, organizations can ensure that LLM output is validated before it is used, improving the overall quality and reliability of the results.
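A human review step at the end of the pipeline can be as simple as a queue that holds generated answers until a person approves them. The sketch below is one hypothetical way to structure that gate. The `Draft` and `ReviewQueue` classes and the sample questions are invented for illustration and do not describe any system discussed in the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    """An LLM-generated answer awaiting human review (hypothetical structure)."""
    question: str
    answer: str
    status: str = "pending"  # one of "pending", "approved", "rejected"
    note: str = ""


@dataclass
class ReviewQueue:
    drafts: list[Draft] = field(default_factory=list)

    def submit(self, question: str, answer: str) -> Draft:
        """Add a generated answer to the queue; nothing is released yet."""
        draft = Draft(question, answer)
        self.drafts.append(draft)
        return draft

    def review(self, draft: Draft, approve: bool, note: str = "") -> None:
        """Record the human reviewer's decision and optional comment."""
        draft.status = "approved" if approve else "rejected"
        draft.note = note

    def released(self) -> list[Draft]:
        """Only approved drafts ever reach the end user."""
        return [d for d in self.drafts if d.status == "approved"]


queue = ReviewQueue()
ok = queue.submit("Is customer_id required?", "Yes, on every order record.")
bad = queue.submit("What is the refund limit?", "Refunds are capped at $500.")

queue.review(ok, approve=True)
queue.review(bad, approve=False, note="No source document states a cap.")

released = queue.released()
print([d.answer for d in released])
```

Keeping the reviewer's notes on rejected drafts also leaves a record that could inform later prompt changes or, where it proves necessary, fine-tuning.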