An Analysis of AI-Powered Document Search Capabilities in Banking

Daniel Faggella

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders.

Document Search and Discovery in Banking - An Analysis of the Field

The financial services industry is buried in paperwork, and the NLP use-cases in banking and insurance grow every year.

For the last three years, we’ve closely followed the space of AI-based search applications. These applications tend to be broad: they can hypothetically handle nearly any text-related data format, and could hypothetically be used to address nearly any document or data-related workflow.

Over the last 18 months we’ve interviewed and directly analyzed 15 AI-based search vendors selling into the retail banking sector, including startups like LucidWorks and Sinequa, and incumbent giants SAS and IBM. Our goal was to gain more clarity on how these broad search applications are actually used in practice, and what business problems they actually address.

This article is broken up into five discrete sections:

  • Adoption Motives for AI-Based Search Solutions in Banking
  • Data Types Processed by Search and Discovery Applications
  • Evidence of Return on Investment
  • AI Deployment Lessons Learned for Search and Discovery Applications
  • Concluding Points for AI Product Developers

For enterprise leaders, this article should help inform your decisions about whether or not to adopt an AI-enabled search application, and potentially identify specific workflows where such applications might add value.

For AI consultants, startup leaders, and product developers, this article should help to lay out the competitive landscape of existing solutions, and showcase openings for new solutions to add enterprise value and win market share.

Adoption Motives for AI-Based Search Solutions in Banking

Compliance and Regulatory Pressure

In our primary research, we asked our interviewees about the “Problem Description” for our use-cases. Time and time again we received one of two answers:

  • Compliance was probably the strongest motive for buying, or 
  • Proving compliance benefits (that the AI initiative or product would help the company become more, not less compliant) was necessary before the bank was willing to consider the other benefits of the product

This was more than a simple “well of course everything needs to be compliant, nothing new here.” It was clear that compliance is not only the strongest present buying motive, but also one of the most common points of entry for the value proposition of these search applications. For example:

  • A general cross-platform search application (with a problem description such as “the client had its data broken up across many silos and had a hard time finding things”) is often adopted because of GDPR threats.

Problems to Solve

The problem types that showed up most frequently from our analysis of the case studies include:

  • Improve intranet search (for HR and other internal company data, not customer data). We do not believe this to be a particularly lucrative use-case, but it is rather frequent and clearly there is a market for it.
  • Classifying lending documents (mortgage, for example), and data extraction from lending documents (including OCR).
  • 360-degree customer view for individuals (GDPR) and companies (KYC), mostly for compliance reasons. This means pulling up all documents or all communications related to a specific entity in order to ensure compliance.
  • External search (of web sources, earnings reports, news sources) for wealth management, creating custom insights, reports or alerts for traders.

Data Types Processed

Document Ambiguity, Level of Customization Per Project

Almost no two applications were the same, and the vendors in this landscape made it clear that the documents, processes, and integration for each client are unique. For vendors, this is somewhat unfortunate, but it also provides a barrier to entry to the market because of how hard it is to integrate and solve these unique search challenges for individual clients.

Adherence to a Limited Number of Document Types is an Unlikely Strategy

Each bank is unique. Within lending, wealth management, and customer service alike, the IT systems, data formats, business use-cases, document formats, document storage strategies, and data used to complete those processes all differ from bank to bank.

To one client, a 360-degree view of customers may involve 40 data types; for another, it might be 400 or more. For one client, GDPR may pose issues with a certain number of data types, and the same GDPR regulation might involve different workflows, risks, and data types for another bank.

While there will be some transferability in skills and experience and data types, each setup will have many bespoke elements, and vendors should prepare to “be flexible” (i.e. willing to service unique needs in unique ways) in client projects. This is a reflection of the data environments and differences between banks.

Evidence of Return on Investment

Figure: Emerj's AI ROI Trinity Model
Source: Emerj Plus AI Best Practice Guides

Challenging to Directly Measure Financial ROI

The vast majority of use-cases we examined showed little direct ROI evidence or claim. Bankers – and even vendors – admit that such a measurable ROI is extremely unlikely, because:

  • In order to adequately measure ROI, very precise before-and-after testing would have to be done (i.e. the time it takes to complete X process, or find X kinds of documents, or handle X kind of customer support call), but doing these before-and-after measurements isn’t something banks are likely to do.
  • Many applications will cost large sums of money and time (including upgrading data infrastructure, working with vendors, polling and talking to employees about search workflows to see the features they want) — and it is near-impossible to know when break-even really occurs on such a project.
  • Very few deals are being sold with a powerful financial pitch – they are being sold with a pitch to compliance, to reducing risk.
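To make the first two points concrete, here is a minimal sketch of the arithmetic that a before-and-after measurement would feed into. It is an illustration only: the function names, the timing samples, and every cost figure are hypothetical assumptions, not data from our research, and a real project would need to account for many costs this ignores.

```python
from statistics import mean


def monthly_savings(before_minutes, after_minutes, tasks_per_month, hourly_cost):
    """Estimate monthly labor savings from faster task completion.

    before_minutes / after_minutes are samples of how long one task
    (e.g. finding a document) took before and after deployment.
    """
    minutes_saved = mean(before_minutes) - mean(after_minutes)
    return minutes_saved / 60 * tasks_per_month * hourly_cost


def break_even_months(upfront_cost, monthly_fee, savings_per_month):
    """Months until net savings cover the upfront cost.

    Returns None when the recurring fee eats all of the savings,
    i.e. the project never breaks even on labor savings alone.
    """
    net = savings_per_month - monthly_fee
    if net <= 0:
        return None
    return upfront_cost / net


# All figures below are hypothetical, chosen only to show the calculation.
before = [12.0, 15.0, 9.0, 14.0]  # minutes per search task, pre-deployment
after = [4.0, 6.0, 5.0, 5.0]      # minutes per search task, post-deployment

savings = monthly_savings(before, after, tasks_per_month=2000, hourly_cost=45.0)
months = break_even_months(upfront_cost=300_000, monthly_fee=5_000,
                           savings_per_month=savings)

print(f"Estimated monthly savings: {savings:,.2f}")
print(f"Estimated break-even: {months} months")
```

The calculation itself is trivial. The difficulty described above lies in the inputs: without disciplined timing samples from before and after deployment, none of these numbers can be filled in with any confidence.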

Risk Mitigation is the Trump Card to Handle ROI Concerns

In the exact words of one of our interviewees, a Head of Innovation at a mid-size Midwestern bank:

“Fear and uncertainty is what sells for search/discovery… giving people a known problem/risk… or an unknown risk that you can prove in a pilot. It’s not really something that can be measured… it’s just something people get scared into. We have no idea the ‘ROI.’ It’s best to appeal people on problems they KNOW they have. You can apply it to problems they don’t know they have… but you should think through the lens of ‘what is the problem they’re already struggling with and frustrated with.’”

“Known” Risks to Appeal To Directly

Based on both primary and secondary research, the known risks that most commonly resonate with buyers seem to be:

  • In the Wealth Management function: Keeping brokers, wealth managers, customer service people compliant in their communications (in their marketing and cross-selling, or in anything that could be interpreted as insider trading).
  • GDPR: Finding any and all information about a given customer (digital, physical, or scanned images of documents, CRM records, etc). This is most challenging when finding data in legacy or “home grown” IT systems. Working with these systems is very valuable (as it addresses a well-known need), but is obviously not very scalable.
  • Know Your Customer or “KYC” guidelines (set in place to prevent money laundering). Anything sold as KYC (unifying contact records, cross-referencing or verifying customer data, organizing all data about an entity or customer in one place) can be sold as a compliance benefit.
  • Making sure that contracts or agreements (for a loan, for example) don’t violate company policies or regulatory standards. Classifying compliant and potentially non-compliant contracts helps reduce this risk.

AI Deployment Lessons Learned

Scalability is Hard

Data types, processes, workflows, IT systems, and data formats differ from company to company.

As banks move slowly to the cloud and undergo digital transformation, this should become easier over time – but right now, relatively lengthy and bespoke setups will be common.

What is Near-Scalable?

While essentially no search and discovery solution is seamlessly transferable from one bank to another, there are some skills and capabilities that can be built up as a competitive advantage, namely:

  • A contextual understanding of a workflow, and what kinds of classification can help improve said workflow (this was covered in robust depth in our first 40-page analysis).
  • A core technology for natural language processing or OCR. While these systems might become reasonably adept at individual document types (say, Promissory Notes), and might be able to start from scratch at 60-70% success rates of classification or information extraction from said documents — it is still somewhat inevitable that 
  • A broad understanding of the kinds of problems that search and discovery can solve (to help explore with clients, to help flesh out the different need areas of a client to build something for them). This is part of the intended value of this report, and the Case Studies datasheet.

Money is on the Back-End

Bespoke upfront setups are the norm, and will be for years to come.

To quote one vendor leader:

“After the setup is complete and you’re just servicing and iterating on the existing model in the client’s environment, you can run it like a SaaS business, but the integration and calibration is going to be really unique and for the next few years there is no way around that.”

The recurring revenue, the “product” money (as opposed to “services” money), is hard to extract early on because of the heavy upfront services burden.

Concluding Points for AI Product Developers

Areas of Biggest Market Need

Compliance is the most pressing need in banking at present, and while it can be labelled a business function, it stretches across many business functions. Customer service, lending, legal, and wealth management all need to be up to speed on compliance.

Second to compliance is wealth management, but this mostly involves what we refer to as “external search”, which seems to be neither the strength of EXL, nor their desired area of focus. The demand and competition for these external search applications is substantial, and in many regards, these offerings are not “document intelligence” at all – as they are focused on external data (on the web, primarily).

Areas of Largest Market Size

In the long term, vendors and bankers seem to agree that wealth management is transforming radically, and that a blossoming ecosystem of tools has emerged to pull in, search, and find patterns in new sources of investment data. In terms of total market value and size, this is poised to be massive.

Customer service is the internal banking function that most of our interviewees suspect will have the largest market opportunity – at least in terms of business function.
