Information Extraction in Banking – Compliance, Contracts, and More

Dylan Azulay

Dylan is Senior Analyst of Financial Services at Emerj, conducting research on AI use-cases across banking, insurance, and wealth management.

Large banks deal with millions of documents every day across their corporate offices and numerous branches. Although one might assume that these documents are digital, in many cases, even the largest banks store old physical documents in file cabinets and boxes off the bank’s premises, and even those that are kept on-site might be relegated to storage units amongst hundreds of thousands of other documents.

The state of organization in the digital space isn’t much better. One bank’s multiple departments might all store their digital documents differently. They might each use a different system or save the documents in the same system in a totally different way. In addition, document layouts often change over time. A bank’s current employment contracts, for example, might look completely different from those it had employees sign twenty years ago, differing in both format and content.

Banks pay employees and retained legal teams to spend a majority of their time finding and reading through these documents for whatever information they need at the moment. Depending on where the documents are stored, this could take weeks or months, and they might need to repeat the process to find different information at a later date.

As a result, if a customer were to ask the bank to purge all of the information it has on that customer, the bank would have a very difficult time fulfilling this request. That puts it at risk of noncompliance with regulations such as GDPR and the California Consumer Privacy Act. In the case of the former, noncompliance could result in a fine of up to €10 million ($11.2 million) or 2% of the company’s worldwide yearly revenue, whichever is higher. Violations of individuals’ rights under GDPR, which include the right to erasure, fall under a higher tier of up to €20 million or 4%.
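The "whichever is higher" rule can be made concrete with a little arithmetic. The following is a minimal sketch, not legal guidance: the function name `max_fine_eur`, its parameters, and the two revenue figures are hypothetical, and the defaults simply mirror the €10 million and 2% figures quoted above.

```python
def max_fine_eur(annual_revenue_eur, flat_cap_eur=10_000_000, revenue_percent=2):
    """Upper bound of the fine: the flat cap or the revenue share, whichever is higher.

    The defaults mirror the figures quoted in the article; pass other values
    to model a different tier. This is an illustration, not legal advice.
    """
    revenue_based = annual_revenue_eur * revenue_percent / 100
    return max(flat_cap_eur, revenue_based)


# Hypothetical revenues for two institutions of different size.
smaller_bank = max_fine_eur(200_000_000)     # 2% is 4 million, so the flat cap applies
larger_bank = max_fine_eur(5_000_000_000)    # 2% is 100 million, so the revenue share applies

print(smaller_bank, larger_bank)
```

For a large bank, the revenue-based figure dominates quickly, which is why the percentage matters far more than the flat amount.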

In addition, banks are spending on search and discovery rather than analysis. They would rather have their employees and lawyers spend as much of their time as possible making decisions based on the information they find during their searches.

Artificial Intelligence, namely natural language processing (NLP) and machine vision, could be one way to remedy this situation.

We spoke with Anke Conzelmann, Director of Product Management at Iron Mountain, about where AI could be used in banking for document search and information extraction purposes. In this article, we discuss several use-cases for AI-based document digitization and information extraction in banking, including contracts and compliance, interspersed with quotes from Conzelmann.

For more on information extraction and data search in banking, we recommend Iron Mountain’s white paper on the topic.

We begin with what is for many banks one of the most challenging problems at their companies: what to do with their backlogs of paper documents.

Digitizing Paper Documents and Microfiches

Common in the financial sector, including investment banking, is the storage of documents on microfiche. Microfiches are small index card-size films that contain microscopic images of documents. A microfiche reader zooms in on these images, allowing bank employees to read the documents.

Microfiches often contain past account statements and customer information. Employees who need to collect this information must slide microfiches under a reader to figure out what each document on the microfiche contains. They might go through several microfiches to find the right information or to gather everything required to respond to a request. Digitizing these microfiches could save employees time, allowing them to focus on higher-value activities and enhance the customer experience.

Machine vision software, particularly optical character recognition (OCR), could help with digitizing the documents found within a microfiche. OCR is a type of artificial intelligence that transcribes printed and handwritten text into digital text. Document digitization may be a helpful AI use-case in banking in the current phase of the AI Zeitgeist (which we call “Emergence”), mainly for the reasons described at the top of this article.

Microfiches contain documents from before digital became the primary storage method in banking, but these documents are often still relevant to established banks that have been in business for decades.

Conzelmann illustrates their relevance with the example of an estate owner who asserts that the bank holds millions for a particular estate. An employee at the bank may need to verify or disprove this by producing multiple months or even years of statements on microfiche. Conzelmann details the laborious process of finding these statements on microfiche:

[The employee needs to] go find the right microfiche in the right box, put it on [their] microfiche reader, find the right square on [the] microfiche, get that digitized, and that was month one. Hopefully [the account] only ha[s] one page statements because, if [it] ha[s] two pages, [they’re] doing it twice for that particular month. [Searching across] 24 months takes a really long time. [The employee is] not adding any value through this; all [they’re] doing is responding to a request from a customer.

What the bank wants to do is be able to search for an account number across a date range and find all of the statements relevant for the particular estate, but this is challenging for large banks that store their documents across disparate sources, including microfiche.

OCR software could in theory transcribe the text in the documents that were digitized from microfiche. Employees could then search the documents and find relevant information within them faster, and no other employee would have to hunt for that same document on microfiche in the future.
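Once statements have been transcribed, the search Conzelmann describes (one account number across a date range) becomes a simple filter. The sketch below assumes the OCR step has already produced structured records; the `Statement` class, the `find_statements` function, and the sample data are all hypothetical and do not represent any vendor's product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Statement:
    account_number: str
    statement_date: date
    text: str  # OCR transcript of the scanned page


def find_statements(statements, account_number, start, end):
    """Return statements for one account dated within [start, end], oldest first."""
    matches = [
        s for s in statements
        if s.account_number == account_number and start <= s.statement_date <= end
    ]
    return sorted(matches, key=lambda s: s.statement_date)


# Made-up sample archive standing in for transcribed microfiche pages.
archive = [
    Statement("ACCT-001", date(1998, 2, 28), "Closing balance 1,250.00"),
    Statement("ACCT-002", date(1998, 1, 31), "Closing balance 90.10"),
    Statement("ACCT-001", date(1998, 1, 31), "Closing balance 1,200.00"),
    Statement("ACCT-001", date(2001, 6, 30), "Closing balance 3,400.00"),
]

results = find_statements(archive, "ACCT-001", date(1998, 1, 1), date(1999, 12, 31))
for s in results:
    print(s.statement_date.isoformat(), s.text)
```

The hard part in practice is producing reliable account numbers and dates from the scans; the query itself is trivial once that metadata exists.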

That said, OCR for document digitization seems to be a relatively nascent use-case for AI in banking compared with other machine vision capabilities such as facial recognition and image recognition. During the course of our own research, we found only five AI vendors offering machine vision software, fewer than we found for the next AI approach we discuss in this article: natural language processing.

Information Extraction for Mortgages and Contracts

There are numerous use-cases for natural language processing in banking, and those for document search and related functions are by far the most common. In fact, according to our research, roughly 23% of AI vendors selling into banking offer NLP software for information retrieval, in other words, search. Seven of the top 100 banks also claim they’re using NLP for information retrieval, including JP Morgan Chase.

Banks could digitize all of their paper documents and still have difficulty searching for relevant information within them. Digital documents are certainly better than physical ones for the purposes of standard search functions, but AI search promises to extract information from thousands of documents even when that information isn’t worded or formatted identically in every document where it appears. This kind of functionality is currently only possible with artificial intelligence.

Suppose a bank digitizes mortgage agreements from paper. This likely allows employees to read through them faster, but they may still need to read most of the contract to find the information relevant for them. An AI-based document search or information extraction application, what Iron Mountain calls “document understanding,” could allow the employee to find information such as:

  • Mortgages of a certain amount issued within a certain date range
  • Mortgages issued in a certain geolocation within a certain date range
  • Mortgage agreements containing specific clauses or iterations of those clauses

Document search and information extraction applications could present these mortgages to a bank employee even if the information within them is not in the same format or isn’t said in the same way.
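The queries listed above can be sketched as filters over fields that an extraction step has already pulled out of each agreement. Everything here is an assumption made for illustration: the `MortgageRecord` fields, the `search_mortgages` function, the clause labels, and the sample records. The labels stand in for whatever normalized form an extraction tool assigns to differently worded versions of the same clause.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MortgageRecord:
    doc_id: str
    amount: float
    issued: date
    state: str
    clause_labels: set = field(default_factory=set)  # normalized clause names


def search_mortgages(records, *, min_amount=None, issued_from=None,
                     issued_to=None, state=None, clause_label=None):
    """Return records that satisfy every criterion that was supplied."""
    results = []
    for r in records:
        if min_amount is not None and r.amount < min_amount:
            continue
        if issued_from is not None and r.issued < issued_from:
            continue
        if issued_to is not None and r.issued > issued_to:
            continue
        if state is not None and r.state != state:
            continue
        if clause_label is not None and clause_label not in r.clause_labels:
            continue
        results.append(r)
    return results


# Made-up records standing in for fields extracted from scanned agreements.
records = [
    MortgageRecord("M-1", 450_000, date(2015, 3, 1), "CA", {"prepayment_penalty"}),
    MortgageRecord("M-2", 120_000, date(2016, 7, 15), "CA"),
    MortgageRecord("M-3", 600_000, date(2009, 11, 2), "NY",
                   {"prepayment_penalty", "balloon_payment"}),
]

large_recent = search_mortgages(records, min_amount=400_000,
                                issued_from=date(2014, 1, 1), issued_to=date(2016, 12, 31))
california = search_mortgages(records, state="CA",
                              issued_from=date(2014, 1, 1), issued_to=date(2016, 12, 31))
with_clause = search_mortgages(records, clause_label="prepayment_penalty")

print([r.doc_id for r in large_recent], [r.doc_id for r in california],
      [r.doc_id for r in with_clause])
```

The value of the AI component lies upstream of this code: mapping inconsistent layouts and wording onto the same amount, date, location, and clause fields.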

This applies to the Mortgage QA process as well: a bank must ensure that a customer’s loan file is complete (all the forms are present) and that every form is filled out completely. Information extraction software may be able to pull out the customer’s name, social security number, the APR, and other pertinent information as it appears in various places across all of the customer’s documents, even if that information is written differently in different parts of the documents for different loans.

A customer named Robert might sign their name “Bob,” for example, or they might have missed a digit in their social security number. In theory, the NLP software would still extract this information as the customer’s name and social security number, but would flag the inconsistency as an exception. This would allow a bank employee to verify and correct the information or ask the customer to update their information as needed.
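The exception-flagging step described above can be illustrated with a few simple rules. This is a rough sketch under stated assumptions, not how any particular NLP product works: the `NICKNAMES` table, the function names, and the sample loan file are hypothetical, and a real system would likely rely on learned models rather than a lookup table.

```python
import re

# Hypothetical nickname table; a real system would need far broader coverage.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}


def canonical_name(name):
    """Lowercase the name and map a known nickname to its formal first name."""
    parts = name.lower().split()
    if not parts:
        return ""
    parts[0] = NICKNAMES.get(parts[0], parts[0])
    return " ".join(parts)


def find_exceptions(occurrences):
    """Compare name and SSN values extracted from several forms in one loan file."""
    exceptions = []
    raw_names = {o["name"].strip() for o in occurrences}
    if len(raw_names) > 1:
        canonical = {canonical_name(n) for n in raw_names}
        kind = "name_variant" if len(canonical) == 1 else "name_mismatch"
        exceptions.append((kind, ", ".join(sorted(raw_names))))
    valid_ssns = set()
    for o in occurrences:
        digits = re.sub(r"\D", "", o["ssn"])  # keep digits only
        if len(digits) != 9:
            exceptions.append(("ssn_wrong_length", o["source"]))
        else:
            valid_ssns.add(digits)
    if len(valid_ssns) > 1:
        exceptions.append(("ssn_mismatch", ", ".join(sorted(valid_ssns))))
    return exceptions


# Made-up loan file: the second form uses a nickname and drops an SSN digit.
loan_file = [
    {"source": "application", "name": "Robert Smith", "ssn": "123-45-6789"},
    {"source": "disclosure", "name": "Bob Smith", "ssn": "123-45-678"},
]

flags = find_exceptions(loan_file)
for kind, detail in flags:
    print(kind, "->", detail)
```

Both values are still extracted; the flags simply route the file to an employee who can confirm the name and request the missing digit.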

Information Extraction for Human Resources and Compliance

In areas where data privacy laws such as GDPR are in effect or will be soon, banks will likely need to figure out how to find all of the information they have on a customer or employee, be able to produce this information if requested, and be able to prove that they’ve purged it if the customer or employee ever asks them to do so. This can be a challenge in and of itself but gets even more difficult for customers and employees whose information is stored in part within physical documents. Conzelmann explains with a personal anecdote:

I’ve been at Iron Mountain a long time. When I first came, there were physical pieces of paper that were filled out as part of my employee file. Those are still sitting somewhere. But there was also digital information that was collected when I had my review last month. So how does [a bank] go across those different repositories of information…and be able to answer questions like…’Give me all of this employee’s personal information across all of these channels.’

Finding where that information is located is paramount for banks that want to remain in compliance, and although the integration time will differ depending on how the bank already organizes its digital documents, a bank might benefit from implementing an AI-based document search system for compliance purposes.

Under these data privacy laws, banks need to be able to present a customer or employee’s personal information to them when they ask for it. A search application that allows a bank’s HR or customer service department to quickly find all of an employee’s or customer’s information may be necessary in the future as data privacy laws become more ubiquitous in many parts of the world.
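The three obligations discussed in this section (find the information, produce it, and prove it was purged) can be sketched as operations over several repositories. The repository names, record layout, and function names below are hypothetical, and the sketch assumes every record already carries a reliable `subject_id`, which is precisely what an extraction system would have to supply for scanned paper files.

```python
from datetime import datetime, timezone


def find_subject_records(repositories, subject_id):
    """List (repository, record_id) pairs holding data about one person."""
    hits = []
    for repo_name, records in repositories.items():
        for record in records:
            if record.get("subject_id") == subject_id:
                hits.append((repo_name, record["record_id"]))
    return hits


def purge_subject(repositories, subject_id):
    """Remove a person's records and return an audit log of what was removed."""
    log = []
    for repo_name, records in list(repositories.items()):
        removed = [r for r in records if r.get("subject_id") == subject_id]
        repositories[repo_name] = [r for r in records if r.get("subject_id") != subject_id]
        for r in removed:
            log.append({
                "repository": repo_name,
                "record_id": r["record_id"],
                "purged_at": datetime.now(timezone.utc).isoformat(),
            })
    return log


# Made-up repositories: one digitized from paper, one born digital.
repositories = {
    "scanned_hr_files": [
        {"record_id": "P-17", "subject_id": "E-42", "kind": "onboarding form"},
        {"record_id": "P-18", "subject_id": "E-77", "kind": "onboarding form"},
    ],
    "review_system": [
        {"record_id": "R-903", "subject_id": "E-42", "kind": "annual review"},
    ],
}

found = find_subject_records(repositories, "E-42")
audit_log = purge_subject(repositories, "E-42")
remaining = find_subject_records(repositories, "E-42")
print(found, len(audit_log), remaining)
```

The audit log is the piece that lets a bank demonstrate a purge after the fact, rather than merely assert it.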

Our research corroborates that there’s a need for AI-based solutions for compliance. We found that 12 AI vendors offer compliance solutions to banks, about 15% of the AI vendors selling into banking. We also found that, on average, AI vendors offering compliance solutions were relatively credible, scoring 3.1 out of 4.0 on our Expertise and Funding score, which rates a vendor on the AI experience of its team and how much funding it has raised.

This indicates that banks looking to adopt AI for compliance are likely to work with an AI vendor that has the technical staff to back up the claims it makes about its software.

In other words, compliance vendors are likely to know what they’re talking about when it comes to artificial intelligence and machine learning. Many AI vendors actually don’t, as we outline in one of our most popular executive guides: 7 Ways to Tell if an AI Company is Lying About Using AI.

The Bottom Line – What Banks Need to Know

Banks have options when it comes to natural language processing solutions for extracting information from digital documents. They have far fewer options for digitizing their paper documents, although we suspect that the pool of solutions for this use-case will widen over time.

The bottom line is that banks, perhaps more than any other financial institutions, are dealing with an inordinate number of documents in a variety of formats, both physical and digital, and they struggle to search through these documents to generate customer analytics, resolve customer support inquiries, and, perhaps most importantly, remain compliant with local and regional laws.

This will likely become more difficult with the continued introduction of data privacy laws, and banks could lose hundreds of millions of dollars in fines if they aren’t efficient and organized enough to provide customers with their personal information and purge it on request.

Banks that have the resources to commit to building an AI-based information extraction product in-house or working with a credible AI vendor may come out ahead of even the largest banks that struggle to digitize their millions of legacy documents, let alone implement an AI search function for them.

Conzelmann puts the potential value of AI-based search applications succinctly:

The power of machine learning and AI is that you can do this at scale across millions of documents even when you’re dealing with disparate, different-looking content.

Banks could save millions on time-consuming processes that involve manually searching through paper documents, microfiches, PDF scans, and digital forms in a variety of file types. Although absolutely no bank should implement AI without a thorough understanding of the data, talent, time, and resources it requires, banks that are truly ready for AI might want to consider a document digitization or search application. We suspect that applications like these are likely to become universal in the coming decade, especially in response to GDPR and similar regulations.


This article was sponsored by Iron Mountain, and was written, edited and published in alignment with our transparent Emerj sponsored content guidelines. Learn more about reaching our AI-focused executive audience on our Emerj advertising page.

Header Image Credit: Aspartame
