An Overview of AI Document Digitization in Finance

Niccolo Mejia

Niccolo is a content writer and Junior Analyst at Emerj, developing both web content and helping with quantitative research. He holds a bachelor's degree in Writing, Literature, and Publishing from Emerson College.

There are several companies claiming to offer AI document digitization solutions to banks, insurance enterprises, and other financial institutions. We found that these solutions are intended to help financial institutions with at least one of the following business problems:

  • Digitizing physical documents and forms
  • Updating scanned documents
  • Searching through digitized documents

What Business Leaders in Finance Should Know

What is AI Document Digitization?

Many enterprise banks, insurance companies, and other financial institutions likely store many of their files in physical locations—filing cabinets, drawers, and storage bins. This makes it extremely difficult for employees at these companies to search through these documents quickly, especially when they are looking for nuanced pieces of information within them.

Document digitization is an application of machine vision that could help banks and insurance enterprises upload their documents into their systems and, perhaps most importantly, allow their employees to search for information within those documents. Theoretically, the machine vision algorithm behind document digitization software not only creates a PDF scan of a document, but fills in the template of a digital form with the information on each line of that document.
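The template-filling step described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation. It assumes the machine vision step has already turned the scan into lines of text, and that each relevant line follows a "Label: value" pattern. The field names and sample lines are hypothetical.

```python
import re

# Hypothetical form template: these field names are illustrative only
# and are not taken from any vendor's product.
TEMPLATE_FIELDS = ("claim number", "claimant", "amount")


def fill_template(ocr_lines, fields=TEMPLATE_FIELDS):
    """Map 'Label: value' lines from a text-recognition pass onto form fields."""
    form = {field: None for field in fields}
    for line in ocr_lines:
        match = re.match(r"\s*([^:]+):\s*(.+?)\s*$", line)
        if not match:
            continue  # lines without a 'Label: value' shape are ignored
        label = match.group(1).strip().lower()
        if label in form:
            form[label] = match.group(2)
    return form


# Assumed output of the recognition step for one scanned claims form.
sample_lines = [
    "Claim Number: 10482",
    "Claimant: Jane Doe",
    "Amount: $1,250.00",
    "Signature on file",
]
filled = fill_template(sample_lines)
```

Once the fields sit in a structure like `filled`, they can be stored and searched like any other record. Real forms are messier than this, which is where a trained model would replace the simple pattern match.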

Again, in theory, this could allow banks and insurance companies to search the information in those uploaded documents using document search software that runs on natural language processing. We’ve written extensively on document search for the banking and insurance industries in two separate reports.

Business leaders in finance should know, however, that not every document digitization software necessarily comes with search functionality, although most digitize documents with that search function in mind. How well this software performs in practice remains to be seen.

Three of the companies mentioned in this report offer solutions for enterprise document digitization, but we also included one company that offers AI document search once those documents are digitized. Parascript, Leverton.ai, and iManage all offer document digitization, whereas AlphaSense offers document search.

Which Companies Offer AI Document Digitization and/or Document Search?

Parascript is a Russian company founded in 1996. Its website uses AI-related terms, such as machine learning, prominently throughout, which indicates to us that the company has undergone a recent “AI rebrand.”

Many older tech firms and consultancies, even the largest, have opted to position themselves as leaders when it comes to offering AI solutions. However, it’s very unlikely that a company established in 1996, well before machine learning was on the radar of most computer scientists, let alone business leaders, has been offering AI since its founding. As a result, the likelihood that a company founded earlier than the late 2000s employs artificial intelligence talent in its C-suite is low. A company’s density of AI talent, especially in its C-suite, is one of the key criteria we look for when vetting a company on the legitimacy of its claims to leveraging AI.

Parascript does employ a C-level executive who might be capable of working on machine learning solutions: its CEO. He holds a PhD in Applied Mathematics from Moscow State Technical University, a degree well suited to the kind of data work needed to build machine learning software. That said, Parascript onboarded him as its President and CTO in 2010, 14 years after the company was founded, which may lend credence to the presumption that the company rebranded to focus on AI.

In addition, Parascript doesn’t seem to list any data scientists on its LinkedIn profile. This doesn’t mean that it doesn’t employ any, but if it does, they may also serve other roles at the company and list those roles instead. That is different from employing dedicated data scientists and AI talent, which would imply a higher likelihood that the company is doing AI.

This is not to say Parascript’s document digitization software doesn’t actually work for its clients. We simply felt it was important to make business leaders aware of the possible discrepancies between the company’s claims and their staff. Again, its software may very well get the job done; it may just not use machine learning to do it.

iManage is in a somewhat similar position, having been founded in 1998, but its recent foray into AI seems to rest on the data scientists it onboarded when it acquired RAVN Systems, a firm offering AI solutions. The company’s AI talent for the most part worked at RAVN Systems prior to the acquisition, so although iManage may not previously have had the talent needed to legitimately claim to offer AI, it now seems to, at least to some degree. The company employs a handful of data scientists and machine learning engineers with Master’s degrees, one of whom holds a PhD in Electronics that he earned in 2009.

Leverton.ai seems to be the company with the most robust data science staff. Its Head of Research holds a PhD in Math that he earned in 2007. Previously, he worked at Leverton as a machine learning engineer, and prior to that, he was a data scientist working with machine learning at Zalando SE, a fashion enterprise in Europe with over 5,500 employees. In addition, the company employs a data scientist who holds a PhD in Mathematical Physics from 2014, and previously, he worked as a data scientist at HelloFresh and an analyst at Bank of America.

In addition to their other data scientists, these two employees indicate a high likelihood that Leverton is in fact offering machine learning software for document digitization.

With regard to document search, AlphaSense employs a few “AI Researchers,” including a few with Master’s degrees in Computer Science. One of these employees earned their degree at Carnegie Mellon, which is world-renowned for its machine learning program. Business leaders should be aware, however, that the company does not list data scientists on its LinkedIn profile and does not seem to employ AI talent in its C-suite.

Digitizing Physical Documents and Forms

Parascript

Parascript offers a namesake software for data extraction which it claims can help financial institutions digitize their physical documents using machine vision.

We can infer the machine learning model behind the software was trained on millions of document scans, such as insurance claims forms, loan contracts, and tax forms. These scans would have been labeled with attributes such as whether a claim was under a certain amount, whether a loan was paid back on time, or which year a tax form was from. These labeled scans would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the sequences and patterns of 1’s and 0’s that, to the human eye, form the chains of text that indicate a small claim, an unpaid loan, or a certain date, as displayed in a PDF or similar document scan.

The user could then search the software for claims under a certain amount, unpaid loans, or tax forms from before a certain year.

Below is a short 3-minute video demonstrating how a Parascript document digitization solution works for insurance claims:

Parascript claims to have helped an Inc. 5000 company continue to process documents quickly during a period of rapid growth. The documents the client company needed to process were becoming more varied and complex, and its existing process could not support every type of document. The client company integrated Parascript’s software into its database. According to the case study, the client company cut document processing labor costs by 50%.

Parascript also lists SourceHOV and Burroughs as some of their past clients.

Ilia Lossev is Chief Scientist and Vice President at Parascript. He holds a PhD in Mathematical Biophysics from the Institute of Biological Physics at the USSR Academy of Sciences.

Leverton.ai

Leverton.ai offers a namesake software which it claims can help financial institutions digitize their documents using machine vision.

We can infer the machine learning model behind the software was trained on hundreds of thousands of financial documents, such as non-performing loans, loan agreements, and mortgage documents. We can infer these scans would have been labeled with fields such as loan amount, effective date, and principal sum. These labeled scans would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the sequences and patterns of 1’s and 0’s that, to the human eye, form the chains of text that relate to the amount of a loan, its effective date, and the principal sum of a mortgage as displayed in a PDF or similar document scan.

The user could then search the software for any data relating to the scanned items, such as how many open loans are over a certain amount, how many loans were started in a given week, or how many mortgages are over a certain principal sum.
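To make those searches concrete, here is a minimal sketch of what querying extracted loan data could look like once the fields are digitized. The `LoanRecord` structure, the function names, and the sample records are all hypothetical. We have no visibility into how Leverton.ai stores or queries its data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LoanRecord:
    """Hypothetical record holding fields extracted from one loan document."""
    loan_id: str
    amount: float
    effective_date: date
    is_open: bool


def open_loans_over(records, threshold):
    """Return open loans whose amount exceeds the threshold."""
    return [r for r in records if r.is_open and r.amount > threshold]


def loans_started_in_week(records, iso_year, iso_week):
    """Return loans whose effective date falls in the given ISO year and week."""
    return [
        r for r in records
        if tuple(r.effective_date.isocalendar())[:2] == (iso_year, iso_week)
    ]


# Made-up sample data for illustration only.
records = [
    LoanRecord("L-1", 250_000.0, date(2019, 3, 5), True),
    LoanRecord("L-2", 90_000.0, date(2019, 3, 7), True),
    LoanRecord("L-3", 400_000.0, date(2019, 3, 12), False),
]

large_open = open_loans_over(records, 100_000)
week_10 = loans_started_in_week(records, 2019, 10)
```

The point of the sketch is that these questions only become simple filters after the digitization step has produced reliable structured fields. The hard part is the extraction, not the query.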

Below is a short 3-minute video demonstrating how Leverton.ai works:

Our research yielded no results when we tried to find case studies for the software. Leverton lists Apollo and Deutsche Bank as some of their past clients.

Florian Kuhlmann is Founder and CTO at Leverton.ai. He holds an MS in Business Informatics from the University of Münster. Previously, Kuhlmann served as Senior Project Manager of R&D at Neofonie GmbH.

Updating Scanned Documents

iManage

iManage offers a software called ISDA MA, CSA Robot, which it claims can help financial services companies easily digitize their documents as well as update the digital scans of those documents as needed using natural language processing and what appears to be machine vision.

We can infer the machine learning model behind the software was trained on tens of thousands of International Swaps and Derivatives Association (ISDA) master agreement (MA) documents and Credit Support Annexes (CSAs). These involve the terms companies set for partnerships or enterprise-level transactions. These scans would have been labeled with the companies involved in the agreement and what each party is receiving. The labeled scans would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the chains of text that, to the human brain, might be interpreted as an agreement between two companies as displayed in an ISDA MA or CSA.

An employee would then be able to search for an ISDA document or CSA that needs to be updated based on the business data from the client company’s database. It would appear that the software can make those updates automatically, but iManage claims it “combines AI with information processing.” As a result, we can infer that if the software can make these updates, its ability to do so may not come from machine learning or natural language processing.

We were unable to find a demonstration video showing how ISDA MA, CSA Robot works.

iManage does not make available any case studies regarding the software.

That said, iManage lists Sirius International and Blick Rothenberg as some of their past clients.

Mohit Mutreja is CTO at iManage. He holds an MS in computer science from the University of Illinois at Chicago. Previously, Mutreja served as Head of Research and Development of Autonomy at Hewlett-Packard.

Searching Through Digitized Documents

AlphaSense

AlphaSense offers a namesake software, which it claims can help banks and financial service companies easily search through large amounts of financial documents using natural language processing.

We can infer the machine learning model behind the software was trained on tens of thousands of financial documents, such as contracts or receipts. These would have been labeled as insurance claims forms or loan applications, for example. The labeled text would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the chains of text that, to the human brain, might be interpreted as an insurance claims form or a loan application as displayed in a PDF document.

The software would then be able to determine which documents are most relevant to the user’s keyword search request, and provide them in a list starting with the most relevant.
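As a rough illustration of relevance ranking, the sketch below scores documents against a keyword query using TF-IDF, a standard text-retrieval technique. This is our own stand-in and not a description of AlphaSense's method, which the company does not disclose in detail. The document IDs and texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, documents):
    """Return (doc_id, score) pairs sorted from most to least relevant.

    Scores are a simple smoothed TF-IDF sum over the query terms.
    Documents that match no query term are dropped.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up documents for illustration only.
documents = {
    "10-K": "Annual report discussing loan losses and loan loss reserves",
    "memo": "Internal memo on office relocation",
    "call": "Earnings call transcript mentioning loan growth",
}
ranked = rank_documents("loan losses", documents)
ranked_ids = [doc_id for doc_id, _ in ranked]
```

Here the annual report ranks first because it matches both query terms, the earnings call ranks second, and the memo is excluded. A production system would likely add synonym handling and learned relevance signals on top of this kind of baseline.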

We were unable to find a demonstration video showing how AlphaSense works. In addition, AlphaSense does not make available any case studies regarding the software. That said, AlphaSense lists Credit Suisse and JP Morgan as some of their past clients.

Raj Neervannan is Founder and CTO at AlphaSense. He holds an MS in Computer Science from Bowling Green State University. Previously, Neervannan served as CTO at Majesco Mastek.

 

Header Image Credit: Corrigan Record Storage
