AI for KYC Compliance – Three Use Cases

January 9, 2023

‘Know Your Customer’ (KYC) compliance refers to a legally mandated, globally developed set of guidelines that ensure banks, financial institutions (FIs), and other related enterprises perform due diligence on potential customers. The purpose is to verify customer identity as well as the legitimacy and risk involved in developing and maintaining a business relationship.

By verifying customer identities according to the strict rules-based system of the Bank Secrecy Act of 1970, KYC compliance rules reduce instances of fraud and money laundering. Businesses comply with KYC to meet local business standards, mitigate the risk of fines and punishments, and avoid reputation damage.

Failing to comply with KYC and anti-money laundering (AML) rules carries stiff penalties. Lengthy audits, multi-million-dollar fines, and bans on conducting business in specific countries or regions are all possible consequences.

To help business leaders navigate the hazard-filled, turbulent waters of KYC compliance, this article examines the compliance difficulties commonly faced by banks, FIs, and other enterprises. To provide some insight into potential solutions, we present three use cases, covering the business problem, the AI used, the workflow changes, and the business outcomes of each.

These use cases include:

Automated cross-referencing and application processing: Reducing processing time and costs by expediting KYC customer data verification.
10-K data extraction: Accelerating labeling workflows to shorten customer onboarding and reduce labor costs and time spent.
Identifying customer relationships: Analyzing transactions between banking customers to decrease the time and cost to perform a KYC review.

The Business Costs of Know Your Customer Compliance

As the true business costs of KYC are diverse and extensive, it is important for enterprise leaders to understand both the monetary and opportunity costs of traditional, mostly manual, KYC compliance processes. The business costs of KYC compliance can be classified into direct and indirect costs; the latter being the result of inefficient processes due to antiquated technologies.

Per a report on IT operations from financial services research and advisory firm Celent, FIs spent approximately $37.1 billion on AML-KYC compliance functions. Elsewhere in Celent research, KYC compliance is described as among the riskiest and most inefficient of all banking operations due to a lack of quality data and automation.

The primary challenge of KYC compliance for most banks usually consists of one of the following dilemmas:

Combining outdated technology with disparate internal and external data sources.
The absence or severe lack of high-quality data and automation.

The results are similar: data that are difficult, if not impossible, to locate, integrate, analyze, or use. The pain is especially acute during a client’s “KYC check.”

A typical corporate KYC check process involves gathering, identifying, and validating various company or individual data. These data include client ID, wealth/net worth, funding sources, corporate subsidiaries, and more. Verifiers must cross-reference these data across sources to ensure customer truthfulness and information accuracy.

The absence of high-quality data and automation means:

(a) Increased security vulnerability
(b) Cumbersome, time-consuming assessments of corporate subsidiaries, shareholders, and structures.

The byproduct of (a) is that skilled fraudsters can take advantage of vulnerabilities by constructing networks of front firms and corporate structures. The result is increasing complexity, with the fraudster often ultimately avoiding detection.

The byproduct of (b) shows up on the balance sheet. The volume of work that KYC/AML compliance mandates often translates to high expenses. The additional labor costs and technology spend are particularly significant.

Besides the significant expenses, opportunity costs for banks and FIs exist. These include:

Lost customers
Reduced productivity
Low-value-adding work
Stunted business growth.

The last is of particular concern. Enterprises operating in competitive environments may lose patience and take their business elsewhere. As such, vendors that offer KYC automation solutions often list “customer satisfaction” as a key benefit to their respective solutions.

AI and machine learning can augment or automate KYC compliance processes, possibly reducing some of the aforementioned direct and indirect costs.

We begin by discussing how one fintech uses a combination of machine learning components to integrate data silos, extract form data, and cross-verify data across various internal and external data stores.

Use Case #1: Automated Cross-Referencing and Application Processing

Datametica is a software company that offers automation, cloud, machine learning, and data warehouse migration solutions.

According to the case study, the client is a fintech firm that issues KYC acknowledgment letters. The firm’s clients transfer KYC-required customer information and assorted documentation, which the firm then cross-references against its in-house data. The client eagerly awaits the acknowledgment letter, as the business transaction cannot proceed until it is issued. There is a lot at stake for both the receiving and the issuing firms.

Before implementing the solution, the client’s workflow involved several manual processes. These included the manual acquisition of the application as well as the identity and proof of address documentation.

In a Datametica webinar, presenters explain the company’s views on the essentials of a KYC compliance solution based on available technological capabilities. The relevant section begins at the 34:31 mark and lasts approximately one minute.

The manual process carried over to identifying, matching, and verifying application details across internal data stores and sources before issuing the acknowledgment letter. The case study report describes the process as time-consuming and costly.

Complicating the problem was the inability to scale and meet customer requirements. Datametica claims that, due to an influx of KYC requests and the difficulty of scaling manual processes, the fintech firm could not satisfy the terms and conditions of its service-level agreements (SLAs).

On the extraction and processing side, the case study also claims that the fintech’s leaders were concerned about the potential for human error in several manual functions, including:

Processing backend data
Data extraction
Correlating customer data across sources and documents (e.g., application data versus some KYC document or database).

The case study further indicates the presence of a data bottleneck, making it difficult for the fintech firm to accommodate different application types from a single distribution point. A key influencing variable in this bottleneck was the 150+ data providers, each with their own KYC applications, documents, and supporting formats.

To overcome these challenges, Datametica states that they automated the end-to-end reception and verification of KYC applications and associated data via a machine learning model equipped with deep learning capabilities. The company claims that the solution can extract data from any assortment of digital KYC applications and forms using a single CVL client.

The case study reports integrating the following solutions, inputs, and outputs:

An OCR deep learning image processing model using a custom computer vision and OCR codebase: to extract applicant information from printed forms and KYC documents
An integrated data pipeline: a central data repository for easier cross-referencing of application information against KYC documents and databases
An image processing pipeline: to retrain tagged supporting documents
Validation and classification model: to identify new data points in KYC forms and verify against the client’s metadata.
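
To make the cross-referencing step more concrete, the following is a minimal Python sketch of how extracted application fields might be checked against an in-house KYC record. It is not Datametica’s implementation, which the case study does not describe in detail. The function names (normalize, cross_verify), the field names, the sample values, and the 0.9 similarity threshold are all assumptions chosen for illustration, and the sketch uses fuzzy string matching from Python’s standard library rather than a trained model.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def cross_verify(extracted: dict, reference: dict, threshold: float = 0.9) -> dict:
    """Compare each reference field with the value extracted from the application.

    Returns a per-field status: "match", "mismatch", or "missing".
    The threshold is an illustrative assumption, not a recommended setting.
    """
    report = {}
    for field, ref_value in reference.items():
        value = extracted.get(field)
        if not value:
            report[field] = "missing"
            continue
        score = SequenceMatcher(None, normalize(value), normalize(ref_value)).ratio()
        report[field] = "match" if score >= threshold else "mismatch"
    return report


# Hypothetical OCR output and in-house record (illustrative values only).
ocr_fields = {"name": "Jane A. Doe", "address": "12 High St., Springfield"}
kyc_record = {
    "name": "jane a doe",
    "address": "12 High St Springfield",
    "date_of_birth": "1980-01-01",
}
report = cross_verify(ocr_fields, kyc_record)
print(report)
```

In a sketch like this, fields flagged as "mismatch" or "missing" would be routed to a human verifier, while matching fields pass through automatically.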

The case study does not reveal the specifics of before and after workflow changes. However, we may reasonably infer the following workflow modifications (assuming the information and reported results within the case study are accurate):

Potentially significantly faster and easier access to data sources, data extraction, and cross-verification
Potentially significantly less manual tagging and labeling of data
Potentially a significantly higher automation-to-human throughput ratio

Datametica reports the fintech client was able to achieve the following results using their solution:

75% reduction in operational costs from reduced manual processes, model implementation, and automated file classification
66% faster KYC application processing
85% accuracy in the automated verification process
Easier scalability with less effort

Use Case #2: 10-K Data Extraction for Customer Verification

Snorkel AI is a software company that produces solutions focused on accelerating AI applications for its clients via a patented automated data labeling method. The company has coined the term “programmatic labeling” to describe this method.

Snorkel’s client was reputedly a top-3 US bank, though no more details are given.

Before implementing Snorkel’s solution, the bank manually extracted data from 10-K forms. The length of 10-K reports — up to 300 pages — made the manual mining of these data time-consuming and onerous. The bank reported that this method lengthened the onboarding process, costing time and money.

Manual extraction and labeling of training data is often a slow process that requires a large team of data scientists and domain experts. Labor costs and time consumption are two of the more common complaints of business leaders here.

Snorkel AI offers a platform called Snorkel Flow, which the company claims can help businesses accelerate labeling using machine learning.

The platform uses what Snorkel dubs “programmatic labeling,” defined as “noisy, programmatic rules and heuristics that assign labels to unlabeled training data.” These attributes describe weak supervision machine learning, which appears to be at the center of the Snorkel Flow value proposition.

The Snorkel Flow value proposition. (Source: Snorkel AI)

To understand some of the content within this case study, it is necessary to quickly define supervised machine learning and why it is sometimes not an ideal solution from a business perspective.

In short, supervised machine learning learns a mapping from input data to outputs, which requires manually labeled examples. For the enterprise, this labeling is as expensive as it is slow, requiring a team of data scientists and, often, domain experts.

The banking client reported the following quantitative operational problems with its KYC functions:

Labor costs: 300-500 KYC analysts necessary to manually extract data
Time spend vs. volume: 30-90 min spent manually reviewing a single 10-K report, with 10,000+ reports analyzed every year

An automated extraction solution centered around Snorkel Flow was constructed. To meet client requirements, Snorkel worked with the bank to custom-build its solution.

A key reason why Snorkel recommended the Flow solution was the programmatic labeling ability of the software. Snorkel states that programmatic labeling improves on traditional methods by using labeling functions to label data at scale instead of tagging items one by one, expediting the process.

A screenshot of Snorkel AI’s programmatic labeling user interface. (Source: Snorkel AI)

The end-user workflow appears to be as follows:

Data integration: The client integrates the platform with its data stores using APIs.
Writing labeling functions: Users create labeling functions in this phase to represent different weak supervision sources, such as patterns, heuristics, outside knowledge bases, and other organizational resources.
Modeling relationships: User-provided labeling functions are combined with new weights to develop a generative model that estimates certain accuracies and correlations.
Model training: The model is trained using a set of probabilistic labels generated by the software.
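
As a rough illustration of the labeling-function idea described above, here is a minimal sketch in plain Python. It does not use Snorkel Flow or any Snorkel API. The label names, the three heuristic functions, and the sample sentences are hypothetical. The sketch also combines votes by simple majority, which is a deliberately simpler stand-in for the generative model that, per the workflow above, estimates the accuracies and correlations of the labeling functions.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when its heuristic does not apply


def lf_mentions_total_assets(sentence: str):
    """Heuristic: sentences naming 'total assets' likely state that attribute."""
    return "TOTAL_ASSETS" if "total assets" in sentence.lower() else ABSTAIN


def lf_mentions_officer_title(sentence: str):
    """Heuristic: executive titles suggest a senior-manager mention."""
    titles = ("chief executive officer", "chief financial officer", "president")
    return "SENIOR_MANAGER" if any(t in sentence.lower() for t in titles) else ABSTAIN


def lf_dollar_amount_with_assets(sentence: str):
    """Heuristic: a dollar sign alongside the word 'assets' hints at an asset figure."""
    return "TOTAL_ASSETS" if "$" in sentence and "assets" in sentence.lower() else ABSTAIN


LABELING_FUNCTIONS = [
    lf_mentions_total_assets,
    lf_mentions_officer_title,
    lf_dollar_amount_with_assets,
]


def weak_label(sentence: str):
    """Combine labeling-function votes by majority; abstain if no function fires."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    # On a tie, Counter keeps the label that was voted first.
    return Counter(votes).most_common(1)[0][0]


# Hypothetical sentences of the kind found in a 10-K report.
samples = [
    "Total assets were $4.2 billion at year end.",
    "Jane Doe serves as Chief Financial Officer.",
    "The company sells widgets.",
]
labels = [weak_label(s) for s in samples]
print(labels)
```

The point of the sketch is that each rule is written once and then applied to every sentence in the corpus, which is what allows labeling at scale. The resulting labels are noisy, which is why a real system would weight the functions rather than count votes equally.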

The case study does not provide specifics on which method was used to train the model. However, we can make some safe assumptions given the bank’s size.

The model was likely custom-trained (not using one of the five model frameworks or AutoML), given its status as a “top-3” bank. We may also deduce that its asset holdings, intellectual property, and data science resources are significant enough to demand a more resource-intensive, technically-rigorous solution.

Concerning the input and output data, we know from a Snorkel-sponsored webinar that the input data consisted of a dataset of unstructured, multi-format 10-K reports. The software extracts this unstructured data using programmatic labeling. The output is a database comprising the key attributes of the customer. The above-cited webinar reports the following output data:

Company name
Nature of business
Key senior managers
Total assets
Other attributes (15-20)

The business outcomes as reported by Snorkel:

89+% model accuracy
10,000 labor hours saved per year, equivalent to $500,000
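
For readers who want to sanity-check these figures, the short calculation below works through the arithmetic using only the numbers reported above. The variable names are ours, and the derived values (annual review hours and the implied cost per labor hour) are our own back-of-the-envelope results, not figures published by Snorkel or the bank.

```python
# Figures reported in the case study.
reports_per_year = 10_000          # "10,000+ reports", so this is a lower bound
minutes_low, minutes_high = 30, 90  # manual review time per 10-K report
hours_saved = 10_000               # labor hours saved per year
dollars_saved = 500_000            # stated dollar equivalent

# Derived: annual manual review effort implied by the per-report range.
review_hours_low = reports_per_year * minutes_low / 60
review_hours_high = reports_per_year * minutes_high / 60

# Derived: cost per labor hour implied by the reported savings.
implied_hourly_cost = dollars_saved / hours_saved

print(review_hours_low, review_hours_high, implied_hourly_cost)
```

On these inputs, manual review would take at least 5,000 to 15,000 hours per year, so the reported 10,000 hours saved sits within that range, and the savings imply a labor cost of $50 per hour.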

Use Case #3: Identification and Tracking of Beneficial Owners

Quantexa is a London-based software company that produces decision intelligence software for banks and other enterprises.

The company produces a solution called Contextual Decision Intelligence (CDI) that it claims enables businesses to improve decision-making by mapping and displaying contextual relationships between data using machine learning.

ABN AMRO is a Dutch multinational bank with a presence in 15 countries. The company reports a 2021 net profit of EUR 1.2 billion on revenues of EUR 8.47 billion.

Paul Westrate, a Product Owner at ABN AMRO, discussed the use case in a video call with an analyst from Celent.

In the call, Westrate discusses the business reasons behind the partnership with Quantexa. He lists the time-consuming nature of financial crime investigations, high operational costs, and evolving compliance requirements as the three main drivers behind the partnership.

More specifically, ABN AMRO leaders sought an automated solution that could:

Reduce the labor required in manual data gathering and analysis
Reduce time spent on discerning legitimate and non-legitimate suspicious activities through automation
Combine internal and external data sources, group companies into hierarchies, and gain insight into their relationships.

Quantexa lists several components of the platform, including the core platform, underlying platform capabilities, and the underlying technology (see below). All of these may provide insight into what the workflow may look like for the bank’s end-user.

Visualization of the components and capabilities of Quantexa’s Contextual Decision Intelligence Platform (Source: Celent)

The bank’s end-user first connects their machine to the platform via server or cloud and selects their internal and external data via an API. Among the data points within internal sources are:

Customer/Company data
Account information
Transaction details
Alerts and cases

Among the data points within external data are:

Company structures
Ultimate Beneficial Owner (UBO) data
Enrichment
Watchlists

Following data integration, the entity resolution engine creates a single view of the integrated data. An existing data schema is then used to infer, configure, parse, and standardize potential linking attributes.

Network generation then links entities (i.e. customers/companies) into networks that may demonstrate some connection.
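
The two steps above, entity resolution followed by network generation, can be sketched in a few lines of Python. This is an illustrative toy, not Quantexa’s engine, whose internals are not described in the sources. The record ids, company names, suffix list, and function names are hypothetical, and the sketch resolves entities by exact match on a crudely standardized name, which is far simpler than production entity resolution.

```python
from collections import defaultdict

LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc"}  # illustrative, not exhaustive


def normalize_name(name: str) -> str:
    """Crude standardization: lowercase, strip punctuation and legal suffixes."""
    tokens = "".join(ch if ch.isalnum() else " " for ch in name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def resolve_entities(records):
    """Group record ids that share a normalized name into a single entity."""
    entities = defaultdict(list)
    for record_id, name in records:
        entities[normalize_name(name)].append(record_id)
    return dict(entities)


def build_networks(records, transactions):
    """Link entities that transact with each other into connected networks."""
    entity_of = {rid: normalize_name(name) for rid, name in records}
    adjacency = {entity: set() for entity in entity_of.values()}
    for sender, receiver in transactions:
        a, b = entity_of[sender], entity_of[receiver]
        if a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    networks, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from this entity.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        networks.append(component)
    return networks


# Hypothetical internal records and transactions between them.
records = [
    ("r1", "Acme Holdings Ltd."),
    ("r2", "ACME Holdings Limited"),
    ("r3", "Borealis Trading Inc"),
    ("r4", "Cedar Partners LLC"),
]
transactions = [("r1", "r3"), ("r2", "r3")]
entities = resolve_entities(records)
networks = build_networks(records, transactions)
print(entities)
print(networks)
```

Here the two Acme records collapse into one entity, which is then linked to Borealis Trading through the transactions, while Cedar Partners remains an isolated network of one.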

The output is a GUI of identified networks and highlighted risk areas for investigators. The display includes the most relevant connections, entities, and data links between ABN AMRO customers. These data may include party and counterparty names, relationships, and transactions.

From this output, the end-user can then prepare analytic models and perform data exploration and visualization. This output reportedly helped the bank to understand, recognize, and counteract risks and threats and potentially enable more informed, accurate, and consistent investigations and decision-making.

Unfortunately, there are no publicly reported quantitative benefits realized by ABN AMRO. However, the following qualitative results were reported by a case study and the above-cited webinar, respectively.

According to a case study published by the financial research and consulting firm Celent, Quantexa’s CDI platform enhances KYC/CDD practices by:

Pinpointing and tracking disclosed and undisclosed beneficial owners and their associations
Promoting effective customer risk evaluations
Streamlining customer due diligence processes

Mr. Westrate also listed a couple of other benefits realized from the solution in the above-cited webinar:

Reduction in time spent gathering and understanding data and information
Improvement in the overall client experience

“Acknowledgement Letter Definition.” Law Insider, https://www.lawinsider.com/dictionary/acknowledgement-letter.
“Celent Case Study: Automating KYC Investigations with ABN AMRO.” Quantexa, Quantexa, 15 Mar. 2022, https://www.quantexa.com/resources/celent-kyc-investigations/.
Datametica. “How a Finance Company Saved 75% Cost by Automating KYC Process Using Machine Learning Model: Datametica Case Study.” Datametica, 25 Apr. 2022, https://www.datametica.com/how-a-finance-company-saved-75-cost-by-automating-kyc-process-through-machine-learning-model/.
“Programmatic Labeling.” Snorkel AI, Snorkel AI, 13 Sept. 2022, https://snorkel.ai/programmatic-labeling/.
Ray, Arin. “ABN AMRO: KYC Investigations.” Celent, Celent, 15 Mar. 2022, https://www.celent.com/insights/805245301.
“Snorkel Flow AI Application Development Platform.” Snorkel AI, 9 Dec. 2022, https://snorkel.ai/snorkel-flow-platform/.
“Understanding the Steps of a ‘Know Your Customer’ Process.” Dow Jones Professional, 23 Sept. 2022, https://www.dowjones.com/professional/risk/glossary/know-your-customer/.
