Simple Applications of NLP in Banking Compliance – with Jan Veldsink of Rabobank

October 17, 2022

An operational compliance framework is critical for banks and other financial institutions (FIs). However, it is a tremendous challenge to constantly and uniformly comply with regulations, guidelines, and laws, given the complexity and immensity of legislation.

Consider the 2010 Dodd-Frank Act, a comprehensive banking reform measure that created the Consumer Financial Protection Bureau, among other entities and frameworks. This law alone contains more than 1,500 provisions and 400 rule mandates that banks and FIs must consider.

Typical compliance efforts used to – and to some extent still do – involve costly, labor-intensive processes. However, NLP technology is helping automate some of the more tedious and repetitive work. It is primarily used to expedite routine workflows such as information gathering, analysis, and reporting.

To get an insider view of NLP in banking compliance, we connected with Jan Veldsink, Head of AI and machine learning at Rabobank.

Rabobank is a multinational banking and financial services company based in the Netherlands. The bank has a presence in 38 countries and holds over EUR 600 billion in assets. The bank reported a net profit of EUR 3.7 billion in 2021. Rabobank reports 4,900 employees.

In a wide-ranging 15-minute conversation between Jan and Emerj CEO Daniel Faggella, the two AI veterans pull apart two distinct topics:

Simple, near-term use cases of NLP in compliance
Where NLP can add more value to banking in the future

Listen to the full episode below, skim our interview takeaways, or read the full transcript below.

Guest: Jan Veldsink, Head of AI for Compliance, Rabobank

Expertise: Applied AI, machine learning R&D, AI compliance, cybersecurity

Brief Recognition: Jan spent the last two-plus decades in multiple high-level AI roles, including the past twelve years in his present role as Head of AI at Rabobank. Jan also teaches data security and cybercrime at Nyenrode Business University’s EMBA program, where he has taught business classes for the last 22 years.

Key Insights

Start with the documents: When starting with NLP, look at your document archive. NLP technology can find much valuable information in these documents, including potentially actionable customer data.
Use simple NLP compliance tasks to add value: More straightforward (and readily outsourced) NLP techniques such as text mining and topic extraction can help discover potentially harmful trends, such as new phishing methods and other harmful email content.
NLP can help find both questions and answers: When starting out, you often will not know what initial question(s) you want to ask of that data to solve a particular problem. NLP can help you extract data and reframe the question in a way recognizable by the language contained within a set of documents.

Interview Transcript

Dan Faggella: So Jan, we’ll start by asking just where you see NLP having promise, having value in banking today? Obviously, it’s just one branch of AI, but where do you see NLP resonating with business problems?

Jan Veldsink: That’s a good question, Dan. It’s a broad sense. Of course, we start it all off with data. What we have come to conclude is that we have a huge [opportunity with documents]. For example, at Rabobank, we have about 500 million digitally-scanned documents lying around. In that vast amount of documents lies what I think is gold for the bank and compliance.

Dan Faggella: Yeah.

Jan Veldsink: If you look at the 500 million documents, you can let a Google search algorithm or box run through it and find your terms – but I’m not interested in terms as a compliance man. We’re interested in “Okay, what do those documents tell me? What are they about?”

Yeah, of course, we put them into a document management system, and somebody metadata-ed those documents, okay, wrong.

Somebody did that, but it didn’t know my question. I think the image, or the projection in the sense of knowing your customer due diligence, is all hidden in those texts. Beforehand, we don’t know what question we want to ask. So the data needs to tell me how to interpret my question and how to find things in that vast amount of documents that can help me get a clearer picture of my customer, customer groups, or whatever.

So I think, in that sense, that’s a very agnostic way of looking at language, because we are all used to sentiment analysis and the “bag of words” approaches. They are limited because they only give me what I put in a bag of words or my sentiments, but I want to take it a step further and say, “Okay, these are the documents. These are my customers. Do they behave in the way we expect them to? And how do the NLP and the raw texts contribute to the image of the customer?”

Dan Faggella: Yeah, that’s, I mean, clearly, as you had said, that’s going to add more depth, more value than saying, “Oh, are these emails angry? Or are they friendly?” You know, there’s more than this, the singular sorts of things of that kind. Clearly, it poses more challenges to get that kind of value out of documents.

Like you’ve said, it would seem as though we really need to structure the questions we’re asking. We really need to know what we’re looking for. What does it look like to get that level of insight that’s so much deeper than the surface layer NLP approaches?

Jan Veldsink: Yeah, that’s also a good question because we do not have that now, and so that’s something we are headed towards. We have some students from the University working for us on this question.

And that’s not an easy cookie, and it’s all about context. Because, as you know, AI and machine learning are now very bad at changing context. We know that if we put an AI or machine learning system in a context, it will perform happily in that context, and we can do that. The same goes for recommendation engine stuff, like the old robots.

They all function within a certain context, and they perform well. And for text it is the same: can I give such an algorithm a context? And that’s from within this context, and that could be a regulation or something like that. Something happens in the outside world, and we think, “Okay, how many customers have information in their documents that can relate to this topic?”

So there’s a new money laundering scheme – Russian laundering scheme, Panama papers, or whatever. Okay, give my engine, my text engine, these contexts. And can it find, based upon that context, related items and related issues in the texts we have on customers?

Dan Faggella: So clearly, again, part of this is where you want to be heading. If you could summarize or nutshell [where] today – and maybe a lot of it is the light-level AI stuff, NLP stuff – there is still some value to be gleaned.

You could say today, maybe just in your domain, the compliance domain, where NLP is being used, like a couple of little snippet examples. It could be your bank, or it could be a general insight about where it’s being used. What would you say, in kind of layman’s terms?

Jan Veldsink: Okay, well, what we do now with text analysis, we do, let’s say, for emails, we get a bunch of phishing emails from customers. Is this a phishing email versus Rabobank mail? No, no, it wasn’t. And we do topic extraction.

Okay, what are the general topics of these emails? And that’s again agnostic because we do not use a bag of words as your set. “Okay, tell me the general topic of these emails.” That helps us in finding out: is there a new trend in phishing going on?

Dan Faggella: That’s interesting. Okay. So, more emails might say, “Reset your password,” or phishing like, “There’s been this big legislative thing,” and you need to pick up on that.

Jan Veldsink: Yes, yes. Our assumption is that certain groups do phishing emails. And we now can see, okay, we had just launched a new product. That’s one. Of course, we have legislation, [so] we need to screen all transactions that go abroad, towards terrorism, financing, U.S. sanctions, etcetera.

So we screen all the transactions and names of customers on those lists. That’s also a form of natural language processing because we see, okay, in this text is the name of a person that is on this list. That’s also something we do that’s very compliance oriented. And that goes for descriptions or remarks in payments, as well as names, places, harbors, vessels, or whatever.
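
The list screening described above can be illustrated with a minimal Python sketch. It is not Rabobank's implementation: the watchlist entries, the `normalize` and `screen` helpers, and the exact-phrase matching rule are all assumptions made for illustration. The sketch normalizes a payment remark and reports which list entries appear in it as a contiguous run of words.

```python
import re
import unicodedata

# Hypothetical watchlist entries, for illustration only.
WATCHLIST = ["Ivan Petrov", "Northern Star", "Port of Example"]


def normalize(text: str) -> list[str]:
    """Lowercase, strip accents and punctuation, and split into tokens."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.findall(r"[a-z0-9]+", ascii_only.lower())


def screen(remark: str, watchlist: list[str] = WATCHLIST) -> list[str]:
    """Return watchlist entries whose tokens appear contiguously in the remark."""
    tokens = normalize(remark)
    hits = []
    for entry in watchlist:
        needle = normalize(entry)
        n = len(needle)
        # Slide a window of n tokens over the remark and compare.
        if any(tokens[i:i + n] == needle for i in range(len(tokens) - n + 1)):
            hits.append(entry)
    return hits


print(screen("Payment to IVÁN PETROV, invoice 12"))
print(screen("Rent for March"))
```

Exact matching after normalization handles casing, accents, and punctuation, but it misses misspelled or reordered names, which is the gap the conversation turns to next.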

Dan Faggella: And that’s just kind of entity-oriented work there.

Jan Veldsink: Yes, entity extraction. A matching of names. It looks simple. Okay, as a human, I can see my name. If it’s spelled wrong, you can see how it might be wrong, but the machine just says, “I don’t know.” It’s spelled wrong.

Dan Faggella: It needs to be able to match entities. You know, sometimes it’s listed as IBM, sometimes it’s IBM Corp. Sometimes they might spell out the whole old name, International Business Machines or something. We would need a system that maybe can take into account, you know, different types of spellings.

Jan Veldsink: Yeah, that’s the current state of affairs. That’s what we do. I wrote an algorithm that just agnostically looks at two texts and says, okay, they might be related. And the words in those texts might be in different orders, and there might be spelling errors in there, but still, I think those texts are related. So that’s the thing we do now.
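
The interview does not say how that algorithm works, so the following Python sketch is only one plausible way to get the same behavior: a score that ignores word order and tolerates small spelling errors. The function names and the 0.8 threshold are assumptions. Each word in one text is paired with its closest word in the other using `difflib.SequenceMatcher` from the standard library, and the best-match scores are averaged in both directions.

```python
import re
from difflib import SequenceMatcher


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relatedness(a: str, b: str) -> float:
    """Score two texts in [0, 1], ignoring word order and small typos."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0

    def best(word: str, others: list[str]) -> float:
        # Similarity of `word` to its closest counterpart in the other text.
        return max(SequenceMatcher(None, word, o).ratio() for o in others)

    # Average in both directions so the score is symmetric.
    forward = sum(best(w, tb) for w in ta) / len(ta)
    backward = sum(best(w, ta) for w in tb) / len(tb)
    return (forward + backward) / 2


def might_be_related(a: str, b: str, threshold: float = 0.8) -> bool:
    """Flag a pair for review when the score reaches the (assumed) threshold."""
    return relatedness(a, b) >= threshold


print(might_be_related("Jan Veldsink", "Veldsink, Jan"))      # reordered
print(might_be_related("Jan Veldsink", "Jan Veldsnik"))       # misspelled
print(might_be_related("Jan Veldsink", "Port of Rotterdam"))  # unrelated
```

Comparing every word with every other word is quadratic in text length, so a sketch like this suits short strings such as names and payment remarks rather than whole documents.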

Dan Faggella: Got it. So just to dive into both of those quick little examples and we’ll talk a bit about the future and what you’re excited to have NLP be able to do moving forward, maybe a couple of like, you know, visions of where you’d be excited to see things go.

But just to touch on both of these [use cases] on the entity side, even if it’s very straight and narrow NLP, but there’s still business value to it. Can we scan, whether it’s our documents, the news, some kind of Reuters ticker, or whatever the case may be, for known entities?

Jan Veldsink: Yeah, we do that.

Dan Faggella: But bankers might ask themselves, “Is there value we would gain by scanning those documents to be able to find information about entities that maybe we do business with, or people that maybe we do business with?”

Can we have some listen-to notifier of that so we can skim through those and look for compliance issues or things we should know about those customers? So there’s NLP low-hanging fruit there.

Jan Veldsink: Yes.

Dan Faggella: Certainly. The other [use case] is that we cluster the kinds of terms and topics and summarize those from individual messages. We know this could be done at a document level, but you were talking about it being done at a phishing email level, where there’s some way of sorting through those.

Now, do those have to be labeled in some way? In other words, does a human being have to look at those phishing emails – let’s say 10, 20, or 200 – emails so that the machine knows to put it in that folder moving forward?

Jan Veldsink: No, no. We just made [the algorithm] agnostic. The topic extraction algorithm is just agnostic. Again, an agnostic, unsupervised learning method that just said, “Okay, give me statistically what’s in those emails and tell me the most important topics.”

I researched Google YouTube videos and transcripts of those in relation to an article written for De Correspondent, an online magazine in the Netherlands. We just did topic extraction on four gigs of transcribed YouTube videos to find some topics.

We didn’t give them the topics we wanted or what we were looking for, but [the model] gave [feedback on] certain regions of topics, so we did some clustering on those topics.
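
As a rough illustration of grouping messages without labels, the Python sketch below clusters emails by word overlap and names each cluster by its most frequent words. It is a crude stand-in for the topic extraction described here, whose actual algorithm is not given in the interview; the stopword list, the 0.2 overlap threshold, and the sample emails are all assumptions.

```python
import re
from collections import Counter

# Small hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "be", "because", "for", "in", "is", "must",
             "now", "of", "on", "the", "to", "you", "your"}


def tokenize(text):
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cluster(docs, threshold=0.2):
    """Greedy one-pass clustering on Jaccard overlap of word sets."""
    clusters = []  # each cluster: {"terms": Counter, "members": [doc indices]}
    for i, doc in enumerate(docs):
        words = set(tokenize(doc))
        best, best_sim = None, threshold
        for c in clusters:
            vocab = set(c["terms"])
            union = words | vocab
            sim = len(words & vocab) / len(union) if union else 0.0
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:  # nothing similar enough: start a new cluster
            best = {"terms": Counter(), "members": []}
            clusters.append(best)
        best["terms"].update(words)
        best["members"].append(i)
    return clusters


def topics(docs, k=3):
    """Label each cluster with its k most frequent words (ties broken A-Z)."""
    return [
        [t for t, _ in sorted(c["terms"].items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c in cluster(docs)
    ]


emails = [
    "Reset your password now to keep your account",
    "Your account password must be reset today",
    "New tax legislation requires you to confirm your details",
    "Confirm your details because of the new tax legislation",
]
print(topics(emails))
```

No email is labeled in advance; the groups and their names come only from word statistics. A production system would more likely use a proper topic model, but the unsupervised principle is the same.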

Dan Faggella: I was going to say because you obviously wouldn’t want a human to have to look at endless blocks of different topics, they would, you could distill the topics, but then there would have to be a bucketing, and you’re saying both of those processes are NLP-able. Maybe a little bit of guidance along the way. But that NLP can sort of do both of those.

Jan Veldsink: Yep.

Dan Faggella: Cool. So maybe we can wrap up with a little quick idea. Maybe there’s a way to nutshell what you see as far as where NLP could go, the aspects about NLP that you’d be excited to see moving forward, or that you think maybe we’ll get to.

Jan Veldsink: Okay, I think of conversational agents. I always refer to what I do as a human. I read books, I read newspapers, I read online magazines, etcetera. And it all comes back to my mind somewhere.

And I envision that there will be conversational agents that take in all that information and create, in the background, such a context for communication with customers, with regulators, or with employees in a bank that is richer than just the “yes, no” question or the simple chatbot functionality we have now.

You have to give it the context and let the documents in your organization create the context for the communication and the interaction you have with employees and customers.

Dan Faggella: So would you suspect that, in the coming two to three years, we’ll see a notable uptick in that? Or do you think it might be even longer until we can see the needle move, so to speak? Because I know there are so many challenges to make that happen.

Jan Veldsink: Yeah, there’s a lot of judgment with how this will go forward. And it all starts with having the idea that this could be useful for your business. And then I think there is a bunch of technologies already available that could assist and help you in this. But we’re not there yet.

Dan Faggella: Yeah, fingers crossed. A lot of bankers are excited to get there eventually. Cool. Well, that’s topic two, Jan. I appreciate you being able to be with us here for a second interview on AI and banking. So thank you so much for being with us.

Jan Veldsink: Thank you, Dan.

“Our Impact in 2021.” Rabobank, Rabobank Communications & Corporate Affairs, https://media.rabobank.com/m/43dd3d744e53220d/original/ESG-Facts-Figures-2021-EN.pdf.
