Episode Summary: Close to a year ago, we had an interview here on the AI in Industry podcast with Jeremy Barnes of Element AI. We visited their headquarters in Montreal, and we’d interviewed Yoshua Bengio a couple years before that.
Jeremy made one point in that interview that I really like and that carries over into this conversation: businesses should think not just about using artificial intelligence to become more efficient, but about where it can make a real difference to the company’s bottom line beyond shaving off some costs.
In this week’s episode, we focus on compliance and analyzing contracts. At first, one might think about such an application in terms of cost savings. We speak with Shiv Vaithyanathan, an IBM fellow and Chief Architect of Watson Compare & Comply, about the following:
- What’s possible with AI when it comes to analyzing contracts, and, most importantly
- Where the business upside for AI lies in contract analysis. How can we analyze contracts not just in a way that saves money, but in a way that lets us optimize our deals for revenue and for the likelihood that they’ll go through? What’s the farther-out vision?
Subscribe to our AI in Industry Podcast with your favorite podcast service:
Expertise: Contract management and compliance across industries, Hadoop, machine learning
Brief Recognition: Vaithyanathan holds a PhD in Operations Research. He has worked at IBM for 19 years, in roles including Chief Scientist of text analytics and Senior Manager of large-scale analytics.
(03:00) What can we do with artificial intelligence that’s useful for contracts?
Shiv Vaithyanathan: So there are broadly two areas where AI technology is applicable to contracts. The first is the actual reading of the contract itself: classifying individual clauses and identifying the obligations, exclusions, and disclaimers for each party in the contract.
Today, a significant portion of this work is done by humans keying in the individual metadata from a contract: Who are the parties involved? What is the effective date of the contract? When is the end date of the contract? All of these individual pieces of information keep getting keyed in manually.
Over and above that, there is every individual sentence, which in legal terms is, for the most part, referred to as an element. Each element needs to be identified as an obligation, a disclaimer, a definition, and so on, and we have to identify which party it is relevant to.
I’ll give you an approximate estimate. We sat down with a corporate lawyer who has been doing this for the last 15 years. For a contract of about 14 to 15 pages, it took him more than an hour to go through every sentence; with AI software, he can now do it in minutes.
So, in my opinion, these applications are immediate. This is not something that’s further out. The second area is that the original contract often comes to us as a scanned PDF document.
With scanned PDF documents, you cannot even open them in your browser and cut and paste the text if you want to put it into IEP software or some other software for digital processing.
So, the second area is being able to take these images, whether scanned pages, .tif images, or other pages, OCR them to pull out the digital text, and put that text into a digital format on which we can then do the processing we need. Both of those things can be done today, and actually pretty effectively.
“Element” is the term used for each of these pieces; think of it as a legal informatics term. The elements we identify then get classified into several categories, and those categories (the obligation, the disclaimer, et cetera) are referred to as natures. For each element, we have to identify what its nature is and which party it applies to.
These are relatively complex sentences because of the way legal language itself gets written, and understanding those linguistic structures well enough to identify these things is one of the leaps we have made in natural language processing as a whole.
If I step back for a second: NLP is the broader area within AI, and legal language processing is a slightly smaller subset of natural language processing. Legal language is somewhat more structured, and in my opinion, processing it can be accomplished with very high accuracy today.
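To make the idea of “element, nature, and party” concrete, here is a deliberately naive sketch. It is not how Watson Compare & Comply works and it ignores the linguistic complexity described above; it simply assumes a hypothetical list of cue phrases (`NATURE_CUES`) and a hypothetical rule that the first party named in an element is the party it applies to. The names `Element` and `classify_element` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical cue phrases, checked in priority order so that "shall mean"
# is read as a definition before "shall" is read as an obligation.
NATURE_CUES = [
    ("definition", (r"\bshall mean\b", r"\bmeans\b")),
    ("disclaimer", (r"\bdisclaims?\b", r"\bas is\b", r"\bno warrant(?:y|ies)\b")),
    ("obligation", (r"\bshall\b", r"\bmust\b", r"\bagrees to\b")),
]


@dataclass
class Element:
    text: str
    nature: str           # "obligation", "disclaimer", "definition", or "other"
    party: Optional[str]  # party the element applies to, if one is named


def classify_element(text: str, parties: List[str]) -> Element:
    lowered = text.lower()
    nature = "other"
    for label, patterns in NATURE_CUES:
        if any(re.search(p, lowered) for p in patterns):
            nature = label
            break
    # Hypothetical heuristic: the first party named in the element is the one it applies to.
    mentions = [(lowered.find(p.lower()), p) for p in parties if p.lower() in lowered]
    party = min(mentions)[1] if mentions else None
    return Element(text, nature, party)
```

For example, `classify_element("The Supplier shall deliver the Software to the Customer within 30 days.", ["Customer", "Supplier"])` yields an obligation for the Supplier. A production system would learn these decisions from annotated contracts rather than rely on fixed keywords.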
(09:30) How do those have to be broken down? Is it extremely granular? Is it kind of high-level bucketable? Or does it vary drastically from one client to another? How do these elements function?
SV: If you take a contract, its high-level structural elements briefly fall into the following: there are sections, sub-sections, sub-sub-sections, and so on. That we understand. Within these there are paragraphs, and within paragraphs there are sentences. Then there are two more structural elements in a contract that matter a lot.
One is bulleted lists. Typically, a bulleted list starts with a leading sentence, and each individual bullet after that continues from that sentence. So the full English sentence is formed by putting the leading sentence together with each bullet. Such bulleted lists are very important structural constructs.
Finally, tables. Tables contain the real meat of a contract. For example, if we talk about software entitlements, when Company “A” buys software from Company “B,” it is entitled to certain things. What is the entitlement: which software, which version, and for how long is the entitlement valid? All of those things go into tables, and you need to extract them all from the tables.
So, these are largely the structural constructs. Within them, there can be variance depending on exactly what type of contract it is. Some elements are at the level of individual sentences, which is what I referred to earlier as elements; in some cases, an element may be two or three sentences together, perhaps forming a portion of a paragraph or one sub-bullet. So, there is some level of variance, but it is not crazy variance.
The real leap we have made, by going beyond NLP and bringing together other technologies from AI, is being able to break those things down somewhat accurately from one contract to another.
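The bulleted-list construct described here can be sketched as a small text transformation: the lead-in sentence is joined with each bullet to recover the full sentence that a downstream classifier would see. This is an illustrative assumption of how such a step might look, not IBM’s implementation; the function name `expand_bulleted_list` and the marker pattern are hypothetical.

```python
import re
from typing import List

# Matches list markers such as "(a)", "(iv)", "(1)", "-", "*", or a bullet character.
BULLET_MARKER = re.compile(r"^\s*(?:\([a-z0-9]+\)|[-*\u2022])\s*", re.IGNORECASE)


def expand_bulleted_list(lead_in: str, bullets: List[str]) -> List[str]:
    """Join a lead-in sentence with each bullet to form standalone sentences."""
    stem = lead_in.strip().rstrip(":").rstrip()
    sentences = []
    for bullet in bullets:
        item = BULLET_MARKER.sub("", bullet).strip().rstrip(";.").strip()
        # Drop a trailing "and"/"or" that links this bullet to the next one.
        for conj in (" and", " or"):
            if item.endswith(conj):
                item = item[: -len(conj)].rstrip(";,").rstrip()
        sentences.append(f"{stem} {item}.")
    return sentences
```

With the lead-in “The Supplier shall:” and the bullets “(a) provide support;” and “(b) maintain the Software; and”, this produces “The Supplier shall provide support.” and “The Supplier shall maintain the Software.” Real contracts also nest bullets and split sentences mid-clause, which this sketch does not handle.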
(13:00) When you look at where legal is going to be in the enterprise five years ahead, what AI capabilities do you think will be unlocked for someone searching or exploring documents?
SV: The next step, which is pretty close on the horizon, is the notion of comparison: being able to take a template contract and an actual instance of a contract and identify semantically, “Hey, where are the pieces of these two contracts different? Where are the pieces that are identical? And where is the language deviant from what I would normally expect it to be?”
This kind of understanding, being able to recognize what is different across documents, starts to have a significant impact on everything from contract negotiation, where two parties go back and forth and often take a very long time before they’re able to converge, all the way up to minimizing a company’s exposure.
Minimizing exposure is not just about compliance with regulators; it also means minimizing exposure on massive contracts such as SLAs, or minimizing fines related to revenue recognition. Just about every area within the larger enterprise relies on contracts, and all of them will absolutely be affected.
As of today, almost all of them rely on humans doing this work. When humans do it, there is fatigue, there is going to be inconsistency, and there is going to be a lack of correctness. For a myriad of reasons, once we start to understand contracts at this level and can recognize where things are changing, the benefit is not only in fact-finding but in minimizing risk. That’s the place where the real lava starts to flow.
So, if I had all the control, this is the sequence in which I would see things play out. Today, I expect corporate lawyers to start using this software immediately.
Going further, as larger operations happen in the company, whether mergers and acquisitions or other deals with another party, there are gazillions of contracts that have to be gone through and looked at.
I expect the software to evolve to a point where a group of corporate lawyers will just hand over a significant number of documents or contracts to this piece of software. It will come back with an analysis containing everything from “here are this many obligations” to “here are all the deviations that we see, and here are the kinds of things we need to look at before we take the next step forward.”
I actually expect it to go a little bit further. The information that comes back from this larger batch of contracts can be fed into a model, and these models start to give you some level of quantification of the risk.
So, I see this sequence playing out to the point where such a simulation gets done even before contracts get written. It starts to give companies, and particularly the lawyers involved in mergers and acquisitions and other major decisions at a company, a feel for the risk they are getting into in such a deal.
Clearly, humans are going to be involved in doing due diligence on whatever output comes back, no question at all. But right now, a lot of it is based on a group of people sitting down and going through a few documents. Imagine increasing that by two orders of magnitude, knowing that your level of correctness is going to be up by another 50 to 80%, and feeding all of that into a model because you have all the data available. That can make a pretty significant impact.
Large companies, IBM being one of them, put together big services contracts. With these services contracts, they cannot be comfortable that payment will come in every quarter, because some clauses get violated and the customer comes back and says, “Sorry, I will not pay for these clauses because these things did not get completed.”
Today, that kind of labeled data is available within these large enterprises. Imagine if that information could be extracted and used to train a model that actually tells you, “Okay, you know what? The contract you’re writing is close to a contract you’ve written before that did not bring in the money it was supposed to. You may want to re-word things.”
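The template-versus-instance comparison can be approximated, very roughly, with Python’s standard `difflib`. The sketch below is a lexical stand-in for the semantic comparison described, not the method used by any IBM product: it aligns clause lists, marks exact matches as identical, treats a changed clause as deviant when its character-level similarity to a nearby template clause meets a hypothetical `threshold`, and otherwise marks it new. The function name and labels are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def compare_to_template(template: List[str], instance: List[str],
                        threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Label clauses as identical, deviant, new, or missing relative to a template."""
    report = []
    # Align the two clause lists; clauses must match exactly to count as "equal".
    matcher = SequenceMatcher(a=template, b=instance, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            report += [("identical", c) for c in instance[j1:j2]]
        elif tag == "delete":
            report += [("missing", c) for c in template[i1:i2]]
        else:  # "replace" or "insert"
            for clause in instance[j1:j2]:
                # Character-level similarity to the closest template clause in the changed span.
                best = max((SequenceMatcher(a=t, b=clause).ratio() for t in template[i1:i2]),
                           default=0.0)
                report.append(("deviant" if best >= threshold else "new", clause))
    return report
```

A termination clause that changes “90 days notice” to “10 days notice” would be flagged as deviant, while an added indemnification clause would be flagged as new. Character similarity cannot tell that a one-digit change may be legally significant or that two differently worded clauses mean the same thing, which is exactly the gap semantic comparison aims to close.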
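One simple way to picture the “this is closer to a contract that did not bring in the money” model is a nearest-neighbor lookup over past contracts labeled by whether they collected their full revenue. The sketch below uses bag-of-words cosine similarity in plain Python; the labels, the `threshold`, and the function names are all hypothetical, and a trained model of the kind described would use far richer features than word counts.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def flag_draft(draft: str, history: List[Tuple[str, bool]],
               threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Find the most similar past contract and suggest rewording if it underpaid.

    history holds (contract_text, collected_full_revenue) pairs; these labels are hypothetical.
    """
    d = bag_of_words(draft)
    score, text, paid = max((cosine(d, bag_of_words(t)), t, ok) for t, ok in history)
    return text, score, score >= threshold and not paid
```

The third returned value is the suggestion to reword: it is true only when the draft closely resembles a past contract whose revenue fell short. Nearest-neighbor retrieval is chosen here because it also returns the past contract itself, which gives a lawyer something concrete to review.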
(19:30) We’re painting the future for the audience, but where do you see the opportunities where this is not just a time-saver, but could transform into something that actually drives value?
SV: So, broadly, there are two areas that will be affected. One is productivity. The second is minimizing exposure, and that part is across the board. One of the healthcare companies I recently spoke with said they have more than 20 years of business associate agreements. Those agreements tell them what data they can use, how they can use the data with covered entities, and what they can and cannot do with that data.
Unfortunately, there’s a lot hidden in there. They don’t touch any of it, mostly because they don’t know what they’re even allowed to do, and they just don’t have the people to go through 20 years of agreements. So, here is actual lost opportunity, because nobody has taken the trouble to understand those 20 years of business associate agreements.
Similarly, on the services side, contracts often get put together by simply copying a previous template. Such outsourcing contracts can be worth billions of dollars, and those previous templates have created problems that resulted in losses of potentially hundreds of millions of dollars, yet those losses have not been taken into consideration when the new legal [contract] gets written. I’ll give you an approximate number. In most large outsourcing services contracts, about 1/3 of the reason revenue is not accrued is that the services contracts were not written appropriately. Approximately 1/3. That’s a pretty significant number.
Header Image Credit: Inverse-Square