We interviewed Jay Budzik, CTO at ZestFinance, about the business value of machine learning for auto lending. We speak with Budzik about how underwriting, lending, and credit scoring are evolving as a result of advances in machine learning, both in terms of new data sources and more advanced algorithms.
In addition, we talk about how companies might solve the “black box” problem of machine learning in finance, and how transparency and interpretability can be developed in machine learning models.
Subscribe to our AI in Industry Podcast with your favorite podcast service:
Expertise: AI in loans and lending
Brief Recognition: Budzik earned his PhD in Computer Science from Northwestern University in 2003. He is also a Trustee at the Illinois Mathematics and Science Academy.
(03:30) What are the benefits of AI in auto lending?
Jay Budzik: That’s a great place to start. I think the auto lending space is really interesting because when folks go to get a car—this is going to be an important thing for them to be able to go to work, to be able to take care of their families. There’s a variety of ways they can get a loan to finance the purchase of that car.
One is from the manufacturer of the vehicle. Especially if the car is new, often vehicle manufacturers will have some programs that are available for folks. Even when the car is new, those programs might not be open to everybody. So if you maybe had an issue with your credit in the past, or don’t have a spotless record, you can get denied.
There are other lenders, banks and then non-bank lenders, that also provide financing for a car. And what’s interesting about this space, in general, is that there’s so much data available to use in the process of deciding whether or not a consumer should get that loan and then how to price the interest rate on the loan so that the lender is able to make a profit.
The data sources we had to look at are things like the make and model of the car, the mileage of the car, and then attributes of the consumer. Have they had bankruptcies in the past and how many? Were they recent?
So by making use of all of that data, both about the car itself and the loan that the customer is applying for, and their credit history, we can more accurately assess whether they’re going to be able to pay back. By doing so, it gives our lender customers a really incredible competitive advantage because they can approve folks who might be overlooked by others.
You can identify the riskier borrowers more easily as well. So you can avoid the riskiest folks, but still approve the ones that are going to pay you back. As it turns out, the folks that have less credit history or that might have a blemish on their credit record, those are the folks for which having access to a car is really the most meaningful.
So they’re the ones who really need the financing to be able to get the car so that they can go to work and provide for their family. So we see this as actually providing some pretty significant access to the folks out there who might have been denied a means of transportation in the past.
(07:30) Is the transformation mostly because of new data sources, better algorithms, or a relatively equal contribution of both?
JB: It’s really a bit of both. So we work with auto lenders of all sizes, and in some of our customer engagements, we’ve actually been able to take a pretty close look at the models they have in place before they adopt machine learning. In many cases, we see 10 to 15 variables used in the model to make a decision about whether or not to lend to a person.
They’ll look at things like the percentage down on the car, the loan amount, the customer’s credit score, and then make their decisions based on rules that are built around those.
So what machine learning allows these lenders to do is to consider many, many more variables. So instead of this 10 or 15 variable model, we can help them consider hundreds of variables, even thousands of variables. By doing so, they get a much more accurate picture of that consumer.
But in the process, in order to be able to consider so many variables, they need new algorithms that are able to handle them. And machine learning offers a way through that problem to be able to consider all those variables but not make mistakes because you get tripped up by things like correlations or limitations of the math.
(10:00) How are algorithms being adjusted to be more granular?
JB: …Two attributes that are actually pretty important to be careful with. One is age, and the other is geography. So there are laws in the United States that require that you not discriminate in lending based on age, gender, or ethnicity. The location of the applicant, where they live, can be a pretty good predictor of their race and ethnicity.
So you have to be pretty careful when you’re considering attributes like the location of someone. In some of the models that we built, even at a state level, when you associate the state in which a person lives with other attributes like the mileage on the car, you can end up with pretty perfect predictors of their ethnic background and their race.
So it’s important to be able to do a full inspection on the model to make sure that it’s not doing the wrong thing from a discrimination perspective, that it isn’t making decisions that are biased and unfair. But that being said, you’re absolutely right. The more specific you can be about the application, the business, the type of car, the mileage on the car, the trim, all those things add a little bit of extra predictive power when you’re trying to consider whether someone’s going to default. You just have to be very careful with which attributes you ultimately choose.
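One way to picture the proxy problem Budzik describes is to check how well a combination of seemingly neutral attributes predicts a protected attribute. The sketch below is only an illustration under stated assumptions: the records, the field names (`state`, `mileage_band`, `group`), and the helper functions are hypothetical, and this is not ZestFinance's inspection method. It measures in-sample accuracy on toy data, which overstates what a real audit on held-out data would find.

```python
from collections import Counter, defaultdict


def baseline_accuracy(rows, protected_key):
    """Accuracy of always guessing the most common protected value."""
    counts = Counter(row[protected_key] for row in rows)
    return counts.most_common(1)[0][1] / len(rows)


def proxy_accuracy(rows, feature_keys, protected_key):
    """Accuracy of guessing the protected value from the majority
    value inside each combination of the given features."""
    groups = defaultdict(Counter)
    for row in rows:
        combo = tuple(row[key] for key in feature_keys)
        groups[combo][row[protected_key]] += 1
    correct = sum(c.most_common(1)[0][1] for c in groups.values())
    return correct / len(rows)


# Hypothetical toy records, invented purely for illustration.
rows = [
    {"state": "X", "mileage_band": "high", "group": "A"},
    {"state": "X", "mileage_band": "high", "group": "A"},
    {"state": "X", "mileage_band": "low", "group": "B"},
    {"state": "X", "mileage_band": "low", "group": "B"},
    {"state": "Y", "mileage_band": "high", "group": "B"},
    {"state": "Y", "mileage_band": "high", "group": "B"},
    {"state": "Y", "mileage_band": "low", "group": "A"},
    {"state": "Y", "mileage_band": "low", "group": "A"},
]

base = baseline_accuracy(rows, "group")
state_only = proxy_accuracy(rows, ["state"], "group")
combined = proxy_accuracy(rows, ["state", "mileage_band"], "group")
print(base, state_only, combined)
```

In this toy data neither attribute says anything about the group on its own, but the two together identify it exactly, which is the kind of interaction that only shows up when the model is inspected as a whole.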
(13:00) How do you like to frame what remolding credit models looks like? What’s really happening under the hood there?
JB: Credit scores really transformed the industry. It used to be that to get a loan you had to talk to a banker behind the desk, and they made a judgment on you based on the application data that you provided, and their sense of whether or not you were going to be a good borrower. That judgmental process actually has been demonstrated to result in the kind of bias that we were just talking about. The bankers used to approve people that looked and acted like they did, and that created discrimination in the system.
So when we moved to having a credit score—that was a more objective combination of various numerical factors—we saw a pretty big decrease in that type of race-based discrimination and gender-based discrimination.
So that was a huge step forward. And then as automation came into play, more and more of those lending decisions could be made automatically. So if you had a high enough credit score, your loan would get approved automatically by a machine. So that automation has been making its way through the industry for the past 20, 30 years, these instant approvals and the like.
With AI and machine learning, what you’re now able to do is have the same kind of objective, automated decisioning process. But instead of just considering a handful of variables, you can consider hundreds and thousands of variables and make a much, much more accurate decision.
As a result, instead of approving people who are just going to default, and creating a mess by offering credit to folks who aren’t going to be able to pay, lenders can avoid that and prevent it from happening to consumers.
And they can approve the folks who are really more deserving that might not have been approved in the past because they don’t have all of the traditional data points that one might think of in a credit report. This is particularly true for young people these days who have taken out a debit card, don’t have credit cards, do most of their transacting using their checking account online, and have a very limited credit history. But it’s also the case for underrepresented segments of the population that might not have the means, or the set up, to be able to create that history for themselves.
So we see this as sort of the next big revolution in credit. If you have, on the one hand, the invention of an objective credit score, and then the ability to automate credit decisioning based on that score and rules, if those are the first two revolutions in credit, then the third is certainly AI and machine learning.
(16:30) What’s involved in a credit score versus what’s possible with AI today?
JB: Typically, if you think about the first generation scores, those are based on a dozen or two factors. Things like: Length of time working for your current employer, the number of bankruptcies, the number of open lending products that you have, your income. Those typical economic indicators were combined using a method called logistic regression, into an equation that produces this credit score. And that was quite a revolution. What’s possible today is to use techniques like neural networks, and a thing called a gradient boosted tree, and other advanced modeling methods to create a much more accurate and nuanced picture out of much, much more data.
So the models that we put into production for our customers tend to have hundreds or thousands of variables in them. We have one that has 2,200 variables that is running an auto lending business. That customer has enjoyed a doubling of their business as a result of using this model. And the way that we get folks comfortable using that much more data to make these decisions is by explaining to them exactly how the model uses that data, so they can be sure that it’s doing the right thing in all circumstances.
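To make the first-generation approach concrete, here is a minimal sketch of how logistic regression combines a handful of factors into a single probability of default. The factor names echo the ones Budzik lists, but the intercept and weights are invented for illustration and do not come from any real scorecard or from ZestFinance.

```python
import math

# Hypothetical intercept and weights, chosen only for illustration.
# Positive weights raise the estimated default risk; negative ones lower it.
INTERCEPT = -1.0
WEIGHTS = {
    "years_at_employer": -0.15,
    "bankruptcies": 0.8,
    "open_products": 0.05,
    "income_thousands": -0.01,
}


def default_probability(applicant):
    """Logistic regression: a weighted sum passed through the sigmoid."""
    z = INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical applicants who differ only in bankruptcy count.
clean = {"years_at_employer": 5, "bankruptcies": 0,
         "open_products": 3, "income_thousands": 60}
blemished = dict(clean, bankruptcies=2)

print(round(default_probability(clean), 3))
print(round(default_probability(blemished), 3))
```

Each factor contributes independently and linearly to the weighted sum, which is what makes this kind of model easy to read. The neural networks and gradient boosted trees mentioned above can capture interactions between factors that this simple equation cannot.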
(19:00) What categories of data are new?
JB: These are things like public records about the number of court cases that you have pending against you, or whether you have been in jail. Not what you are saying on social media or what websites you are visiting on your computer.
It’s the ability to consider much more finance and credit-related data points that you wouldn’t be able to use without the advanced math of machine learning. These aren’t signals that you would think are creepy or unwarranted.
It’s very interesting actually, in China we hear of folks using things like their social media interactions all the time, but here it’s much more boring.
(22:00) Is there a way that you like to explain the middle ground between hard rules and ML?
JB: ML is really just a more advanced version of what people have been doing for years. Statistics invented this notion of predicting an outcome from data and machine learning has sort of taken that to the next level by producing even more accurate methods of predicting the future.
The way that we’ve done our work is to approach it from a perspective that we need to require ourselves to deliver the same level of transparency that’s been available, and that enabled the adoption of statistical modeling methods in the industry.
So we hold ourselves to a very high bar. We want to enable our customers to understand how each and every model-based decision is made, so that they can explain to consumers when they get denied or approved, why they’re getting denied. As is required by law, I might add. Also so that they can make sure that their models are doing the right thing.
If you’re going to adopt some fancy new technology, you kind of want to be sure that it’s going to do the right thing. And so our main focus here at ZestFinance has really been developing those methods, testing them, making sure that they’re right, doing all of the mathematics that’s required to prove to ourselves and to others that the answers that our tools produce are accurate, consistent, and trustworthy.
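As a rough illustration of what explaining an individual decision can look like, the sketch below ranks the factors that pushed one applicant's risk above a reference applicant in a simple linear model. Everything here is an assumption made for the example: the weights, the reference values, and the `top_reasons` helper are hypothetical. ZestFinance's own explanation methods are not described in the interview, and more complex models such as tree ensembles or neural networks need different attribution techniques.

```python
# Hypothetical linear risk model: positive weights raise default risk.
WEIGHTS = {
    "years_at_employer": -0.15,
    "bankruptcies": 0.8,
    "open_products": 0.05,
    "income_thousands": -0.01,
}

# Hypothetical reference applicant used as the comparison point.
REFERENCE = {
    "years_at_employer": 5,
    "bankruptcies": 0,
    "open_products": 3,
    "income_thousands": 60,
}


def top_reasons(applicant, n=2):
    """Return the n factors that raise this applicant's risk the most
    relative to the reference applicant."""
    contributions = {
        name: WEIGHTS[name] * (applicant[name] - REFERENCE[name])
        for name in WEIGHTS
    }
    adverse = [(name, c) for name, c in contributions.items() if c > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in adverse[:n]]


applicant = {"years_at_employer": 1, "bankruptcies": 2,
             "open_products": 4, "income_thousands": 40}
print(top_reasons(applicant))
```

For this applicant the two largest risk-raising factors are the bankruptcies and the short time with the current employer, which is the kind of statement a lender could give a consumer when explaining a denial.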
This article was sponsored by ZestFinance and was written, edited and published in alignment with our transparent Emerj sponsored content guidelines. Learn more about reaching our AI-focused executive audience on our Emerj advertising page.
Header image credit: Riviera Invest