AI Transparency in Finance – Understanding the Black Box

Daniel Faggella

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders.


The financial sector was one of the first to start experimenting with machine learning applications for a variety of use cases. In 2019, banks and other lenders are looking to machine learning as a way to win market share and stay competitive in a changing landscape, one in which people are no longer exclusively going to banks to handle all of their banking needs.

While lenders may be excited to take on machine learning projects, many aren’t fully aware of the challenges that come with adopting machine learning in finance. Lenders face several unique difficulties when implementing machine learning, particularly with regard to machine learning-based credit models.

In order to better understand these domain-specific challenges, we spoke with Jay Budzik, CTO at Zest AI, about transparency in machine learning as applied to the financial sector and how lenders may be able to overcome what is often referred to as the “black box” problem of machine learning.

In this article, we describe the black box of machine learning in finance and explain how a lack of transparency may cause problems for lenders and for consumers who interact with machine learning-based credit models. These problems include:

  • Ineffective Model Development and Validation
  • Inability to Explain Why a Credit Applicant Was Rejected
  • Algorithmic Bias (Race, Gender, Age)
  • Inability to Monitor Models in Production

Later in the article, we discuss Budzik’s critique of one popular technique for explainable machine learning. Finally, we explore what machine learning-based credit models mean for both lenders and credit applicants.

The Black Box of Machine Learning in Finance

Although ML-based credit models are in many cases more predictive than traditional models (such as logistic regression and simple FICO-based scorecards), they often lack one of the most critical advantages of those traditional models: explainability.

Machine learning has what’s called a “black box” problem, meaning it’s often extremely difficult to figure out how an ML-based credit model came to the score it did for a particular credit applicant. 

This is due to the nature of ML algorithms such as tree-based models or neural networks. In neural networks, each node in the network fires in response to patterns that it’s “seen” before, but it doesn’t actually understand what’s being presented to it. In other words, one can’t probe a node to figure out what concept or idea made it fire without complex diagnostics that are far from ubiquitous in the world of machine learning.

In less regulated industries, the black box problem is less of an issue. For example, if a recommendation engine for an online store mistakenly recommends a product the customer isn’t interested in, the worst that happens is that the customer doesn’t buy it. In this case, it ultimately doesn’t matter how the model came to recommend that product.

In time, the model will simply get better at recommending products, and the online store’s operations go on as usual. But in the financial sector, a machine learning model’s inaccurate output can have serious consequences for the business using the model.

Inability to Explain Why a Credit Applicant Was Rejected

In credit underwriting, lenders must be able to explain to credit applicants why they were rejected. Traditional linear models make this relatively simple: one can easily interpret how the model and the underwriter arrived at the decision on a borrower, and such models factor in significantly fewer variables than a machine learning model could.
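To illustrate why linear models make rejection reasons straightforward, here is a minimal Python sketch of one common way to derive reason codes from a logistic scorecard: each feature’s contribution is its coefficient times the gap between the applicant’s value and a reference profile, and the most negative contributions become the stated reasons. The feature names, coefficients, and reference values are hypothetical, and this is an illustration rather than a regulatory standard or any particular lender’s method.

```python
from typing import Dict, List, Tuple

# Hypothetical logistic scorecard: log-odds of repayment = intercept + sum(coef * value).
# Feature names, coefficients, and reference values are illustrative assumptions only.
COEFFICIENTS: Dict[str, float] = {
    "credit_utilization": -2.0,    # higher utilization lowers the odds of repayment
    "recent_delinquencies": -0.8,
    "years_of_history": 0.15,      # longer history raises the odds of repayment
    "debt_to_income": -3.0,
}

# Hypothetical reference profile (e.g., a typical approved borrower) to measure shortfalls against.
REFERENCE: Dict[str, float] = {
    "credit_utilization": 0.30,
    "recent_delinquencies": 0.0,
    "years_of_history": 10.0,
    "debt_to_income": 0.25,
}


def reason_codes(applicant: Dict[str, float], top_n: int = 2) -> List[Tuple[str, float]]:
    """Rank features by how far they pushed the log-odds below the reference profile."""
    # In a linear model, each feature's contribution is a single product.
    contributions = {
        name: coef * (applicant[name] - REFERENCE[name])
        for name, coef in COEFFICIENTS.items()
    }
    # Only features that lowered the log-odds can be reasons for a denial.
    negatives = [(name, c) for name, c in contributions.items() if c < 0]
    negatives.sort(key=lambda item: item[1])  # most negative first
    return negatives[:top_n]


applicant = {
    "credit_utilization": 0.85,
    "recent_delinquencies": 2.0,
    "years_of_history": 4.0,
    "debt_to_income": 0.40,
}
print(reason_codes(applicant))
```

For this hypothetical applicant, the function returns recent_delinquencies (a contribution of about -1.6) followed by credit_utilization (about -1.1). Because each contribution is a single product, the ranking follows directly from the model; that is the property a nonlinear model with many interacting variables gives up.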

There may be hundreds of variables involved in a machine learning-based credit model, and the interactions among those variables can themselves act as additional variables.
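A small Python sketch can show why interactions complicate explanations. The hypothetical two-level tree below (its thresholds and outputs are invented for illustration) penalizes high credit utilization heavily only for lower-income applicants, so the effect of utilization cannot be summarized by a single weight. The sketch also counts how many distinct interactions are possible, assuming 300 input variables as an example of “hundreds.”

```python
from math import comb


def toy_tree_margin(utilization: float, income: float) -> float:
    """Hypothetical two-level tree: the penalty for high utilization depends on income."""
    if utilization > 0.7:
        return -1.5 if income < 50_000 else -0.2
    return 0.5


def utilization_effect(income: float) -> float:
    """Change in output when utilization rises from 0.5 to 0.8, holding income fixed."""
    return toy_tree_margin(0.8, income) - toy_tree_margin(0.5, income)


# The same change in utilization has a different effect at different incomes,
# so no single weight describes utilization's role in this model.
print(utilization_effect(40_000))   # -2.0
print(utilization_effect(90_000))   # about -0.7

# Assumption: 300 input variables, within the "hundreds" the text mentions.
n_features = 300
print(comb(n_features, 2))  # 44850 distinct two-variable interactions
print(comb(n_features, 3))  # 4455100 distinct three-variable interactions
```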

Without rigorous explainability, lenders aren’t able to provide applicants with an adverse action notice that details why they were rejected, information they can use to improve their credit profiles and successfully obtain credit in the future. We further discuss the effect this can have on consumers later in the article when we discuss the SHAP technique for explainability in machine learning models.

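Unexplainable machine learning-based credit models can have serious legal consequences for lenders that use them. U.S. law, including the Fair Credit Reporting Act of 1970 and the Equal Credit Opportunity Act, requires lenders to give applicants the principal reasons for denying them credit, which means lenders must be able to explain the models they use to approve and deny credit applicants.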

Failure to do so accurately can result in large fines and/or the suspension of a banking license. The risk of noncompliance is perhaps the most important reason that lenders have been cautious about adopting machine learning for credit scoring and underwriting.

Algorithmic Bias (Race, Gender, Age)

Another critical concern for lenders looking to adopt machine learning-based credit models is algorithmic bias. Although one may assume an algorithm is objective and neutral to the social context in which it’s created, those who develop the algorithm bring with them their own assumptions about the society to which they belong. These assumptions can inadvertently influence the way a machine learning engineer develops an algorithm, and, as a result, the outputs of that algorithm can be unintentionally biased.

In lending models, bias is most often introduced when choosing the data or the weightings that the algorithm will use to decide whether to approve an applicant. Lenders know which data points they need to avoid when making decisions with traditional credit models, be they manual scorecards or linear regression models. These include:

  • The applicant’s race or ethnicity
  • The applicant’s gender or sex
  • The applicant’s age

There will undoubtedly be other data points that are barred from use in credit modeling as more regulations are passed in the coming years. Other data points, such as property values, can serve as clear proxies for race.

One might think it’s easy enough to simply ensure these specific data points aren’t fed to an ML algorithm. But the black box problem makes it difficult to know whether discriminatory credit signals are being factored into a machine learning-based credit model inadvertently, through combinations of other, seemingly unrelated data points. According to Budzik:

The location of the applicant can be a pretty good predictor of their race and ethnicity, and so you have to be pretty careful when you’re considering attributes like the location of someone. We’ve seen even at the state level when you associate the state in which a person lives with other attributes like the mileage on the car, you can end up with pretty perfect predictors of their race. So it’s important to be able to do a full inspection of the model to make sure it’s not doing the wrong thing from a discrimination perspective, that it isn’t making decisions that are biased and unfair.
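One simple way to check for the kind of proxy Budzik describes is to ask how well a protected attribute can be guessed from a combination of other features. The Python sketch below uses a tiny synthetic dataset (the states, mileage buckets, and groups are invented) and guesses the group by majority vote within each combination of feature values. Neither state nor mileage is informative on its own, but together they identify the group perfectly.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Synthetic, illustrative records: (state, mileage_bucket, protected_group).
# Neither column alone says anything about the group; the combination does.
RECORDS: List[Tuple[str, str, str]] = [
    ("A", "high", "g1"), ("A", "high", "g1"), ("A", "high", "g1"),
    ("A", "low", "g2"), ("A", "low", "g2"), ("A", "low", "g2"),
    ("B", "high", "g2"), ("B", "high", "g2"), ("B", "high", "g2"),
    ("B", "low", "g1"), ("B", "low", "g1"), ("B", "low", "g1"),
]


def proxy_accuracy(rows: Sequence[Tuple[str, ...]],
                   feature_idx: Tuple[int, ...],
                   target_idx: int) -> float:
    """Share of rows whose protected attribute is guessed correctly by majority vote
    within each group of rows sharing the same values of the chosen features."""
    groups: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in feature_idx)
        groups[key][row[target_idx]] += 1
    # Each group contributes the count of its most common protected-attribute value.
    correct = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return correct / len(rows)


print("baseline (no features):", proxy_accuracy(RECORDS, (), 2))      # 0.5
print("state only:            ", proxy_accuracy(RECORDS, (0,), 2))    # 0.5
print("mileage only:          ", proxy_accuracy(RECORDS, (1,), 2))    # 0.5
print("state + mileage:       ", proxy_accuracy(RECORDS, (0, 1), 2))  # 1.0
```

This is an in-sample check on toy data. A more realistic check would use held-out data and a proper classifier, since in-sample majority votes overstate how predictable the attribute is when groups are small.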

The consequences of a biased algorithm can vary, but in credit modeling they can harm those who are already at a disadvantage when it comes to acquiring credit. Under traditional credit scoring, people of color are already much more likely than white applicants to be denied mortgages, in part because those scores don’t take into account recurring payments such as rent, phone bills, and utility bills.

As a result, people of color may have thin credit histories but be just as likely to pay off a loan as white borrowers with extensive credit histories. This is the state of the current credit system; machine learning could exacerbate the problem if lenders and regulators are unable to look into an algorithm and figure out how it’s coming to its conclusions.

The political sphere is starting to concern itself with algorithmic bias as well. In a letter to several government agencies, including the Consumer Financial Protection Bureau, Senator and presidential candidate Elizabeth Warren and Senator Doug Jones of Alabama asked what these agencies were doing to combat algorithmic bias in automated credit modeling, specifically calling out FinTech companies for rarely including a human in the loop when underwriting a loan.

In addition, Zest AI’s CEO, Douglas Merrill, spoke before the AI Task Force of the House Committee on Financial Services on algorithmic bias and the company’s purported ability to develop transparent underwriting algorithms.

SHAP and Its Limitations in Credit Underwriting

SHapley Additive exPlanations (SHAP) is one method of bringing explainability to machine learning that has gained some popularity since its introduction in 2017. The method explains an individual prediction by attributing it to the model’s input features, and it includes a fast variant designed specifically for tree-based models. It draws on Shapley values from cooperative game theory to do this, and it can purportedly do so quickly enough for use in business.
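The game-theoretic idea behind SHAP is the Shapley value: a feature’s attribution is its marginal contribution to the prediction, averaged over every order in which the features could be added. The Python sketch below computes exact Shapley values for a hypothetical three-feature model by enumerating all orderings. It is not the shap library’s implementation; as a simplifying assumption, features that have not been added are held at a single reference value, whereas SHAP typically averages over a background dataset.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_by_permutation(model: Callable[[List[float]], float],
                           x: Sequence[float],
                           baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values: average each feature's marginal contribution over every
    order in which features can join the prediction. A feature that has not yet
    joined is held at its baseline value (a simplifying assumption)."""
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))  # n! orderings: fine for a handful of features
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]              # feature i joins the coalition
            value = model(current)
            totals[i] += value - previous  # its marginal contribution in this ordering
            previous = value
    return [t / len(orderings) for t in totals]


def toy_margin(z: List[float]) -> float:
    """Hypothetical model: two main effects plus an interaction between features 0 and 2."""
    return 1.0 - 2.0 * z[0] - 1.0 * z[1] + 3.0 * z[0] * z[2]


applicant = [1.0, 1.0, 1.0]
reference = [0.0, 0.0, 0.0]
print(shapley_by_permutation(toy_margin, applicant, reference))  # [-0.5, -1.0, 1.5]
```

The +3 interaction between features 0 and 2 is split evenly between them, and the attributions sum to the change in model output from the reference (zero here). Exact enumeration grows factorially with the number of features, which is why practical tools rely on sampling approximations or tree-specific algorithms such as TreeSHAP.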

According to Zest AI, however, SHAP used on its own has some limitations when it comes to explaining ML-based credit models for underwriting. The greatest of these is largely a domain-specific problem.

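Lenders typically make decisions based on a score, such as a predicted probability of repayment, and may set a cutoff on that score so that their model approves a target percentage of the applicants it sees. This score, and the desired outcome defined on it, exist in what’s called “score space.” Meanwhile, the raw output of many machine learning models, such as gradient-boosted trees, is not a probability but a log-odds value; it exists in what’s called “margin space” and must pass through a nonlinear transformation to become a score.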
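To make the distinction concrete, here is a minimal Python sketch assuming the model’s margin is a log-odds value that a logistic (sigmoid) function converts into a probability score, as is the case for gradient-boosted trees trained with logistic loss. The numbers are illustrative only.

```python
import math


def margin_to_score(margin: float) -> float:
    """Logistic transform: log-odds (margin space) to probability (score space)."""
    return 1.0 / (1.0 + math.exp(-margin))


def score_to_margin(score: float) -> float:
    """Inverse transform: probability (score space) back to log-odds (margin space)."""
    return math.log(score / (1.0 - score))


# The same +1.0 step in margin space moves the score by very different amounts
# depending on where the applicant starts.
for start in (-4.0, 0.0, 4.0):
    delta = margin_to_score(start + 1.0) - margin_to_score(start)
    print(f"margin {start:+.1f} -> {start + 1.0:+.1f}: score changes by {delta:.3f}")
```

A one-unit step in margin space moves the score by roughly 0.03 starting from a margin of -4, about 0.23 starting from 0, and about 0.01 starting from 4, which is why explanations computed in one space need not carry over proportionally to the other.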

SHAP is designed to explain machine learning model outputs that exist in the “margin space,” and Zest AI argues that this poses a challenge when using it to explain why an applicant was denied credit. In other words, lenders need the explanation to be in “score space,” but SHAP doesn’t allow this easily. As a result, according to Budzik in a post on the topic:

If you compute the set of weighted key factors in margin space, you’ll get a very different set of factors and weights than if you compute them in score space, which is where banks derive their top five explanations for rejecting a borrower. Even if you are using the same populations and are only looking at the transformed values, you will not get the same importance weights. Worse, you likely won’t end up with the same factors in the top five.
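The Python sketch below is a contrived example of the effect Budzik describes: the same model, explained with exact Shapley values in margin space and in score space, produces a different top reason for denial. The model, its interaction terms, and the feature names are hypothetical and were chosen to make the effect visible; attributions are computed against a single reference profile rather than a population, and this is neither Zest AI’s method nor the shap library’s implementation.

```python
import math
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

# Hypothetical binary flags for one applicant versus a reference profile (illustrative only).
FEATURES = ["high_debt_to_income", "high_utilization", "recent_inquiries"]


def margin(z: Sequence[float]) -> float:
    """Hypothetical log-odds of approval, with two interaction terms."""
    dti, util, inq = z
    return 3.0 - 4.0 * dti - 3.0 * util - 1.5 * inq + 1.0 * dti * inq - 4.0 * util * inq


def score(z: Sequence[float]) -> float:
    """The same model expressed in score space (probability of approval)."""
    return 1.0 / (1.0 + math.exp(-margin(z)))


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values via the subset-weighted formula; absent features sit at baseline."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]  # add feature i to the coalition
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def ranked_reasons(phi: Sequence[float]) -> List[str]:
    """Feature names ordered from most negative attribution (top reason) to least."""
    return [FEATURES[i] for i in sorted(range(len(phi)), key=lambda i: phi[i])]


applicant, reference = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
print("margin space:", ranked_reasons(shapley_values(margin, applicant, reference)))
print("score space: ", ranked_reasons(shapley_values(score, applicant, reference)))
```

With these values, margin space ranks high_utilization first, while score space ranks high_debt_to_income first. The gap in score space is small in this toy case, but it is enough to change which reason would appear at the top of an adverse action notice.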

As a result, a lender might send a consumer an adverse action notice that ranks the reasons for denial inaccurately. The applicant might then try to improve their credit profile in the wrong way, focusing on a factor that actually mattered less to the denial than another factor that was wrongly ranked below it.

What ML-based Credit Models Mean for Lenders

As of now, there are few examples of explainable machine learning applied to credit underwriting. Zest AI claims its ZAML software is one of them and that it can explain multiple types of machine learning-based credit models.

That said, it’s likely going to be several years before lenders are comfortable fully automating their credit underwriting with machine learning, especially given the uncertainty around regulations. Judging by the recent interest among lawmakers and regulators in machine learning in finance, the government is likely to make several decisions regarding the use of ML-based credit models within the next few years.

That doesn’t, however, necessarily mean that lenders should wait to figure out how exactly they would implement a machine learning-based credit model when the time comes. They should be paying attention to the black box problem and how FinTechs, banks, and regulators are responding to it.

The black box problem matters because overcoming it can unlock the real potential that machine learning has in finance, particularly in credit underwriting. Explainable models could allow lenders to approve more loan applicants who would otherwise have been denied by more rigid credit models, without increasing the risk they take on.

In terms of dollars, this means more revenue and less money lent out to borrowers who won’t be able to pay it back.

The vast number of data points involved in a machine learning model may be a large part of what causes the black box problem, but those same data points are also what will allow lenders to create more nuanced profiles of loan applicants.

What ML-based Credit Models Mean for Borrowers

Machine learning-based credit models may allow lenders to approve borrowers who would not have qualified for loans before. Ideally, once explainable ML becomes ubiquitous in credit underwriting, lenders will be able to run more effective businesses while giving more people fair access to credit, regardless of age, gender, or race.


This article was sponsored by Zest AI and was written, edited and published in alignment with our transparent Emerj sponsored content guidelines.
