In the last year, interest in so-called “autoML” has risen greatly in part due to its promise of bringing artificial intelligence to businesses that have been blocked from accessing it due to its serious time, talent, and budget requirements. Although machine learning may still be widely unavailable to small businesses, medium-sized businesses may find that autoML allows them to make use of it in the coming years.
We spoke with Yiwen Huang, CEO at R2.ai, a company that offers cloud-based machine learning services, including autoML. Huang speaks to us about how autoML applications could bring AI to a broader base of companies. In our interview with him, Huang said autoML could “make…machine learning model development and operation easier, quicker, better, and more affordable.”
The company calls its brand of autoML “BizML” and claims it’s different from other autoML services because domain experts can understand it. This is in contrast to autoML services that are built more for data scientists who already have a firm grasp of machine learning.
Listen to the full interview below and read on for an in-depth overview of autoML:
Subscribe to our AI in Industry Podcast with your favorite podcast service:
In this article, we talk about what autoML is and how it works, describe how it could make AI more accessible to companies with fewer resources than the largest enterprises in their sectors, and discuss how businesses will need to prepare in order to use it effectively.
We start with a brief overview of how machine learning models are built traditionally, without the help of autoML:
How Machine Learning Models are Built (Traditionally)
Traditionally, data scientists and subject-matter experts would need to spend a considerable number of hours every week doing what’s called feature engineering, which we detail in depth in our article Feature Engineering for Applying AI in Business – An Executive Guide.
In a nutshell, feature engineering involves figuring out the relationships between the variables, or “features,” that go into making a certain decision. Those relationships in turn inform which algorithm(s) to use to build machine learning models.
For example, an eCommerce company may want to build a machine learning model for predicting the best product pricing to send to a certain segment of people. Subject-matter experts, in this case email marketers, would relay to data scientists the information they’ve used in the past for determining product prices for this segment and provide some context about the open rate data that’s available in the CRM software.
The email marketers would also provide the data scientists their thoughts on how to weight certain data before the data scientists feed it into the algorithm.
This process can take months and continues after data scientists build and test the initial machine learning algorithm. The eCommerce company might find that the product pricing the model suggests isn’t garnering the results they expected, in which case the data scientists and email marketers would need to reconvene to tweak the algorithm.
In addition, building machine learning models in-house requires powerful computers, and most businesses simply lack the compute necessary to train and run them. Computer hardware companies like NVIDIA are building processors specifically designed to handle machine learning, but many established companies are unlikely to invest in upgrading their computers with these processors.
This leaves them with limited options when it comes to making use of artificial intelligence and machine learning.
AutoML could in part help businesses overcome these challenges.
How AutoML Generates Machine Learning Models
Once a user company purchases a subscription or license for autoML software from a vendor like R2.ai, they could log into the software to start building machine learning models in the cloud. Building models in the cloud could allow companies without the necessary compute to use the compute of a company that does.
Additionally, autoML applications are built to automate the process of building machine learning models, at least to some degree. For example, R2.ai’s BizML software purportedly automates parts of the process including data quality check, data preprocessing, feature processing, algorithm selection, parameter tuning, model recommendation, and model monitoring. We could not verify if this was the case or not ourselves, however.
AutoML applications vary in the processes that they automate, with some automating more than others. Users with data science and machine learning expertise can often override the decisions that autoML systems make in order to calibrate models based on domain expertise.
Below is a brief overview of how most autoML applications work to generate machine learning models that are useable in business. The first three steps involve human input:
1. Set the Goal
The user provides the application with the goal they want the resulting machine learning model to achieve. This could be scoring leads or predicting the optimal payout on an insurance claim, for example.
2. Upload the Data
The user uploads a structured, labeled dataset into the autoML application. This step requires people at the company to collect and consolidate the data they need for the machine learning model to achieve the goal the company wants it to achieve. This is one part of the feature engineering process that autoML is unlikely to help with, as we discuss later in the article.
3. Set the Confidence Interval
The user provides a confidence interval or margin of error that they want the resulting machine learning model to work within. For example, the user could set the confidence interval to 90%, indicating that the resulting model should offer a prediction or suggestion only when it scores that prediction or suggestion as at least 90% likely to be correct.
This is much less important when the algorithm is predicting the best product pricing than it is when it’s searching for tumors in a medical scan, for example.
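A setting like this is often implemented as a confidence threshold on the model's output scores. The following is a minimal sketch of that idea, not a description of how any particular autoML product works; the function name `predict_with_threshold` and the example scores are hypothetical.

```python
from typing import Dict, Optional


def predict_with_threshold(
    probabilities: Dict[str, float], threshold: float = 0.90
) -> Optional[str]:
    """Return the most likely label only if its score meets the threshold.

    `probabilities` maps each candidate label to the model's score for it.
    Returning None means "no prediction offered; defer to a person".
    """
    label, score = max(probabilities.items(), key=lambda item: item[1])
    return label if score >= threshold else None


# Hypothetical model outputs for two insurance claims (assumed values).
confident = predict_with_threshold({"approve": 0.95, "deny": 0.05})
uncertain = predict_with_threshold({"approve": 0.60, "deny": 0.40})
print(confident)  # approve
print(uncertain)  # None
```

A higher threshold means the model stays silent more often, which suits high-stakes uses such as medical scans better than low-stakes ones such as pricing suggestions.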
The autoML application runs through the last three steps itself, and these are where it delivers its value:
4. The AutoML Generates and Tests Machine Learning Models
The autoML application selects several well-suited machine learning algorithms based on feature processing of the dataset, builds multiple models simultaneously, and then tests the performance of each of those models on part of the uploaded data (i.e., the validation dataset).
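To make this step concrete, here is a toy sketch of fitting several candidate models on one part of a dataset and scoring each on a held-out validation part. The two candidates (a majority-class baseline and a single-feature threshold rule) are deliberately simple stand-ins chosen for illustration; real autoML applications use far richer algorithms, and all function names here are hypothetical.

```python
import random
from statistics import mean


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def fit_threshold(train):
    """Predict 1 on the side of the midpoint where class 1's mean lies."""
    mean0 = mean(x for x, y in train if y == 0)
    mean1 = mean(x for x, y in train if y == 1)
    cut = (mean0 + mean1) / 2
    return lambda x: int(x > cut) if mean1 > mean0 else int(x < cut)


def accuracy(model, rows):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def evaluate_candidates(rows, validation_fraction=0.25, seed=0):
    """Fit each candidate on the training split; score it on validation."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    split = int(len(rows) * (1 - validation_fraction))
    train, validation = rows[:split], rows[split:]
    candidates = {"majority": fit_majority, "threshold": fit_threshold}
    return {name: accuracy(fit(train), validation)
            for name, fit in candidates.items()}


# Toy labeled data (assumed): the label is 1 when the feature is 50 or more.
data = [(x, int(x >= 50)) for x in range(100)]
scores = evaluate_candidates(data)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

The important design point is that the models are scored on rows they never saw during fitting, so the comparison reflects how each might behave on new data.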
5. The AutoML Recommends a Working Machine Learning Model
The algorithms converge and the autoML application then presents the user with various machine learning models and their performance metrics and makes a recommendation on a model that they could use to achieve the goal in the real world. The model recommendation is based on business requirements, and the software provides cost-benefit based model selection.
Users can also make their own choice based on their business requirements. For example, they could choose a model that predicts one scenario most accurately rather than the one that does better at predicting all scenarios.
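One way to picture cost-benefit based selection is to score each candidate model by the business value of its correct and incorrect predictions rather than by raw accuracy. The sketch below assumes hypothetical validation counts and dollar values; it is an illustration of the general idea, not R2.ai's method.

```python
def expected_value(counts, benefit_tp, cost_fp, cost_fn):
    """Net value of a model, computed from its validation counts.

    counts: dict with true/false positive and negative counts.
    benefit_tp: value gained per correctly flagged case.
    cost_fp / cost_fn: value lost per false alarm / per missed case.
    """
    return (counts["tp"] * benefit_tp
            - counts["fp"] * cost_fp
            - counts["fn"] * cost_fn)


def recommend(models, **economics):
    """Return the name of the model with the highest net value."""
    return max(models, key=lambda name: expected_value(models[name], **economics))


# Hypothetical validation results for two lead-scoring models (assumed numbers).
models = {
    "balanced":    {"tp": 80, "fp": 20, "fn": 20, "tn": 880},  # 96.0% accurate
    "high_recall": {"tp": 95, "fp": 60, "fn": 5,  "tn": 840},  # 93.5% accurate
}

# When a missed lead is expensive, the less accurate model is worth more.
print(recommend(models, benefit_tp=100, cost_fp=5, cost_fn=100))   # high_recall
# When false alarms are expensive, the balanced model wins.
print(recommend(models, benefit_tp=100, cost_fp=100, cost_fn=0))   # balanced
```

This mirrors the choice described above: a model that handles one scenario especially well can be the better business decision even if another model is more accurate overall.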
6. The AutoML Continues to Monitor Model Performance
Once a model is selected and deployed into the business environment, some autoML applications can purportedly keep an eye on the model’s performance and alert users when it is not rendering desirable results. In such a scenario, users could then retrain the model with a new dataset or a larger, combined dataset.
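A simple way to think about this kind of monitoring is a rolling window over recent predictions whose real outcomes are now known, with an alert when accuracy in that window drops below a floor. The class name, window size, and accuracy floor below are assumptions made for illustration only.

```python
from collections import deque


class PerformanceMonitor:
    """Track accuracy over the most recent predictions (hypothetical helper)."""

    def __init__(self, window=100, min_accuracy=0.8):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.min_accuracy = min_accuracy

    def accuracy(self):
        """Accuracy over the outcomes currently held in the window."""
        return sum(self.outcomes) / len(self.outcomes)

    def record(self, predicted, actual):
        """Log one outcome; return True if an alert should fire.

        Alerts only fire once the window is full, to avoid reacting
        to a handful of early results.
        """
        self.outcomes.append(predicted == actual)
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Assumed stream of (predicted, actual) pairs; the model degrades over time.
monitor = PerformanceMonitor(window=5, min_accuracy=0.8)
stream = [(1, 1), (0, 0), (1, 1), (1, 0), (0, 1), (1, 0)]
alerts = [monitor.record(p, a) for p, a in stream]
print(alerts)  # [False, False, False, False, True, True]
```

An alert like this would be the cue to retrain the model on newer or larger data.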
According to the company, R2.ai’s BizML application is “self-learning” and improves the way it builds models over time. The company outlines a simplified version of the autoML process in the graphic below:
How AutoML Could Make AI Accessible to Businesses
The real value of an autoML application is the time and money it saves by handling steps 4, 5, and 6 itself. Huang believes that the end-goal of machine learning is to solve business problems, and in doing so, an ML solution should be so simple and fast that even non-technical people at a company can make sense of it, not just a data scientist. He reiterates this when he discusses how autoML could in part reduce the need for data scientists to test and tweak models:
The user needs to be more savvy on the business side than the data science side. We’re removing one…roadblock, which is the data science skill portion. We have a lot of savvy business analysts [as clients] who understand the business portion pretty well but lack the machine learning capabilities
In doing so, autoML could open up opportunities for businesses with less time and fewer resources to create machine learning models that were until recently inaccessible to them. Huang puts it this way:
If you provide the right dataset, the system will basically walk you through the entire model-building process. You don’t need to have very detailed, sophisticated knowledge of machine learning. You provide the data, you define the problem, and the system will basically go through the entire model-building lifecycle, starting from checking the data quality automatically and fixing any issues automatically, do[ing] the feature engineering…select[ing] the best [machine] learning algorithm that matches your data, fine tun[ing] the model, and then recommend[ing] the model results to you. That’s the entire process it walks through automatically.
The goal of autoML applications is to allow users to create machine learning models without, or with a reduced need for, in-house data scientists or machine learning engineers. We dedicate many of our articles to the challenges that business leaders face when adopting AI, and one of those is the lengthy, expensive process that comes along with achieving a machine learning algorithm that actually helps solve a business problem.
Traditionally, a company has the option of either hiring in-house data science talent to build machine learning models from the ground up or working with an AI vendor to implement that vendor’s solution into an existing workflow at the company.
Either way, the company should expect to spend months and tens (even hundreds) of thousands of dollars working toward a machine learning model that’s ready for actual use. This greatly limits the number of companies that can access AI.
However, Huang says in our interview with him, “With the autoML support, a model can be put out in less than three weeks. A majority of the time in those three weeks is spent collecting the right datasets.”
Theoretically, autoML could reduce the time and money companies spend on launching an AI product or AI-enabled service.
Preparing for AutoML and the Challenges of AI Initiatives
But while this could allow AI to become accessible to companies that are simply lacking the AI talent or the budget necessary for embarking on a three-month machine learning initiative, autoML is far from allowing companies to escape all of the challenges that come with adopting AI. Huang describes his company’s clientele:
We are mainly going after customers who are already past the data consolidation stage and are now trying to turn their massive data set into value. For those customers, the main barriers we saw are:
- Lack of talent – It’s very hard to acquire good quality machine learning talent. It’s very expensive.
- Time to value – it takes a very long time to develop machine learning models.
- Consistent quality of those models – As a [company] with limited resources and limited time…there’s no guarantee that they will be able to fully leverage the insights in the data. The model they develop may not be the perfect one.
As such, autoML may prove useful for companies that simply need a tool for finding patterns and insights amongst their large volumes of data. Unfortunately, the vast majority of companies are in fact in the data collection or consolidation phases. What this means is that for whatever problem they want to solve with AI, they either:
- Haven’t been collecting the data they should have in order to solve that problem
- Have been collecting the right data in inconsistent formats
Both cases might still require a company to hire data scientists, either to work with subject-matter experts to figure out how to collect the data they would need in order to train a machine learning model or to clean the data the company already has to make it machine-readable. In other words, companies need to make a lot of preparations before they can use an autoML application.
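As a small illustration of what cleaning data collected in inconsistent formats can involve, the sketch below normalizes dates and prices that were recorded in several different ways. The list of accepted formats and the sample values are assumptions; a real dataset would need its own rules, worked out with the people who know how the data was entered.

```python
from datetime import datetime

# Formats assumed to appear in the raw records (hypothetical).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Try each known format in turn; return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_price(text):
    """Strip currency symbols and thousands separators; None if not a number."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


# The same purchase date and price, entered three different ways.
raw_rows = [("2019-03-05", "$1,299.50"), ("03/05/2019", "1299.5"), ("05.03.2019", "n/a")]
clean_rows = [(parse_date(d), parse_price(p)) for d, p in raw_rows]
print(clean_rows)
```

Values that cannot be parsed come back as `None` rather than being guessed at, so a person can decide what to do with them before the data reaches an autoML application.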
Huang reiterates this point:
There is a boundary in terms of what the autoML is capable of doing today. The assumptions it makes is the data that’s fed to the system are…not garbage. The second thing is there are certain business-logic related stuff the autoML may not figure out. In terms of some of the feature engineering…That’s the part that will probably require certain human intervention.
What this means is that data scientists and subject-matter experts at the company will likely still need to do some feature engineering as part of the preparatory work.
In addition to figuring out what kinds of data to collect and how to clean it so a machine can read it, they’ll need to discuss which data points to feed to the algorithm, how to weight them, and which to keep from being fed into the algorithm altogether.
All of the context that goes into how a person might make a decision needs to be reflected in the data that’s fed to a machine learning model if it’s intended to make that decision itself. That kind of brainstorming requires collaboration between data scientists who understand the subject matter to some degree and subject-matter experts who understand the language of data science.
We discuss this further in our executive guide, Applying AI in Business – the Critical Role of Subject-matter Experts.
This is critical because, as Huang says, “Once the data goes into the system, [the autoML application] assume[s] that the business knowledge is already in that dataset.” A company that fails to prepare and think through the data it’s going to feed into a machine learning algorithm, whether built in-house or generated by an autoML application, risks creating an algorithm that doesn’t accurately predict, suggest, categorize, or provide some other output.
AI and machine learning are truly “garbage in, garbage out” systems, and businesses that find that autoML can work for them should be mindful of this. That said, as the technology progresses, autoML does show some promise with regard to helping data scientists simplify tedious and complex end-to-end modeling processes, such as feature engineering, after they’ve collected and contextualized the data they plan on feeding into the algorithm.
It could also help non-technical people at a company build and tweak machine learning models.
In other words, the latter half of the process for building machine learning models may take much less time and cost much less money with the help of autoML, which bodes well for businesses that lack the resources of their largest competitors.
This article was sponsored by R2.ai, a cloud-based AutoML solution provider, and was written, edited and published in alignment with our transparent Emerj sponsored content guidelines. Learn more about reaching our AI-focused executive audience on our Emerj advertising page.
Header Image Credit: University of Lisbon