Maximizing a Small Data Science Team with AutoML

Raghav Bharadwaj

Raghav serves as an Analyst at Emerj, covering AI trends across major industry updates and conducting qualitative and quantitative research. He previously worked for Frost & Sullivan and Infiniti Research.


The financial services sector has been one of the early adopters of data science and AI technologies. That said, financial firms that have engaged in AI projects will have realized that they require a deep understanding of data management and skilled data science professionals to solve these complex problems.

Additionally, AI projects are highly capital intensive, requiring financial firms to find, hire, and retain skilled data scientists, who are still a rare commodity today. What this means is that even in large corporations, data science teams are usually centralized for all business units and are often small, ranging from five to fifteen data scientists per team.

The small data science teams at these financial institutions cater to the data needs of several business units, such as sales or marketing. Maximizing what these small teams can achieve is critical for financial firms because of the costs involved with AI. These small teams need to satisfy the needs of individual business units while simultaneously creating innovative new data projects.

Automated Machine Learning Platforms to Augment the Capabilities of Data Science Teams

The term automated machine learning, or AutoML, is a relatively new one, and with the AI space evolving rapidly, a universally agreed-upon definition might be hard to come across. Essentially, AutoML involves a set of data science tools and software that can help machine learning engineers and experts automate repetitive tasks in the creation of AI projects. It does this, in short, by applying machine learning to the building and development of machine learning projects.

Automated machine learning platforms help data scientists execute AI projects much faster and keep track of every step of the project in one place. AutoML can help reduce the time taken to develop AI models from scratch by automating some of the steps. Each step in an AI project can last anywhere from a day to a month, and several of these steps may need to be repeated, which can drastically cut into the efficiency of small data science teams.

One way for financial firms to improve the efficiency of their data science teams is by using automated machine learning platforms. We spoke to Anwar Ghauche and Carlos Pazos from SparkCognition (listen to the full interview on the AI in Industry podcast) about how AutoML platforms such as their own Darwin can help maximize the efforts of small data science teams in financial services companies. 

According to Pazos, AutoML systems can help financial firms create and generate AI models, retrain these models to expand them to include new data, and prototype and test new models in a rapid and scalable way. 

The goal of these AutoML platforms is to help expand the capabilities of data science teams. Pazos told us:

The financial services industry has already made efforts towards building teams of data scientists. But the complexity of the applications in this sector and the vast volumes of data involved make any AI project challenging for small teams due to the maintenance busy work that also goes along with these projects.

This is because AI projects are not just about developing one AI model, but also maintaining the model and updating it when new datasets are introduced to the system. The maintenance of these models is continuous work that is critical for the success of the project. 
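Part of this maintenance work is noticing when a deployed model has started to degrade. The following is a minimal sketch of that idea, assuming a hypothetical `RetrainMonitor` helper (it is not part of Darwin or any other named product) that tracks recent prediction outcomes and flags when accuracy over a sliding window falls below a chosen threshold. The window size and threshold are illustrative values.

```python
from collections import deque


class RetrainMonitor:
    """Hypothetical helper: track recent outcomes and flag degraded accuracy."""

    def __init__(self, window=100, min_accuracy=0.8):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether one prediction matched what actually happened."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag only once the window is full, to avoid reacting to noise."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Made-up (predicted, actual) pairs, purely for illustration.
monitor = RetrainMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()

for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_retraining()
```

In practice the decision to retrain would likely also involve a data scientist reviewing why accuracy dropped, rather than a threshold alone.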

Most AI projects require this kind of repeated prototyping of algorithms to fine-tune and maintain them so that they account for new situations and datasets. They also require tedious data management, testing, and verification work.

Ghauche and Pazos also suggested that things don’t get easier once the prototyping phase has been completed. In addition to the maintenance of these models, data science teams at finance firms need to scale the AI models generated to the rest of the organization. What this means is that when a data science team develops a model for a sales or marketing application, that model needs to be replicated for several different scenarios.

Predicting Customer Churn Using Automated Machine Learning Platforms 

Pazos explains this with the example of a financial firm that is using AI to predict customer churn for one of its products. If the company then needs to develop a customer churn prediction model for another financial product, in theory this might seem like a similar project that can borrow from the model already generated. In reality, different products might require totally different models to accurately predict churn.

Pazos adds that making generalizations like this is bad practice in data science. Every data science project is unique in this way, even if the application or use-case is the same. In addition, even for the same use-case and the same product, the best-performing algorithm might differ depending on the dataset the model is trained on.

The 1-minute video below gives a glimpse into SparkCognition’s Darwin automated machine learning platform and some of its functions and features:

There are several AutoML vendors in the market offering products for the financial services industry. SparkCognition also claims to have tested Darwin against several open-source alternatives, such as the auto-sklearn AutoML library and Random Forest models. The company claimed the following results:

Results of SparkCognition’s test of its Darwin platform

The test included classification problems, such as identifying the correct letter from distorted characters; regression problems, such as predicting real-estate prices; and time series problems, such as predicting financial trading performance.

Each AutoML platform may have applications and use-cases where it performs best. Financial services firms need to consider their existing data infrastructure, the areas where they have a strong need for AI-based automation, and their existing budgets and data science resources.

Challenges Faced By Small Data Science Teams

Another challenge for small data science teams at financial firms is the fact that they are centralized and serving several other business units. This makes the work of data scientists much harder since they not only have to develop new models to solve challenges in different business units, but also constantly maintain the existing solutions they have already built to account for new data. 

Data scientists are already hard to hire and retain, as mentioned earlier. It makes sense for financial firms to have these employees apply their specialized knowledge to advanced, complex problems rather than to day-to-day data science maintenance work such as handling data, testing scenarios, and building and maintaining models. This is where AutoML platforms largely come into play.

With financial firms hiring small data science teams that are centralized and dedicated to solving data problems in the firm, these teams almost end up as a services organization inside the financial firm. For instance, the sales, marketing, and financial trading divisions at an organization might each ask the data science team to solve various problems.

It is easy to overwhelm small data science teams with requests from different business units, to the point where they end up in a loop of serving these requests. According to Pazos, AutoML platforms can free up data scientists’ time so that they can dedicate their intellect to more appropriate problems and create more value for the organization.

AutoML can help with finding the algorithmic model that allows for the most accurate predictions given a particular dataset. AutoML platforms can also be used to build custom AI models, tailoring these models to the specific data at hand.
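The model-selection idea can be illustrated in a few lines. This is a minimal sketch, not how Darwin or any specific AutoML product works: it assumes a made-up toy dataset, two deliberately simple candidate models written from scratch, and k-fold cross-validation as the selection criterion. All class and function names here are hypothetical.

```python
from collections import Counter
from statistics import mean


def make_toy_rows():
    """Made-up data: (months_inactive, support_tickets) -> churned (1) or stayed (0)."""
    rows = []
    for i in range(20):
        rows.append(((0.1 * i, 0.1 * i), 0))          # active customers who stayed
        rows.append(((5 + 0.1 * i, 5 + 0.1 * i), 1))  # inactive customers who churned
    return rows


class MajorityClass:
    """Baseline candidate: always predict the most common training label."""

    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, x):
        return self.label


class NearestNeighbours:
    """Candidate: predict the majority label among the k closest training points."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.points = list(zip(X, y))
        return self

    def predict(self, x):
        def sq_dist(point):
            return sum((a - b) ** 2 for a, b in zip(point, x))

        nearest = sorted(self.points, key=lambda item: sq_dist(item[0]))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(make_model, rows, folds=5):
    """Mean accuracy over k folds; row i is held out in fold i % folds."""
    scores = []
    for fold in range(folds):
        train = [r for i, r in enumerate(rows) if i % folds != fold]
        test = [r for i, r in enumerate(rows) if i % folds == fold]
        model = make_model().fit([x for x, _ in train], [y for _, y in train])
        hits = sum(model.predict(x) == y for x, y in test)
        scores.append(hits / len(test))
    return mean(scores)


def select_best_model(rows, candidates):
    """Score every candidate the same way and return the best name plus all scores."""
    results = {name: cross_val_accuracy(factory, rows) for name, factory in candidates.items()}
    return max(results, key=results.get), results


candidates = {
    "majority-class baseline": MajorityClass,
    "3-nearest-neighbours": lambda: NearestNeighbours(k=3),
}
best_name, scores = select_best_model(make_toy_rows(), candidates)
print(best_name, scores)
```

A real platform would search a far larger space of algorithms and settings, but the core loop of scoring each candidate on held-out data and keeping the best is the same in spirit.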

In order to successfully leverage an AutoML platform, humans are still required in the loop. AutoML can help with getting a project up and running faster, yet humans still need to interpret the results of the system, validate them and put the system into practice in the organization. This requires inputs from both data scientists and subject matter experts. 

For example, in the customer churn prediction example, the AutoML platform might help with data pre-processing, identify features in the dataset input to the system, and generate several models to find the best-performing one for a given dataset. But data scientists and subject matter experts are still needed to test the churn predictions and verify their accuracy.
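As an illustration of that validation step, here is a minimal sketch of the kind of hold-out report a data scientist might prepare for subject matter experts. The `churn_report` function name and the labels below are made up for this example. It reports precision and recall alongside accuracy, because when churn is rare, accuracy on its own can look good even for a model that misses many churners.

```python
def churn_report(y_true, y_pred):
    """Summarize hold-out predictions, where 1 = churned and 0 = stayed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # churners correctly flagged
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # loyal customers flagged by mistake
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # churners the model missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # loyal customers correctly left alone
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "missed_churners": fn,
    }


# Made-up hold-out labels and predictions, purely for illustration.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
report = churn_report(actual, predicted)
print(report)
```

Whether a given precision or recall is acceptable is a business judgment, which is where the subject matter experts come in.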

Financial firms might find it difficult to get data scientists and subject-matter experts on the same page.  

How AutoML Works

Pazos had another insight for financial services firms whose data science teams are stuck servicing requests from several business units. He asserts that when these firms start seeing a bottleneck in terms of productivity of their data science teams, usually this comes at the cost of innovation. 

Additionally, relying on traditional software engineers and subject matter experts to build AI systems might lead to roadblocks, since these projects require a deeper understanding of data management techniques. Similarly, AutoML might also help new data science teams hit the ground running and learn and create models faster.

For instance, SparkCognition claims it worked with a financial firm to generate the most accurate models for classifying financial market regimes, capturing multiple asset classes across a broad time window. According to the case study, the financial firm used the Darwin AutoML platform to create a predictive model based on historical financial data.

To find the best trading methodology, the company used SparkCognition’s Darwin to automate three major steps in the data science process:

  • Data Cleaning and Management: According to SparkCognition, the Darwin platform was used to automatically pre-process the historical financial trading data and make it machine-readable.
  • Feature Generation: Feature generation is the process of taking unstructured data and defining the variables, also called features, that the algorithm works with. SparkCognition claims the Darwin platform was also used to automatically generate new features from the historical data that the subject matter experts had not previously uncovered.
  • Model Building: Lastly, the Darwin platform was used to generate a deep-learning architecture that was tested and updated to fit the dataset input to the algorithm. This was done by fine-tuning the algorithm over several generations to arrive at the most accurate model.
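The three steps above can be sketched in miniature. This is an illustrative toy under stated assumptions, not SparkCognition's method: the price series is made up, the two features are hand-picked, and a simple linear rule tuned by a basic evolutionary loop stands in for the deep-learning architecture described in the case study. All function names are hypothetical.

```python
import math
import random


def clean(prices):
    """Data cleaning: forward-fill gaps (None) and drop any leading gaps."""
    cleaned = []
    for p in prices:
        if p is None and cleaned:
            cleaned.append(cleaned[-1])
        elif p is not None:
            cleaned.append(float(p))
    return cleaned


def make_features(prices, window=5):
    """Feature generation: latest return and gap to a moving average.

    The label is 1 when the next price is higher than the current one.
    """
    rows = []
    for t in range(window, len(prices) - 1):
        latest_return = prices[t] / prices[t - 1] - 1
        moving_avg = sum(prices[t - window:t]) / window
        gap = prices[t] / moving_avg - 1
        label = 1 if prices[t + 1] > prices[t] else 0
        rows.append(((latest_return, gap), label))
    return rows


def accuracy(weights, rows):
    """Share of rows where a linear rule on the two features matches the label."""
    w_ret, w_gap, bias = weights
    hits = sum((w_ret * r + w_gap * g + bias > 0) == (y == 1) for (r, g), y in rows)
    return hits / len(rows)


def evolve(rows, generations=20, population=12, seed=0):
    """Model building: keep the fittest rules, mutate them, and repeat."""
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(population)]
    best_per_generation = []
    for _ in range(generations):
        pop.sort(key=lambda w: accuracy(w, rows), reverse=True)
        survivors = pop[: population // 3]  # the fittest third carries over unchanged
        best_per_generation.append(accuracy(survivors[0], rows))
        children = [
            tuple(v + rng.gauss(0, 0.3) for v in rng.choice(survivors))
            for _ in range(population - len(survivors))
        ]
        pop = survivors + children
    return max(pop, key=lambda w: accuracy(w, rows)), best_per_generation


# Made-up price series with two gaps, purely for illustration.
raw_prices = [100 + 10 * math.sin(t / 5) for t in range(80)]
raw_prices[10] = None
raw_prices[40] = None

rows = make_features(clean(raw_prices))
best_rule, best_per_generation = evolve(rows)
```

Because the fittest rules survive each generation unchanged, the best score can never get worse from one generation to the next. Note that this sketch scores rules on the same rows it searches over; a real workflow would measure fitness on held-out data to avoid overfitting.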

AutoML platforms are not a solution by themselves, but are a good way to automate processes in an AI project, so that even small data science teams or new AI teams that have just been established can handle several new projects while still maintaining ones that have already been built. 

Manually developing a machine learning model requires financial subject-matter expertise, statistical expertise, and computer science skills. Human error, algorithmic bias, and bias in the data are all additional challenges that financial firms have to deal with during AI projects. Automated machine learning lets organizations augment the capabilities of their data scientists, who can then develop new projects without having to build every capability from scratch each time, thus reducing the cost, time, and effort involved in AI projects.

With AutoML, even employees at financial firms without much data science experience might be able to contribute to building AI systems, since the software automates some of the technical steps.

Machine learning can help finance firms find potential business insights hidden in their data. The added benefit that automated machine learning brings to this process is largely in achieving the same capabilities but in a highly scalable way.


This article was sponsored by SparkCognition and was written, edited and published in alignment with our transparent Emerj sponsored content guidelines.

Header image credit: Dataquest
