Uber’s Head of Machine Learning Thinks You Might Be Doing It Wrong

By Daniel Faggella. Last updated on November 29, 2018

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders.

Machine learning has the best chance of achieving meaningful return on investment when companies model previous success.

At last week’s Applied Artificial Intelligence conference in San Francisco, Uber’s Head of Machine Learning Danny Lange laid out his four principles for simplifying the process of applying machine learning in business.

Lange has witnessed firsthand the evolution of machine learning technologies, and he has a pretty good idea of what works and what doesn’t when companies want to implement machine learning for the first time.

A software creator and computer scientist by early trade, Lange founded Cupertino-based Vocomo Software in 2001; Voxeo acquired the company in 2005. Most recently, in November 2015, Lange took on the role of head of machine learning at Uber.

One benefit of more companies implementing machine learning is the mistakes they make along the way, and the lessons those mistakes offer to anyone who is interested in the technology but hasn’t yet made the leap.

You don’t have to be a behemoth company, Lange says, to apply machine learning. Open-source machine learning platforms are more accessible than ever, and if you have the right framework for implementing them, opportunities abound for even smaller businesses to find value. Lange suggests making the time spent implementing machine learning more productive by considering the following:

1 – ‘Low-hanging fruit’ is the answer to this question: “If we only knew…”. Find a problem before you implement machine learning as a solution. Ask the question you’re dying to answer but can’t with existing methods, e.g. “If we only knew” the real return on investment (ROI) of our video marketing, or how to get more people to stay on our subscription software, or what the customers who require the least weekly and monthly maintenance have in common, or (fill in the blank).

2 – Start supervised learning with a wealth of historical data. Lange argues that most companies don’t need to collect months of data after implementing a machine learning system before they derive value. Instead, look at the historical information that you already have and feed it to a supervised machine learning system (an algorithm that takes a known set of inputs and a matching known set of outputs and trains a model to predict the outputs for new inputs). Companies often have reams of saved customer service data that can yield valuable insights, like how lead sources correlate with refunds, or how service packages relate to the amount of customer support a particular customer requires. The key is to choose existing data that is related to your main problem or question so that you drive ROI with purpose.

3 – Start with clean data, not big data: Don’t just find the biggest bucket of information; instead, find the information that you know is clean. Maybe you have lots of data around promotions and sales, but you tracked that data differently every month and ended up with “messy” or “dirty” data (in other words, data that is not uniform). You have to make sure you’re comparing apples to apples; this is what’s meant by clean data. Try finding a clean subset of information within that larger messy data set. Maybe the way you measure and track customer churn and lead source has stayed the same. Your resulting dataset may not be as big, but if the format has stayed the same and you can look at the data evenly across the board, then it’s considered clean and right for the job.

4 – Use an available cloud system (Amazon, Google, Microsoft, etc.): Some of the biggest names in the industry have started to introduce cloud-based machine learning services and open-source software libraries, which are more or less like ‘machine learning kits’ that allow companies and developers of varying skill levels to build their own systems and models. Amazon offers Amazon Machine Learning, Google has TensorFlow, Baidu offers The Stack, and there are others. Lange recommends doing some research and leveraging one of these pre-packaged systems rather than taking the from-scratch route.
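
To make principles 2 and 3 concrete, here is a minimal Python sketch. It is not something Lange presented; it is an illustration under stated assumptions. It starts from a small, invented list of historical customer records, keeps only the rows that were tracked in one consistent format (the clean subset), and then "trains" a deliberately simple supervised model: the most common refund outcome for each lead source. The field names, the sample values, and the helper functions `is_clean`, `train`, and `predict` are all hypothetical, and the majority-vote lookup stands in for whatever algorithm a real platform would supply.

```python
from collections import Counter, defaultdict

# Hypothetical historical records; every field name and value is invented.
HISTORY = [
    {"lead_source": "ads", "package": "basic", "refunded": True},
    {"lead_source": "ads", "package": "basic", "refunded": True},
    {"lead_source": "ads", "package": "pro", "refunded": False},
    {"lead_source": "referral", "package": "pro", "refunded": False},
    {"lead_source": "referral", "package": "basic", "refunded": False},
    # Dirty rows: tracked in a different format, or missing a value.
    {"lead_source": "Referral ", "package": "pro", "refunded": "no"},
    {"lead_source": None, "package": "basic", "refunded": True},
]

# Assumption: these are the only lead-source labels in the consistent format.
ALLOWED_SOURCES = {"ads", "referral"}


def is_clean(record):
    """Return True if the record follows the one consistent tracking format."""
    return (
        record.get("lead_source") in ALLOWED_SOURCES
        and isinstance(record.get("refunded"), bool)
    )


def train(records):
    """Learn the most common known outcome (refunded or not) per lead source."""
    outcomes = defaultdict(Counter)
    for record in records:
        outcomes[record["lead_source"]][record["refunded"]] += 1
    return {
        source: counts.most_common(1)[0][0]
        for source, counts in outcomes.items()
    }


def predict(model, lead_source, default=False):
    """Predict a refund for a new customer; fall back to default if unseen."""
    return model.get(lead_source, default)


# Principle 3: start from the clean subset, even though it is smaller.
clean_history = [record for record in HISTORY if is_clean(record)]

# Principle 2: known inputs (lead source) and known outputs (refunded) train the model.
model = train(clean_history)

if __name__ == "__main__":
    for source in ("ads", "referral"):
        print(source, "-> refund predicted:", predict(model, source))
```

In this toy data set the model ends up flagging customers from the "ads" source as likely refunds, because two of the three clean "ads" records were refunded. A real project would hand the same clean, labeled history to one of the pre-packaged systems from principle 4 instead of a hand-written lookup.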