Why Big Data is Not Necessarily the Best Data for Business – A Conversation with Slater Victoroff

May 8, 2016

Episode Summary: You’re a business, and you’ve collected data. Now how do you make sense of it? One option is ‘sentiment analysis’, a form of machine learning that determines whether text is positive or negative. Slater Victoroff’s company, Indico, provides algorithms that specialize in this task.

In this episode, we talk about the common misconceptions that businesses have about where ‘big data’ may be applicable, and the lessons Victoroff has learned from drawing more tangible insights out of smaller sets of data for companies. He explains why big data is not necessarily better, and discusses the steps that companies should take early on to make sure they’re prepared when it’s time to apply machine learning to their processes.

Guest: Slater Victoroff

Expertise: Computer science/software

Recognition in Brief: Slater Victoroff is a graduate of the Franklin W. Olin College of Engineering. In 2015, Victoroff won Boston TechJam and mentored at over 10 hackathons, including Hack the North, Hack Princeton, and Hack MIT. When founding Indico in 2014, Slater raised over $3 million as part of Boston’s Techstars accelerator program.

Current Affiliations: CEO at Indico Data Solutions

Machine Learning – More Than Big Data?

Big data is a trend for good reason. Without lots of information, machine learning would not have gotten its second wind in the world of AI. But what sort and amount of data is necessary to glean useful business insights from machine learning in the first place?

“It’s where a lot of people fall down. The notion that it’s not big data is really what I want to emphasize here; it’s all about quality over quantity, and what I mean by that is the primary types of data that are most valuable to companies in the long run is rich media,” says Victoroff.

When people think about big data, Slater says, they tend to focus on collecting lots of data and fitting it into the right format, for example by taking customer support logs and fitting that information into a spreadsheet. But massive amounts of tightly organized data aren’t everything. “When you look at the largest amount of value possible it’s really coming from rich media, which is text, which is audio, which is video, and the more of that data you can keep with useful metadata associated, I think the more value you’re going to get,” says Slater.

Victoroff founded Indico, a machine-learning-based company offering sentiment analysis application programming interfaces (APIs) that use algorithms to analyze a range of text. One of the most common blunders that Victoroff has seen in working with businesses is an over-reliance on clickstream data. Collecting it can produce huge volumes, up to terabytes of data; however, in Slater’s experience, the amount of value garnered from this data is negligible.

“It’s a perfect example that people are optimizing for types of data which are as large as possible to give themselves effective bragging rights without focusing on what would be deeply valuable,” he says. Storing all of a user’s clickstreams throughout their lifecycle has far less value than focusing on the data available through an associated social media account, which has a fraction of the data (such as Facebook posts), but which Slater believes offers far more insight than any mass of text in a decontextualized spreadsheet.

Dissecting Rich Media with Machine Learning

How should businesses decide what to track in the first place? Victoroff believes it starts with going back to the basics.

“There’s a big assumption that people have that for some reason because you’re dealing with data science and machine learning, you need to approach the problem fundamentally differently than you would as a human being. People assume that computers can’t deal with the same kind of information that people can and assume they can’t step through the same processes, but that assumption means that you’re trying to do an apples and oranges comparison,” he says.

For example, many eCommerce businesses share the goal of maximizing product sales, and sending out accurate recommendations is a key facet of this business model. A human’s first goal is likely centered on better understanding the buying customers and the products in their marketplace. They might come up with the idea of gleaning valuable information from a customer’s reviews of previous products or from their browsing history. On the product side, a person might try to find out more about product designs and assess the factors driving buying decisions.

There’s a logical progression to finding out this information, says Slater. In a perfect world, we can get social media login information for our users. A fashion company, for example, might check out a customer’s Instagram account, look at the items that they’ve liked (whether from its own store or others), then select similar items from its store and use this information to drive recommendations.

“In my mind that’s a very intuitive, a very human way of thinking about it, and that’s exactly the way you should approach it with a machine-learning perspective, but what I see typically is people faced with exactly that sort of problem (driving recommendations), instead of thinking about it like every other problem, say let’s instead find the biggest amount of data that we can,” says Victoroff.

A company might collect 500 gigabytes of clickstream data listing the products people have looked at, for example, only to log it and forget about it. By the time the data is pulled up again, it has been taken out of context and is essentially useless. Slater emphasizes that the assumption that computers can’t look at the same information as humans (such as social media posts) and draw similar conclusions without a ton of data is false. “Getting out of that mindset that computers somehow deal with these problems in a fundamentally different way than people do, I think that’s really the first thing to go.”

But Victoroff notes that while there are some eCommerce sites using this strategy, a lot of big eCommerce companies, including Amazon, are not (to the best of his knowledge) currently linking social media accounts to their buyer profiles and using machine learning to analyze people’s posts and comments. “If they talk frequently on Facebook about running and cooking, that’s a product recommendation right there on a fundamental level and potentially you’re allowed to show them things that then you wouldn’t have insight into otherwise,” says Victoroff.

If you understand that someone is a cooking fanatic, just one tiny granule of information you can get from their profile, then you can provide better recommendations. This approach speaks to the evolving marketing trend of helping people by recommending products or services that fit their needs, even if they’re not actively looking through search queries.
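To make the idea concrete, here is a minimal sketch of turning a handful of posts into product categories. It is an illustration only, not Indico’s method: the keyword table, the `infer_categories` function, and the sample posts are all hypothetical, and a real system would likely use a trained text model rather than keyword matching.

```python
from collections import Counter

# Hypothetical mapping from interest keywords to product categories.
INTEREST_KEYWORDS = {
    "running": "running gear",
    "marathon": "running gear",
    "cooking": "kitchenware",
    "recipe": "kitchenware",
}


def infer_categories(posts, min_hits=2):
    """Return categories whose keywords were hit at least `min_hits` times."""
    hits = Counter()
    for post in posts:
        # Lowercase each word and strip simple trailing punctuation.
        words = {w.strip(".,!?").lower() for w in post.split()}
        for keyword, category in INTEREST_KEYWORDS.items():
            if keyword in words:
                hits[category] += 1
    return sorted(c for c, n in hits.items() if n >= min_hits)


# Made-up posts standing in for a linked social media account.
posts = [
    "Great morning for running along the river!",
    "Trying a new recipe tonight.",
    "Signed up for my first marathon.",
    "Cooking for friends this weekend.",
    "Stuck in traffic again.",
]
print(infer_categories(posts))
```

Five short posts are enough here to surface two interests, which is the point Victoroff is making about small, rich data versus large, decontextualized logs.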

Potentials of Machine Learning in Business Applications

Victoroff and his team have seen positive outcomes by using machine learning in business with smaller and more immediate sets of data, including social media information. Indico has helped pick up on actionable and relevant trends in the area of customer support, for example. “We did some early experiments and found some really interesting results…a lot of people on the customer support side are interested in getting through tickets as quickly as possible…it’s a great problem, it’s something that absolutely needs to be solved,” says Slater. He notes that there are many companies that do this well, but his company decided to look one level beyond, taking the customer support data and extracting real, actionable feedback on the product.

When the Indico algorithm analyzed just the text data in customer support records, it was able to recognize things like product features broken by a new release or users unhappy with a particular aspect of a product. Sentiment around features changes over time, and the features mentioned most frequently were correlated with a significant drop in satisfaction rate.
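A rough sketch of that kind of analysis, under stated assumptions: each ticket already carries a sentiment score in [-1, 1] from some sentiment model (the scores below are invented), features are detected by simple substring matching, and the names `tickets`, `FEATURES`, and `feature_sentiment_by_release` are hypothetical. It shows the shape of the analysis, not how Indico implements it.

```python
from collections import defaultdict
from statistics import mean

# Toy tickets: (release, ticket text, sentiment score in [-1, 1]).
# The scores are made up; in practice they would come from a sentiment model.
tickets = [
    ("1.0", "Export works well", 0.6),
    ("1.0", "Login is quick", 0.5),
    ("1.1", "Export fails since the update", -0.8),
    ("1.1", "Export button is broken", -0.6),
    ("1.1", "Login still fine", 0.4),
]

# Hypothetical list of product features to look for in ticket text.
FEATURES = ["export", "login"]


def feature_sentiment_by_release(tickets, features):
    """Average sentiment for each (feature, release) pair that is mentioned."""
    scores = defaultdict(list)
    for release, text, score in tickets:
        lowered = text.lower()
        for feature in features:
            if feature in lowered:
                scores[(feature, release)].append(score)
    return {key: mean(values) for key, values in scores.items()}


averages = feature_sentiment_by_release(tickets, FEATURES)
# Change in average sentiment about "export" between the two releases.
export_change = averages[("export", "1.1")] - averages[("export", "1.0")]
print(averages)
print(export_change)
```

A sharply negative change for one feature right after a release is the sort of signal described above: a feature that the release may have broken.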

They were also able to recognize what a specific customer might be upset about, providing a kind of early intervention in some cases by picking up on text clues.

“If the last three interchanges have all been a little bit negative, that’s a strong signal of a customer potentially leaving the service…this is looking through data that’s already there, people already have access to but they largely ignore, because even though the first thing a person would do if they were trying to innovate on a product is look through a customer’s support record, there’s this assumption that computers can’t do the same job,” says Slater.
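The “last three interchanges” rule of thumb can be written down in a few lines. This is a sketch under assumptions: each interaction has already been scored by a sentiment model on a scale where values below zero are negative, and the function name `at_risk`, the window size, and the threshold are illustrative choices rather than anything specified in the interview.

```python
def at_risk(sentiments, window=3, threshold=0.0):
    """True if the most recent `window` interactions all scored below `threshold`.

    `sentiments` is a list of scores in chronological order (oldest first).
    """
    recent = sentiments[-window:]
    # Require a full window so that new customers are not flagged prematurely.
    return len(recent) == window and all(s < threshold for s in recent)


# Hypothetical sentiment histories for three customers.
print(at_risk([0.5, -0.1, -0.2, -0.3]))   # last three are all mildly negative
print(at_risk([-0.1, -0.2, 0.1]))         # most recent one is positive
print(at_risk([-0.4, -0.2]))              # fewer than three interactions
```

The rule only reads data a support team already has, which is Victoroff’s point: the signal is sitting in the existing records.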

There seems to be potentially immense value in a computer that can take customer intervention to the next level by picking up early on keywords and associated sentiments, correlated with customer satisfaction and a particular product change, that might only become clear to humans once a certain number of service tickets have stacked up.

What would Victoroff warn people against when leveraging machine learning for their business? One issue is collecting data without keeping useful associated metadata. An eCommerce site that wants to optimize for sales needs to keep its sales data, a point that a lot of people don’t realize, says Slater. “Whatever your highest level business objective is, you want to make sure that that’s somewhere in the data that you’re storing,” he says.

A company might want to do something with images and try to store all of them, but if it isn’t also collecting metadata that ties those images to traffic, sales, or whatever the high-level objective is, the collection is basically useless. Victoroff emphasizes that if you don’t store this information when it happens, it is often basically impossible to go back and find it later.
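One way to picture this advice is a record that stores the media reference and the business outcome together at the moment the event happens. The sketch below is an assumption-laden illustration: the `ImageEvent` fields, the `record_event` helper, and the sample values are hypothetical, and a real system would write to a database or event log rather than return a JSON string.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ImageEvent:
    """One product-image impression plus the outcome the business cares about."""
    image_path: str
    product_id: str
    shown_at: str        # ISO 8601 timestamp, captured when the event occurs
    led_to_sale: bool    # the high-level objective, stored with the media
    sale_amount: float


def record_event(image_path, product_id, led_to_sale, sale_amount=0.0):
    """Serialize the event as one JSON line, ready to append to a log."""
    event = ImageEvent(
        image_path=image_path,
        product_id=product_id,
        shown_at=datetime.now(timezone.utc).isoformat(),
        led_to_sale=led_to_sale,
        sale_amount=sale_amount,
    )
    return json.dumps(asdict(event))


# Hypothetical example: an image impression that ended in a purchase.
print(record_event("img/shoe.jpg", "sku-1", True, 59.0))
```

Because the outcome is captured in the same record as the image, a later model can be trained against sales directly instead of trying to reconstruct that link after the fact.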

A lot of people also assume they need a custom solution for their data.

“Pretty much nobody’s data is a snowflake, 9 times out of 10 you’d be just as well going with some off-the-shelf solution and the attempt to specifically customize something for your application will actually hurt you more than help you,” says Slater.

If a vendor has a solution ready to help optimize images for sales, for example, it’s probably a good fit for your business’s needs.

Victoroff even goes so far as to state that if a vendor is promoting a custom solution, it might be a way to conceal what’s called “overfitting”, i.e. masking the weakness or narrow problem domain of the technology with which they’re working. The amount of data that most businesses bring is small by machine learning standards, and while you could try to stretch your data set out, the delta that most businesses are going to get from a customized solution is negligible, especially as a first step.

Image credit: generalassemb.ly
