Making Genies – Apple’s Acquisition of VocalIQ and the Importance of Speech Recognition

Daniel Faggella

Daniel Faggella is Head of Research at Emerj. Called upon by the United Nations, World Bank, INTERPOL, and leading enterprises, Daniel is a globally sought-after expert on the competitive strategy implications of AI for business and government leaders.


Apple’s recent acquisition of UK-based speech recognition company VocalIQ was carried out in as mysterious a manner as any of its other acquisitions, with no overt explanation from Apple and no public comment from the acquired team.

Speech recognition (and the broader domain of “natural language processing”) is likely to be a more important facet of the future of consumer tech than most consumers realize. Apple and many of its direct competitors are aiming to create real-life genies: personal assistants that not only understand your commands and questions but can act on them in the real world.

Secrecy has always been the name of the game with Apple – but it takes little stretch of the imagination to see how and why Apple would buy a speech recognition company.

VocalIQ’s entire proposition was to develop a personal assistant AI that could learn from ongoing interactions with its users, improving its responses and its understanding of context over time. Re/code’s article covering the acquisition put it well:

Enhancing Siri’s ability to better understand what people are asking — or to follow up with questions when it’s unclear — would touch virtually everything Apple does, from its iPhones and the Apple Watch to the electric car it still has under wraps.

In the early 2000s, Google won the search engine game by simply connecting users to more pertinent results (most of the time) than the competition. Google’s early lack of ads and other user experience factors certainly helped, but the fact of the matter is that relevance and Google became synonymous, and there wasn’t a reason to go elsewhere (for crying out loud, I Googled “how Google won the search engine war” to write this article).

In some cases, Google wasn’t astronomically better than Lycos and its other competitors, but it had enough of an edge to become the de facto search engine.

Big consumer tech giants like Apple, Google, Microsoft, and Facebook all understand that at some point, the “AI personal assistant war” will be won in much the same way as the “search engine war”: someone’s app will just be better. It will connect to more devices, allow for more commands, and – arguably most importantly – it will be intuitive and simple to talk to and interact with.

At present, Apple has a lot more chips on the table in the devices game than its newer personal assistant rival, Facebook, because it has so many devices in the works. Microsoft and Google have similarly large arrays of devices, and many of the large publications that covered the VocalIQ acquisition (including this Forbes piece) understand that winning the speech recognition game is really about becoming the most popular metaphorical genie that users around the world call on to help them with their goals.

It’s about more than finding a good diner in town – it’s about being the platform of human-technology interaction that people use in work and life. To some extent, that same underlying battle is being fought right now – but with improvements in AI’s ability to determine meaning and context in natural speech, the next frontier is wide open.

Understanding intention and meaning, as it turns out, isn’t easy, and researchers have been trying to crack this nut for decades. One of our recent interviewees, Dr. Daniel Roth (Harvard PhD, professor at the University of Illinois), has been working on natural language processing for nearly 20 years. He provides an example of one of the core challenges in natural language processing: “Nearly any word in the English language – take the word ‘table’ – can mean multiple things… ‘table’ can be a noun, or it can be a verb; you can ‘table’ things.”

For this reason, traditional “if-then” programming simply cannot suffice if our aim is to develop applications that can tune into the meaning and context of a real conversation.
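To make that concrete, here is a minimal sketch using NLTK’s off-the-shelf part-of-speech tagger – not VocalIQ’s method (which was never made public), just a standard statistical tagger that infers a word’s role from the words around it:

    # A minimal illustration of the "table" ambiguity using NLTK's
    # off-the-shelf part-of-speech tagger. This is not VocalIQ's or
    # Apple's approach -- just a standard statistical tagger that picks
    # a word's grammatical role from its surrounding context.
    import nltk

    # One-time model downloads (resource names vary slightly by NLTK version)
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    sentences = [
        "The wooden table is heavy.",          # 'table' as a noun
        "They will table the motion today.",   # 'table' as a verb
    ]

    for sentence in sentences:
        tags = dict(nltk.pos_tag(nltk.word_tokenize(sentence)))
        # Typically prints NN (noun) for the first sentence and VB (verb)
        # for the second -- same word, different role, resolved by context.
        print(f"{sentence!r}: table -> {tags['table']}")

A hand-written “if-then” system would need a separate rule for every ambiguous word in every context; a statistical model learns these regularities from data, which is the broad approach VocalIQ and its peers bet on.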

Here are some examples of tasks that AI like Apple’s Siri and Microsoft’s Cortana can accomplish already:

  • “Find me the next bus from here to San Francisco”
  • “Pull up my last photo album from Vermont 2014”
  • “Text Peter Faggella and tell him I’ll be calling him in 2 hours”
  • “What’s the cheapest Chinese food restaurant in Cambridge, Massachusetts?”
  • “Find me the best Donald Trump memes”

(For a more robust look at the future chatbot plans of the “big 4” tech firms, please see our recent article entitled Chatbot Comparison – Facebook, Microsoft, Amazon, and Google.)

Dr. Roth predicts more significant leaps in natural language processing in the coming 10 years. He believes that, as opposed to the kind of simple Q-and-A interaction we have with Siri, future personal assistants will be more of an extension of our thought – machines that will help us brainstorm and think through blank-canvas problems, in addition to contextually understanding what we need and bringing that information to us.

Roth continues, “There are millions of medical articles published each year, and nobody knows what’s in all of them… in the future, a doctor may be able to sift through the world’s medical knowledge for the treatment ideas or diagnostic studies that will help with his particular case – all just by asking the machine.” Interested readers can listen to the entire NLP interview with Dr. Dan Roth here.
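Today’s closest substitute is still keyword retrieval. As a rough baseline, here is a sketch against NCBI’s public E-utilities endpoint for PubMed (the endpoint and parameters are the documented ones; the query string is a made-up example) – note that it matches keywords, with no grasp of the doctor’s actual intent:

    # A sketch of today's keyword-level baseline: searching PubMed via
    # NCBI's public E-utilities API. The endpoint and parameters are the
    # documented ones; the query string is a hypothetical example.
    import json
    import urllib.parse
    import urllib.request

    query = "hypertension diverticulitis correlation"  # hypothetical query
    url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
           + urllib.parse.urlencode(
               {"db": "pubmed", "term": query, "retmax": 5, "retmode": "json"}))

    with urllib.request.urlopen(url) as response:
        result = json.load(response)["esearchresult"]

    print("Matching articles:", result["count"])  # raw keyword matches
    print("First PubMed IDs:", result["idlist"])

Everything beyond this – judging reputability, weighing evidence, tailoring results to one patient’s case – is exactly the part that still requires a human, and the part Roth expects NLP to take on.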

Sure, Siri can Google something for you. No problem. But she can’t hunt down the super-relevant list of specific resources that you need in the here and now – certainly not when it comes to the vast domain of medical journals. Sure, we can say that “the world is at our fingertips” thanks to the internet. But it could be closer, it could be faster, and we humans aren’t about to give up on the opportunity for more.

Here are some examples of tasks that present AI applications generally cannot handle, but should be able to handle in the coming decade (a few of these examples are derived from our conversation with Dr. Roth):

  • “Show me all reputable medical journal articles in the last two years that demonstrate correlations between high blood pressure and diverticulitis, excluding those published in the United States.”
  • “How does my current sales compensation plan differ from those that sales training experts advise? Are there any changes that most experts would probably recommend I make?”
  • “I’m trying to convince my mom that living in an assisted living facility is the best thing for her health and wellbeing – how have other people like me successfully brought this up with aging parents?”
  • “Suppose a 2009 Mustang can accelerate from a stop to 100 mph in 15 seconds. Let us assume that the acceleration was at a constant rate (this would mean that the average speed during the time traveled was 50 mph). Would the Mustang have traveled the quarter mile in this time?” (sampled from uwosh.com; a quick worked check follows this list)
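
That last problem is simple arithmetic once it has been set up – and the setup is precisely what a machine must extract from the plain-English wording. A quick check (in Python, for concreteness) of what the assistant would need to work out:

    # Back-of-the-envelope check of the Mustang problem above.
    # Constant acceleration from 0 to 100 mph implies an average
    # speed of 50 mph; distance is average speed times time.
    avg_speed_mph = (0 + 100) / 2                 # 50 mph
    avg_speed_fps = avg_speed_mph * 5280 / 3600   # about 73.3 ft/s
    distance_ft = avg_speed_fps * 15              # about 1,100 ft in 15 s
    quarter_mile_ft = 5280 / 4                    # 1,320 ft

    print(f"Distance covered: {distance_ft:.0f} ft")
    print(f"Quarter mile:     {quarter_mile_ft:.0f} ft")
    print("Covers the quarter mile?", distance_ft >= quarter_mile_ft)  # False

The car covers about 1,100 feet – roughly 220 feet short of a quarter mile – so the answer is no. The arithmetic is trivial; turning a paragraph of English into that arithmetic is the part that remains an open problem.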

Ultimately, these devices are extensions of ourselves. Our wants and ideals are not limited to ordering pizzas, finding buses, and calling people. Our wants and desires are borderline limitless, and there is simply no end to the desire for more, better, faster, easier. We want to extend our human capacities to get what we want, and technology companies give us a way to do that.

Apple might have an exceptionally strong market position and “cool” factor, but it seems impossible to deny that people buy from Apple because they know Apple products work, and are often easier to use and integrate into our lives than other brands of consumer tech goods. For many people in the market for their next phone or laptop (myself included), Apple is the logical choice for just these reasons.

In the coming decade(s), the companies most likely to succeed will be those whose technologies extend our capacities most naturally and seamlessly.

The physical capacities of devices will do some of that (watches to monitor health stats, cars to take us places, home stereos to get us in the groove, appliances to do the dirty work), but it’s extremely likely that AI will be a huge part of the next kind of interface between humans and technology… and this will be “make or break” for whose genie we decide to call on.

I’ve long speculated about what this tendency implies in terms of grand changes in the human experience and, indeed, human potential itself.

What requires little speculation is that whoever wins the personal assistant game (and so successfully extends our ability to command technology seamlessly with our will) will gain tremendous momentum – possibly as much as (if not more than) Google gained when it became the world’s de facto search engine (indeed, if there’s any private company capable of making a sentient world simulation better than the NSA’s, it’s Google).

Whose genie will you be using in 2020? Apple hopes it’s theirs. There’s a shot VocalIQ will help them get there – and given Apple’s recent ramp-up in AI hiring, this is likely not the last acquisition of its kind for Apple.
