Crowdsourced Natural Language or Speech Training – Use Cases and Explanation

Dylan Azulay

Dylan is Senior Analyst of Financial Services at Emerj, conducting research on AI use-cases across banking, insurance, and wealth management.

When it comes to planning an AI initiative, a business will need to determine the method by which to acquire the data necessary to meet their objectives. Data is essential when it comes to succeeding with AI. An effective AI strategy is built on top of data that is specific to the business problem a company is trying to solve.

As we discussed in our recent piece on creating a data strategy, a business may either need to collect brand new data on which to train a machine learning model or enhance the data it already has to do the same.

A business can collect and enhance its data in a variety of ways, one of which is crowdsourcing. We spoke with Mark Brayan, CEO of Appen, a firm that offers crowdsourced training data for machine learning applications. In our interview, Brayan discusses when a company might take advantage of the crowd to acquire the data it needs to train a machine learning model.

Brayan explains that natural language processing (NLP) models may often require input from humans of varied backgrounds, speaking in varied situations, in order to do what they were built to do.

Crowdsourcing companies could provide businesses access to these humans. In this article, we dive into several use cases for natural language processing models in which the crowd might prove useful for collecting and enhancing the data necessary to train those models. The crowd might help train effective NLP models on the following:

  • Accents and Dialects
  • Situations and Environments
  • Domain Language
  • Sentiment

Crowdsourcing may also help businesses looking to build their own AI models label the data on which they intend to train those models.

Training an NLP Model on Accents and Dialects

Companies looking to sell into a variety of regions and countries might consider building their NLP models to understand the accents, dialects, and idioms of the people who live in them. English is spoken in a variety of ways depending on the speaker’s region or country, and sometimes depending on the speaker’s culture.

An NLP model trained to understand British English may not understand English as it is spoken by someone in the southern US, for example. In other words, a customer in the southern US may find that the voice recognition system with which they are interacting does not recognize their speech because that system was only trained to understand English as it is spoken in England. This could pose a problem for a company that wants to sell its products to people in the southern US.

Brayan further describes this dilemma when he discusses how an automotive company might use NLP in the voice recognition system built into its vehicles, a use case that inspired the following example:

If an automotive company intends to sell its vehicles in Boston, they might need to collect speech data on the way Bostonians talk about concepts related to vehicles and driving. For example, a driver from Boston might naturally say, “Waterfront parking,” and intend for the car to bring up a route to the nearest (and perhaps cheapest) parking garage in the Waterfront area of the city.

The driver might have a thick accent and say the word “parking” without pronouncing the “r.” If the voice recognition system was trained on standard English, it won’t register the driver’s intent; the car’s navigation system would not pull up a route to a parking garage.

Additionally, the voice recognition system won’t understand “Waterfront parking” means “a parking garage in the Waterfront area of the city” without first being trained on what and where the Waterfront is.

The problem becomes even more nuanced when it’s understood how an NLP model might attempt to “understand” speech. Indeed, the automotive company may have trained their cars’ voice recognition systems to understand that the word “parking” in a Boston accent still means parking the car. However, if only one person trained that model to understand the word “parking” in that accent, the voice recognition system might only understand it if that person says it.

This is one place in which the crowd could prove useful for a company training a speech system. An automotive company might need a thousand humans from New England to drive the company’s car and record a wide variety of voice commands. These humans could then flag each instance in which the voice recognition system misunderstood them, such as when the car’s navigation system fails to pull up a route to a parking garage.

Eventually, the voice recognition system would learn that “Waterfront parking” in a Boston accent meant “find a route to the nearest parking garage in the Waterfront area of the city,” and—key here—it would learn to understand the phrase when spoken in a variety of different voice pitches from a variety of different people who have Boston accents.

This could result in a car with a voice recognition system that understands a Bostonian’s speech from the moment they purchase it, which of course would be the objective of the automotive company selling in the city.

If the automotive company wants to sell into markets outside of Boston, they may need to repeat this process for every region in which they intend to sell—all of this for the voice recognition system to understand just one driving-related concept. Brayan suggests that businesses looking to expand internationally are “going to have to localize that data not just from one language to the next, but make sure it fits the culture and so on.”

The nuance of training an NLP model on accents and dialects extends to other use cases. Brayan gives an example of an auto insurance company that might offer a chatbot on a smartphone app to handle routine customer requests. An American driver might type in the chat interface, “I crashed my car,” and a chatbot trained on American language might understand that the customer is looking to get an estimate of how much the insurance company will cover.

That same chatbot might not understand the intent of an Australian customer who types, “I pranged my motor.” The auto insurance company might benefit from working with a data collection vendor to create a program that might involve gathering people from Australia to train its chatbot on Australian English. Brayan points out, “The success of the chatbot comes down to the ability to answer questions. The more natural it is, the more capable it is, the more the end user…gets value out of that channel.”

Training an NLP Model on Situations and Environments

Accents and dialects aside, different regions of the world experience different road conditions due to weather and construction frequency. The aforementioned automotive company could theoretically train its voice recognition system to understand various accents and dialects as they’re spoken in a recording studio.

Indeed, the voice recognition system may then be able to understand phrases spoken in a variety of accents and dialects, but it may only be able to do so when they’re said in a recording studio. A recording studio has extremely different acoustic properties than the inside of a car, where the voice recognition system is really used.

The crowd may once again prove valuable to solve this problem. A crowdsourcing company could gather humans in a specific region to drive around that region in a variety of cars of different makes and models and say a list of phrases into a voice recognition system provided by the automotive company.

On top of that, the automotive company might want these crowdsourced drivers to drive in different road conditions: light rain, thunderstorms, snowstorms, high traffic volume, low traffic volume, smooth pavement, gravel roads. It might request that drivers speak into the voice recognition system while they’re blaring the radio or while it’s quieter.

The company would then collect data on how the people in the region say a variety of phrases in their own accent and dialect and what that sounds like in a variety of cars and road conditions. This would increase the accuracy with which the voice recognition system understands a driver on their first day with the car—the necessary goal for selling into the region.
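A collection program like the one described above can be thought of as a grid of conditions that each need recordings. The following is a minimal sketch of tracking which combinations are still uncovered. The dimension names, condition values, and record fields are hypothetical and chosen only to mirror the examples in the text; they do not describe any vendor's actual process.

```python
from collections import Counter
from itertools import product

# Hypothetical collection dimensions, loosely based on the conditions named above.
WEATHER = ["light rain", "thunderstorm", "snowstorm"]
TRAFFIC = ["high traffic", "low traffic"]
SURFACE = ["smooth pavement", "gravel road"]
CABIN_NOISE = ["radio loud", "radio off"]


def coverage_gaps(recordings, min_per_cell=1):
    """Return every condition combination with fewer than min_per_cell recordings."""
    counts = Counter(
        (r["weather"], r["traffic"], r["surface"], r["cabin_noise"])
        for r in recordings
    )
    return [
        cell
        for cell in product(WEATHER, TRAFFIC, SURFACE, CABIN_NOISE)
        if counts[cell] < min_per_cell
    ]


# Two illustrative (made-up) recording sessions.
recordings = [
    {"weather": "light rain", "traffic": "high traffic",
     "surface": "gravel road", "cabin_noise": "radio loud"},
    {"weather": "snowstorm", "traffic": "low traffic",
     "surface": "smooth pavement", "cabin_noise": "radio off"},
]

gaps = coverage_gaps(recordings)
total_cells = len(WEATHER) * len(TRAFFIC) * len(SURFACE) * len(CABIN_NOISE)
print(f"{len(gaps)} of {total_cells} condition combinations still need recordings")
```

In practice a company would likely add further dimensions, such as accent region and vehicle model, and the number of combinations grows quickly, which is part of why a large crowd may be needed.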

Training an NLP Model on Domain Language

There are a variety of other use cases for NLP where crowdsourcing may be one of the best options available to a company. In customer service more broadly, call centers might consider using NLP-based voice recognition systems to handle routine customer calls. The system would understand the caller’s request and resolve it over the phone. Theoretically, the caller would be able to explain their request in natural language, whereas more traditional phone menu software might require the caller to say exact words or phrases in a question-and-answer fashion.

A nationwide or global business may need to train its call center voice recognition system to understand domain language and the language found in common customer support inquiries in addition to regional accents and dialects. It might pay a crowdsourcing company to find humans to call the business’ customer support line and speak with the system as if they were customers.

For example, a CRM business might receive frequent calls about how to pull up a certain report in its interface. The business might request that crowdsourced humans from all over the world call their system and ask how to pull up the report. As a result, the NLP model behind the system would be trained to understand a highly specific customer inquiry said in a variety of accents and dialects. The business might even request that callers make the call in their home, in a busy office space, or in the backseat of a car so that the system is trained on hearing callers in a variety of acoustic environments.

A fashion brand may need to train its customer service chatbot to understand specific domain language. The chatbot would need to be trained to understand the names of clothing articles, apparel brands, colors, sizes, and fits, as well as the context surrounding which articles of clothing to show to which customers based on their gender and age. Crowdsourced humans may be able to message the chatbot and use this language. Then, they could indicate to the chatbot whether or not it fulfilled their intent.

For example, if a customer types “The Levi’s I bought for my son don’t fit him,” the NLP-based chatbot would need to understand that the customer is intending to receive a different size of Levi’s jeans for a male wearer. It might then suggest men’s Levi’s jeans of sizes that are slightly larger or smaller than the ones the customer purchased as described on their customer contact record.

Depending on the customer’s geolocation, they may be using “Levi’s” as a stand-in for all jeans, and an NLP model trained on customers from that geolocation may be able to determine if that’s the case. The chatbot would then understand that it should show the jeans of the brand that the customer purchased, even if that brand is not, in fact, Levi’s.

If the customer explained their dilemma to a human customer service agent, the agent would easily know to offer the customer a pair of the exact same jeans in a larger or smaller size. This is, however, a highly nuanced situation for machines. The NLP model behind the chatbot would need to have seen thousands of labeled examples showing that “Levi’s” is a brand (or a stand-in for all jeans), that “son” indicates men’s sizes, and that “don’t fit” indicates not only the need for a different size but also the intent to make an exchange, which may require a refund of the previous purchase if the jean sizes are priced differently.

Training the chatbot with a series of queries and possible answers that are crowdsourced from humans who fit the target demographic is one way to potentially ensure that the chatbot could more effectively respond to customer requests, and over time the chatbot could learn from customer interaction.

Training an NLP Model on Sentiment

In addition, an eCommerce business might want to train its conversational interface or customer service chatbot to understand the sentiment involved in a customer’s call or email-ticket. The business might want to escalate inquiries from angry customers to the customer service manager, for example. A voice-based customer service chatbot trained only on neutral or content inflections and language might not recognize the severity of a customer inquiry, which might exacerbate the customer’s frustration.

The customer might be calling about an unrecognized charge on their account from a yearly subscription service for which they forgot they signed up. The customer might angrily threaten to contact their bank and charge back the subscription payment; if they feel their anger isn’t being addressed, they might just hang up and call their bank.

The eCommerce business might prevent this by having crowdsourced humans call the voice-based chatbot and pretend to be frustrated while discussing their theoretical unknown charge. That said, people from different parts of the world will express frustration differently. They may use different words or phrases when frustrated or take on a sarcastic tone. An eCommerce company selling to multiple countries may need to have humans from each of those countries call its NLP chatbot and express frustration before the chatbot is ready for a worldwide launch.

A chatbot trained in this way would theoretically be able to escalate any calls it receives in which the customer sounds angry, which could save the eCommerce business from chargebacks and therefore trouble with their merchant account.
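The escalation step described above can be sketched as a simple routing rule applied to the output of a sentiment model. This is a hedged illustration only: the `SentimentResult` type, the label names, the queue names, and the 0.7 threshold are all assumptions made for the example, not part of any particular product.

```python
from dataclasses import dataclass


@dataclass
class SentimentResult:
    """Hypothetical output of a sentiment model for one call or ticket."""
    label: str         # e.g. "angry", "frustrated", "neutral", "content"
    confidence: float  # model confidence between 0.0 and 1.0


# Assumed set of labels that should trigger escalation.
ESCALATE_LABELS = {"angry", "frustrated"}


def route_inquiry(result: SentimentResult, threshold: float = 0.7) -> str:
    """Send confidently negative inquiries to a manager; leave the rest with the chatbot."""
    if result.label in ESCALATE_LABELS and result.confidence >= threshold:
        return "customer_service_manager"
    return "chatbot"


print(route_inquiry(SentimentResult("angry", 0.92)))    # customer_service_manager
print(route_inquiry(SentimentResult("neutral", 0.88)))  # chatbot
```

The threshold is a business decision as much as a technical one: a lower value escalates more calls to humans, while a higher value risks leaving some angry customers with the chatbot.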

Crowdsourced Data Labeling

In order to determine sentiment, however, an NLP model would need these frustrated calls to be labeled “frustrated” or “angry.” After thousands of these calls, the NLP model would theoretically learn to determine frustration in any new calls it receives. Labeled data is necessary for machine learning models in general; it’s what they are trained on. Data labeling can also be crowdsourced.
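To make the role of labels concrete, here is a minimal sketch of a text classifier that learns only from labeled examples. It uses a small naive Bayes model written with the Python standard library; this technique is chosen purely for illustration and is not described in the interview. The four training sentences are invented, and a real system would need thousands of examples and, for voice calls, acoustic features such as tone as well.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)
        # Add-one (Laplace) smoothing so unseen words do not zero out a label.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented labeled transcripts, standing in for crowdsourced calls.
labeled_calls = [
    ("i want my money back now", "frustrated"),
    ("this charge is unacceptable", "frustrated"),
    ("thanks for the help", "neutral"),
    ("can you update my address", "neutral"),
]

model = train(labeled_calls)
print(predict(model, "this is unacceptable i want my money back"))
```

Without the `"frustrated"` and `"neutral"` labels attached to each transcript, the `train` function would have nothing to count against, which is the sense in which labeled data is what the model is trained on.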

A global banking brand might pay a crowdsourcing company to scroll through social media posts that mention the bank and label those posts as one sentiment over another, for example. These labeled posts would then be used to train an NLP model to determine the sentiment in posts it has never seen before. Theoretically, the bank would then be able to have its NLP software scroll through social media on a regular basis and determine how the banking brand is being discussed by different demographics of people.
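When several crowd workers label the same post, their answers need to be combined into one training label. A common and simple approach is majority voting, sketched below. The article does not say how any vendor aggregates labels, so the voting rule, the `min_agreement` threshold, and the label names here are assumptions for illustration.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.6):
    """Combine several workers' labels for one post by majority vote.

    Returns (label, agreement). The label is None when the most common
    answer falls below min_agreement, signalling the post needs review.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


# Three hypothetical workers label the same social media post.
print(aggregate_labels(["negative", "negative", "positive"]))

# A split decision is held back rather than guessed.
print(aggregate_labels(["positive", "negative"]))
```

Posts that come back with no agreed label could be sent to additional workers or to a reviewer, which may help keep ambiguous examples from degrading the trained model.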

The NLP model might find that customers living in Los Angeles have less positive things to say about the bank than customers living in Boston, and the bank might allocate resources to figuring out why this is the case.

Perhaps the hiring managers at the bank’s Los Angeles branches have inadequate or incomplete criteria for hiring suitable tellers, in which case the bank could correct this. Alternatively, the bank might use other machine learning software to scroll through the public social media posts of its customers and find that its Los Angeles customers are simply more likely than its Boston customers to be vocal about their discontent with large companies, in which case no action may be required.

The aforementioned fashion brand could crowdsource its own data labeling to build its chatbot. Crowdsourced humans could label “Levi’s” as a jean brand, dresses as “women’s apparel,” and the phrase “the sweater is itchy” as an indication to offer the customer a sweater of a different material. The automotive company could have crowdsourced humans label the phrase “backroads” with the action of the car’s navigation system bringing up a route that avoids highways.

Once an NLP-based chatbot is trained on thousands of these labeled terms and phrases, it could theoretically resolve routine customer inquiries through its chat interface or navigate drivers to their destination while avoiding highways.
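The labeled terms and phrases mentioned above could be stored as simple structured records before training. The sketch below shows one possible shape for such records and writes them out as JSON Lines. The `LabeledExample` type, its field names, and the label identifiers are hypothetical; they only restate the examples from the text in a machine-readable form.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LabeledExample:
    """One crowdsourced annotation (hypothetical schema)."""
    text: str        # the term or phrase shown to the labeler
    label_type: str  # "entity" for a thing, "intent" for a desired action
    label: str       # the label the worker assigned


examples = [
    LabeledExample("Levi's", "entity", "jean_brand"),
    LabeledExample("dress", "entity", "womens_apparel"),
    LabeledExample("the sweater is itchy", "intent", "offer_different_material"),
    LabeledExample("backroads", "intent", "route_avoiding_highways"),
]


def to_jsonl(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(record)) for record in records)


print(to_jsonl(examples))
```

Keeping entity labels and intent labels in the same format makes it easier to hand one consistent file to whoever trains the model, though a real project might well separate them.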

High-quality labeling is important for training machine learning models for several reasons. A fashion company may want to build a recommendation engine, for example. That company would first need to determine the categories by which it wants products to be recommended to customers. Without a fixed label to narrow the options, the people labeling a sweatshirt could describe it with any number of words.

The fashion company would need to relay to those doing the labeling that a sweatshirt with a certain look needs to be labeled a certain style. In other words, the company needs to provide the actual labels to the people labeling the data on which the machine learning model will be trained. In doing so, the company may ensure their recommendation engine is making recommendations that increase customers’ lifetime values, especially if the company selects labels that are informed by business intelligence and analytics.
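One way to enforce the point above, that the company supplies the labels rather than letting each worker invent their own, is to check every submitted label against a fixed list. The following is a minimal sketch; the style names in `ALLOWED_STYLES` and the item identifiers are made up for the example.

```python
# Hypothetical controlled vocabulary supplied by the fashion company.
ALLOWED_STYLES = frozenset({"athleisure", "streetwear", "vintage", "minimalist"})


def validate_submissions(submissions):
    """Split (item_id, label) pairs into accepted and rejected lists.

    Labels are compared case-insensitively with surrounding spaces removed;
    anything outside the company's label set is rejected for relabeling.
    """
    accepted, rejected = [], []
    for item_id, label in submissions:
        normalized = label.strip().lower()
        if normalized in ALLOWED_STYLES:
            accepted.append((item_id, normalized))
        else:
            rejected.append((item_id, label))
    return accepted, rejected


submissions = [
    ("sweatshirt-001", "Streetwear"),
    ("sweatshirt-002", "cozy"),       # free-text description, not a company label
    ("sweatshirt-003", " vintage "),
]

accepted, rejected = validate_submissions(submissions)
print("accepted:", accepted)
print("rejected:", rejected)
```

Rejected items can be returned to the labeler with the allowed list attached, so that the final training data only contains labels the recommendation engine is meant to use.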

Some companies may require the people labeling their data to have some kind of domain expertise so that they understand the context of the labels provided to them. A company looking to train an NLP model to search health insurance documents might need to hire people who know how medical terminology is discussed in a variety of ways within a health insurance document.

There may be multiple ways of describing the same symptoms and conditions, but the company may only provide one label that encompasses those symptoms and conditions depending on how they are discussed. The people labeling the data would need to understand the context surrounding those symptoms and conditions in order to accurately label them when they’re discussed in a variety of ways.

A company that does not specify how someone should label their data and that does not hire people who have an understanding of those labels may risk wasting time and resources on an inaccurate NLP model.

Crowdsourcing to Meet Business Goals

Crowdsourcing may allow businesses access to large pools of people from a variety of places, each with their own ways of speaking and expressing their emotions. This could prove valuable when building machine learning models. “The advantage of the crowd,” according to Brayan, “is you get human-derived or human-quality data.”

Human-quality data—that which accounts for the nuances of human life—may increase the accuracy of NLP models, allowing them to follow through on their intent and meet the business goals of the companies that build them. Not every business goal is going to require AI, let alone natural language processing that might benefit from crowdsourcing its model training and data labeling.

Business leaders should first determine if they can meet their business goals without using AI well before investing in building their own models. If a business determines it can afford the possible data science or software engineer staff requirements or a lengthy integration process, crowdsourcing is one option it has to train whatever model it intends to build to meet its goals.


This article was sponsored by Appen, and was written, edited, and published in alignment with our transparent Emerj sponsored content guidelines. Learn more about reaching our AI-focused executive audience on our Emerj advertising page.
