The business world has been talking about AI for several years now, and it’s safe to say that it’s reached a certain cultural moment among executives in areas like banking, insurance, and pharma. Government leaders have been much slower to start the conversation around the capabilities of AI, including the possibilities they offer to militaries and the ethical implications of AI when it comes to governance and the legal system.
That said, the general populace has been interacting with AI for the better part of a decade thanks to tech giants like Google, Facebook, Amazon, and Netflix. People have become accustomed to accurate search engines, news and social feeds, and recommendation engines that seem to know us to an uncanny degree. The public appears to have broadly accepted these AI capabilities, and so it makes sense that machine learning would start to make its way into entertainment through other avenues, namely video games and—the focus of this article—toys.
In this report, we detail four companies offering toys for children that make use of a variety of artificial intelligence and machine learning technologies, including speech recognition and machine vision. We also found that some of these toys allow users to, in effect, program the toys themselves. As such, we’ve broken this article into two sections:
- Standard Toys with Machine Learning Software
- Block-Based Coding Toys
Standard Toys with Machine Learning Software
Sphero offers a Star Wars BB-8 droid and Force Band accessory, which it claims are app-enabled and respond to voice and motion commands using natural language processing and gesture recognition. The product is also available in a black BB-9E model, as well as a droid that uses the same technology but is modeled after R2-D2.
Sphero’s BB-8 droid has motions and sounds that are accurate to the Disney character it is modeled after and is made to travel along predetermined paths and explore its surroundings. With the Star Wars app, users can draw a path for the droid to follow, and it will travel along a path identical to the line drawn. It can also follow a path that’s part of an augmented reality (AR) simulation while the app serves as the display for the user. There are simulations of scenes from multiple Star Wars movies a user can choose from. The user’s mobile device then displays a first-person computer-generated video of the droid acting out the scene with other characters. This allows the user to experience the scene from the droid’s “perspective” and watch their droid move and explore the real world as if it were a science fiction setting.
An accessory for the Sphero Star Wars droids is the Force Band, a wrist-worn device that detects arm and hand gestures to command the droids and simulate “The Force,” a magical ability from the Star Wars franchise. Users can extend their hand out in front of them to command the droid to move forward, and the droid moves faster the farther the user extends their hand. To bring the droid back, one can turn their hand around and pull their arm back quickly, and the droid will turn around. The droid stops when the user lets their hand rest at their side.
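To make this gesture-to-motion mapping concrete, below is a minimal sketch of how simplified gesture readings might translate into drive commands. The function and field names are hypothetical; none of them come from Sphero’s actual firmware.

```python
# Hypothetical sketch: mapping Force Band gesture readings to droid
# drive commands. All names and thresholds are invented for illustration.

def gesture_to_command(arm_extension: float, palm_facing_back: bool,
                       arm_at_side: bool) -> dict:
    """Translate a simplified gesture reading into a droid drive command.

    arm_extension: 0.0 (arm at the side) to 1.0 (fully extended forward).
    """
    if arm_at_side:
        # Resting the hand at one's side stops the droid.
        return {"action": "stop", "speed": 0.0}
    if palm_facing_back:
        # Turning the hand around and pulling back recalls the droid.
        return {"action": "return", "speed": 0.5}
    # Speed scales with how far the user extends their hand.
    return {"action": "forward", "speed": round(arm_extension, 2)}

print(gesture_to_command(0.8, False, False))
```

Here `gesture_to_command(0.8, False, False)` yields a forward command at 80% speed, matching the behavior described above.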
We can infer the machine learning model behind the software was trained on thousands of voice clips of the verbal commands Sphero and Disney chose for the droids, spoken in various accents and inflections. These clips would have been labeled as the phrases and keywords the software should respond to. The labeled clips would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the patterns in audio data that, to a human ear, form the sound of a verbal command spoken into a microphone. In this case, it is the microphone built into the user’s mobile device running the Star Wars app.
Using the Star Wars app, the user could then expose the software to unlabeled voices and commands. The algorithm behind the software would then be able to determine how to respond to the command or let the user know that it did not hear the command properly. The system then responds to the user’s command with the corresponding action.
For example, a user could use a vocal cue such as “Okay BB-8” to prepare the software to take a command. Then, they can command the droid to “look around” or “explore” to get the droid to roam freely across the floor or play area. It will accept voice commands like “it’s a trap!” or “run away!” as a signal to turn in the other direction and leave its current area quickly. This allows users to play with the droid using their own imagination, possibly creating scenes of their own to act out. When they want to stop playing or put the droid away, a user can say “go to sleep” to put the droid in sleep mode.
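The wake-word-plus-command behavior described above can be sketched as a simple dispatch table. The phrase-to-action mapping below is an assumption for illustration, not Sphero’s actual implementation.

```python
# Illustrative command table for a wake-word voice interface. The
# mapping from recognized phrases to droid actions is an assumption.

WAKE_PHRASE = "okay bb-8"

COMMANDS = {
    "look around": "patrol",
    "explore": "patrol",
    "it's a trap!": "flee",
    "run away!": "flee",
    "go to sleep": "sleep",
}

def handle_utterance(utterance: str, awake: bool):
    """Return (action, awake) after processing one recognized phrase."""
    text = utterance.strip().lower()
    if text == WAKE_PHRASE:
        return None, True          # wake phrase arms the command listener
    if awake and text in COMMANDS:
        action = COMMANDS[text]
        return action, action != "sleep"   # "go to sleep" disarms it
    return None, awake
```

For example, `handle_utterance("Okay BB-8", False)` returns `(None, True)`, and a following `handle_utterance("explore", True)` returns `("patrol", True)`.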
Below are two 3-minute videos demonstrating how BB-8 and the Star Wars app work as well as how the Force Band controls the droids:
Sphero has raised $119 million in venture capital and is backed by Grishin Productions and Mercato Partners.
Jon Carroll is CTO at Sphero. He holds an MS in Telecommunications from the University of Colorado Boulder. Previously, Carroll served as Senior Systems Engineer at Infomedia.
Pullstring partnered with Mattel to create and offer Hello Barbie, which the companies claim lets children have interactive conversations with the Barbie character using natural language processing.
Hello Barbie connects to a user’s Wi-Fi network using the Hello Barbie mobile app, which allows it to connect to the Pullstring server and begin interactivity. The doll is made to have friendly conversations with users, so a child can ask or answer questions while conversing with the doll. Hello Barbie will ask questions about the user’s friends, family, homework, and career aspirations, and saves a recording of the user’s response to better inform the natural language processing model behind the doll. It will also respond to a user’s answer with continued conversation, such as complimenting their aspiration to become a chef.
Users can also play games with Hello Barbie, such as a story puzzle game where the user chooses how Barbie will write a story. The doll gives the user a story prompt up to a point where deviation can occur, then gives the user options to fill in an aspect of the story. For example, if the story were about getting ready for bed after dinner, the doll could stop just before the story states what was for dinner. Then, the doll would give the user the options of “fish,” “meat loaf,” or “steak.” The user would then tell Hello Barbie which option they wish to include in the story, and the doll’s story would incorporate the user’s choice.
We can infer the machine learning model behind the software was trained first on hundreds of thousands of relevant speech requests, such as questions about friends, school, jobs, or Barbie’s interests, spoken in various accents and inflections by a range of speakers. The machine learning algorithm behind the voice recognition system would then transcribe those speech requests into text. Human editors would then correct the transcriptions and feed the edited text back into the algorithm. This would have trained the algorithm to recognize and correctly transcribe these speech requests.
Then, the machine learning model would need to be trained on the appropriate responses to certain user requests. This would have required thousands of scripted text responses to these requests. This text data would have been labeled as responses to user questions or statements and matched to the user input that would prompt each individual response. For example, the response “yes, I’m here” would have been labeled as a response to be used when the user asks, “Hello Barbie, are you there?” The labeled text data would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the text patterns that a human would interpret as conversational responses.
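As a toy illustration of this labeled request-to-response pairing, consider the lookup below. Pullstring’s actual dialogue engine is far more elaborate; the phrases and fallback prompt here are invented for illustration.

```python
# Toy illustration of matching a transcribed user request to a labeled
# response. The phrases and fallback are invented; Pullstring's real
# dialogue engine is much more sophisticated than a lookup table.

RESPONSES = {
    "hello barbie, are you there?": "Yes, I'm here!",
    "what should we talk about?": "Tell me about your friends!",
}

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so near-identical requests match."""
    return " ".join(text.lower().split())

def respond(transcribed: str) -> str:
    key = normalize(transcribed)
    # Fall back to a clarifying prompt when no labeled response matches.
    return RESPONSES.get(key, "Hmm, I didn't catch that. Can you say it again?")
```

For example, `respond("Hello Barbie, are you there?")` returns the labeled response “Yes, I’m here!”.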
Below is a short 2-minute video demonstrating how Hello Barbie works:
Pullstring has raised $44.8 million in venture capital and is backed by Khosla Ventures and Greylock Partners.
Martin Reddy is CTO at Pullstring. He holds a PhD in computer science from the University of Edinburgh. Previously, Reddy served as CEO at Code Reddy.
Block-Based Coding Toys
Anki offers a toy robot called Cozmo, which it claims can be an educational tool for children and machine learning students in addition to its entertainment value. Cozmo uses machine vision to recognize and track facial movements of users, other people, dogs, and cats.
The Cozmo robot is capable of numerous entertainment features that children can watch or interact with. For example, a user could program the robot to dance whenever it sees a certain person, or to traverse a maze built from household objects. Users program Cozmo from a tablet or smartphone using the Cozmo Code Lab app. The app contains a simplified dashboard for coding the robot, which allows children to organize actions or “blocks” in their desired sequence. The Cozmo robot then performs these actions in the order they were “coded.” So if a user wanted to create a simple dance for the robot to perform, they could arrange a series of movements in Code Lab and then watch the robot act them out.
If the dance does not play out as the user expected, they can go back to Code Lab and make changes. Children who have never coded before can get started with the starter “sandbox” level of Cozmo Code Lab, and once they become more adept they can move up to the “constructor” level. At this level, users can create more complex experiences with Cozmo, such as a small video game to be played on the robot’s display screen. Cozmo Code Lab is built on Scratch Blocks, a block-based programming framework, which makes it possible to offer both block coding for child users and a full development kit for adult users.
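The block-sequencing idea can be modeled in a few lines. The block names and actions below are invented for illustration and are not Code Lab’s actual vocabulary.

```python
# Minimal model of Code Lab-style block sequencing: each "block" is a
# named action, and the robot performs them in the order arranged.
# Block and action names here are invented for illustration.

def run_sequence(blocks: list) -> list:
    """Simulate executing a sequence of action blocks, returning a log."""
    actions = {
        "spin": "robot spins in place",
        "forward": "robot drives forward",
        "lift": "robot raises its lift arm",
    }
    log = []
    for block in blocks:
        log.append(actions.get(block, f"unknown block: {block}"))
    return log

# A simple "dance" assembled from three blocks:
print(run_sequence(["spin", "forward", "spin"]))
```

Re-ordering the list and re-running it mirrors the edit-and-retry loop described above: the child rearranges blocks in the app, and the robot acts out the new sequence.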
The Cozmo Software Development Kit (SDK) is available in addition to the Code Lab app. The SDK is a Python-based open-source development kit that contains robotics and AI functions for both novice and advanced programmers. The Cozmo robot’s camera and facial recognition functions allow SDK users to learn and experiment with machine vision. The robot also offers a wide range of connectivity with other technologies and smart devices, such as Google Assistant and Internet of Things (IoT) devices. For example, an SDK user could program the Cozmo robot to turn on Hue lights, display and assess data from a Fitbit, or notify the user when their Nest camera detects movement. This allows programmers to explore the utility of Cozmo and learn on a platform with a large amount of freely available data.
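The kind of event-to-action glue code an SDK user might write can be sketched with a minimal publish/subscribe hub. The event names and robot behavior below are hypothetical and are not the actual Cozmo SDK API.

```python
# Hypothetical publish/subscribe hub illustrating how smart-device
# events (camera motion, fitness data, etc.) could be wired to robot
# actions. Event names and behaviors are invented, not Cozmo SDK calls.

class EventBus:
    """Minimal pub/sub hub for smart-device events."""
    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        """Register a handler to run when `event` is emitted."""
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload=None):
        """Fire all handlers for `event`, collecting their results."""
        return [h(payload) for h in self._handlers.get(event, [])]

bus = EventBus()
# e.g. have the robot announce motion reported by a security camera:
bus.on("camera.motion", lambda place: f"robot says: motion at {place}")
print(bus.emit("camera.motion", "front door"))
```

The same pattern would let one robot subscribe to many device feeds at once, which is the appeal of the integrations described above.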
We can infer the machine learning model behind the software was trained on thousands of images showing human, dog, and cat faces from various angles and in various lighting conditions. These images would have been labeled as containing a human, dog, or cat face. The labeled images would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the patterns in image data that, to the human eye, form the face of a human, dog, or cat as displayed in live footage from a camera. In this case, the camera is the one on the front of the Cozmo robot.
Using the Cozmo camera, the user could then expose the software to faces that are not labeled. The algorithm behind the software would then be able to discern whether a face belongs to a dog, cat, or human. The system then makes this information available to the Cozmo robot, allowing it to react according to what the user has programmed it to do when it encounters a human, dog, or cat. For example, a user could program the robot to mimic a dog’s bark whenever it “sees” the family dog, and then turn around to avoid further interaction.
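To illustrate this train-on-labeled-examples, classify-unlabeled-examples loop, here is a toy nearest-centroid classifier. Real face recognition uses deep neural networks on pixel data; the two-dimensional “features” below are purely illustrative.

```python
# Toy nearest-centroid classifier illustrating the labeled-image
# training idea. Real face detection runs deep networks on pixels;
# the 2-D feature vectors here are purely illustrative.

def train(examples):
    """Average the feature vectors for each label into a centroid."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()}

def classify(model, features):
    """Return the label whose centroid is closest to the new example."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda lbl: dist(model[lbl], features))

# "Training" on labeled examples, then classifying an unlabeled one:
model = train([([1.0, 0.0], "human"), ([0.0, 1.0], "dog"), ([0.9, 0.1], "human")])
print(classify(model, [0.8, 0.2]))  # closest centroid: "human"
```

An unlabeled example lands nearest the centroid of whichever labeled class it most resembles, which is the intuition behind classifying new faces the training data never contained.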
Below is a short 1-minute video demonstrating how the Cozmo Code Lab is used to program the Cozmo robot’s actions:
Anki has raised $182 million in venture capital and is backed by JP Morgan Chase & Co.

Mark Palatucci is CTO at Anki. He holds a PhD in Robotics and Machine Learning from Carnegie Mellon. Previously, Palatucci served as a PhD Fellow at Intel.
Wonder Workshop offers the Cue robot, which can be coded by the user to perform certain actions and react to objects in its environment using the company’s mobile app, which enables both block coding and text-based coding for children learning to code. Similar to Anki’s Cozmo robot, the block coding is organized by color-coded blocks that link together in a sequence. Each color corresponds to a type of action, such as rolling in a circle or backing away from a nearby object, and the sequences execute from right to left. Users can toggle the code between this block display and traditional text-based code. However, the code itself is color-coded the same way as the blocks. This enables young coding students to see which pieces of written code correspond to their chosen actions, which could help those transitioning from block coding, a much simpler method suited to children, to fully writing out text code.
The robot also contains a speaker for playing the robot’s voice and “interacting” with the user. There are four avatars to choose from, each of which has its own voice and attitude, reflected in its movements and in how it responds in the chat screen of the mobile app. Sensors on the front and back monitor the robot’s proximity to walls and other objects.
Central to the Cue robot’s interactivity is the chatbot portion of the mobile app. Users can chat with the avatar they have selected for their robot and play games such as “go fish.” They can also control the physical robot by commanding the chatbot, and can likely code new commands for it as well. The chatbot is also capable of creating stories with user input, in the style of Mad Libs or a similar fill-in-the-blank game. For example, the chatbot could ask the user for the name of the story’s hero and the name of a celebrity, and it would use those names as characters in the completed story.
We can infer the machine learning model behind the software was trained on hundreds of thousands of text prompts created by Wonder Workshop and responses from mock users, covering the actions the Cue robot is capable of out of the box. These include the fill-in-the-blank storytelling activity and avoiding incoming objects. This text data would have been labeled according to what each command requests. The labeled text data would then be run through the software’s machine learning algorithm. This would have trained the algorithm to discern the text patterns that a human would interpret as a written command or user input.
A user could then message the chatbot, and the algorithm behind the software would be able to categorize the message as either a command to execute a certain action or user input for an interactive activity. The chatbot likely then computes a confidence score for how likely it is to have correctly categorized the message, and it would be programmed to take one of two actions depending on that score. Above a certain threshold, the chatbot would send an appropriate response to the user. Below that threshold, the chatbot would respond with a prompt to repeat or restate the message.
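The confidence-threshold behavior described here can be sketched as follows. The stand-in classifier, its phrases, and its scores are invented for illustration; a real intent model would return a learned confidence score rather than a lookup.

```python
# Sketch of confidence-threshold dispatch. The classifier below is a
# stand-in: a real intent model returns a learned (label, confidence)
# pair, not a lookup table. All phrases and scores are invented.

CONFIDENCE_THRESHOLD = 0.7

def classify_intent(message: str):
    """Stand-in intent classifier returning (intent, confidence)."""
    known = {
        "roll in a circle": ("drive_circle", 0.95),
        "tell me a story": ("start_story", 0.90),
    }
    return known.get(message.lower(), ("unknown", 0.2))

def handle_message(message: str) -> str:
    intent, confidence = classify_intent(message)
    if confidence >= CONFIDENCE_THRESHOLD:
        # High confidence: act on the categorized intent.
        return f"executing: {intent}"
    # Low confidence: ask the user to repeat or restate the message.
    return "Sorry, I didn't understand. Could you say that again?"
```

So `handle_message("roll in a circle")` acts on the command, while an unrecognized message falls below the threshold and triggers the restate prompt.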
We could not find a demonstration video showing how the software behind Wonder Workshop’s Cue robot works.

Wonder Workshop has raised $78.3 million in venture capital and is backed by CRV and WI Harper Group.
Vikas Gupta is CEO at Wonder Workshop. He holds an MS in computer science from the Georgia Institute of Technology. Previously, Gupta served as Head of Consumer Payments at Google.
Header Image Credit: The Verge