Artificial Intelligence in Gestural Interfaces – Possible Near-Term Applications

Ayn de Jesus

Ayn serves as AI Analyst at Emerj - covering artificial intelligence use-cases and trends across industries. She previously held various roles at Accenture.

Gesture-based interfaces are applications that allow users to control devices using their hands and other body parts. Today, they are found in devices used in home automation, shopping, consumer electronics, virtual reality and augmented reality gaming, navigation, and driving, among others.

A study reported that the global market for gesture recognition in retail is projected to grow by 27.54 percent from 2018 to 2023. To date, some of the top producers of gestural interface products include Intel, Apple, Microsoft, and Google.

According to research titled Hand Gesture Recognition Using Computer Vision, gesture recognition is done in two ways: with data glove sensor devices that transform hand and finger motions into digital data, or with computer vision, which uses a camera. The second method may let humans interact more naturally with machines because it leaves their hands free to move. Computer vision will be the focus of this research, specifically as it relates to:

  • Home Automation
  • Healthcare
  • Automotive
  • Virtual Reality

How does the technology work? Gesture recognition consists of three layers: detection, tracking, and recognition.

  • Detection extracts the visual data produced by hand or body movement within the view of the camera.
  • Tracking monitors data frame-by-frame to ensure each movement is captured to make the data analysis and interpretation more accurate.
  • Recognition groups the extracted data to find patterns. Based on the algorithms’ training, it could find a match and determine the kind of gesture that has just been performed. Once the gesture is identified, the system then performs the intended action.
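The three layers above can be sketched in a few lines of Python. This is a toy illustration only, not how any vendor in this article implements it: the synthetic frames, the brightness threshold, the `make_frame` helper, and the two swipe gestures are all assumptions made for the example, and a real system would work on camera images with trained models.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]      # grayscale image as rows of 0-255 values
Point = Tuple[float, float]  # (x, y) centroid in pixel coordinates


def detect(frame: Frame, threshold: int = 128) -> Optional[Point]:
    """Detection: return the centroid of pixels brighter than `threshold`."""
    xs, ys = [], []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if value > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # nothing hand-like in view
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def track(frames: List[Frame]) -> List[Point]:
    """Tracking: follow the detected centroid frame by frame."""
    path = []
    for frame in frames:
        point = detect(frame)
        if point is not None:
            path.append(point)
    return path


def recognize(path: List[Point], min_travel: float = 2.0) -> str:
    """Recognition: match the tracked path against two toy gestures."""
    if len(path) < 2:
        return "none"
    dx = path[-1][0] - path[0][0]  # net horizontal movement
    if dx >= min_travel:
        return "swipe_right"
    if dx <= -min_travel:
        return "swipe_left"
    return "none"


def make_frame(hand_x: int, width: int = 8, height: int = 4) -> Frame:
    """Hypothetical helper: one bright column stands in for a hand."""
    return [[255 if x == hand_x else 0 for x in range(width)]
            for _ in range(height)]


# A "hand" moving left to right across four frames.
frames = [make_frame(x) for x in (1, 2, 4, 6)]
gesture = recognize(track(frames))
print(gesture)
```

Once a gesture label such as `swipe_right` comes out of the recognition step, the system can map it to the intended action.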

With this research, we hope to give insights to business leaders who are looking into the implementation of gestural recognition tools.



Healthcare

GestureTek

GestureTek offers applications equipped with hand and body-tracking software, motion-sensing display surfaces, and virtual technologies that use computer vision. The company claims that both its touch and touchless gesture-recognition applications can be embedded into a variety of electronic hardware such as toys, games, or electronic devices.

A GestureTek demo video shows how the company's IREX software for health helps physical medicine patients regain their range of motion.

In the demo, patients are guided by an avatar to perform an exercise, starting from a low range and increasing to a high range of motion. The motion is later applied to a game-like scenario to make the activity more engaging.

The Alberta Children’s Hospital needed a system that could assist its therapists in the rehabilitation and disease management of young patients.

GestureTek says it offered IREX (Interactive Rehabilitation and Exercise System), which uses green-screen technology to immerse patients in virtual sporting or gaming environments, such as mountain climbing and snowboarding.

The interactive activities were prescribed by an Alberta Health Services physical or occupational therapist to build balance, mobility and endurance. The company claims that the system can track the movements of the patients over the course of the therapy and treatment, allowing health professionals to see what improvements have been made.

Customized games enable the health facilities to create unique programs per patient using the more than 20 virtual environments, according to the case study. For instance, the software can be programmed to work with a thumb or with the entire body after a traumatic brain injury.

In a related report, the hospital said that patients under seven years old had usually required sedation, or watched a movie or cartoon, during radiation therapy treatments, which could last up to 30 minutes.

Within one year of using IREX, the hospital claims that it had eliminated sedation for five out of eight children between four and seven years old. One family reported that treatment times had been reduced and were more conveniently scheduled for the family.

The company also claims to serve the Atlantic Rehabilitation Institute, Beth Abraham Health Services, Kenny Kids Rehabilitation Program, St. John North Shore Hospital, and the University of Haifa – Department of Occupational Therapy.

Aside from healthcare, the company offers its applications to industries such as retail, broadcast, and restaurants, among others, and claims clients such as Sony, Microsoft, Qualcomm, Disney, and Cisco.

We were unable to find any C-level executives with AI experience on the company’s team, but the company has raised $29.3 million in three rounds of funding and is backed by Telefonica Ventures.

Home Automation

EyeSight Mobile Technologies

EyeSight Technologies develops business and consumer computer vision software for home automation devices that enable

The company claims that these applications support both passive and active sensing. Passive sensing is triggered when the computer vision detects the user’s presence, while active sensing is triggered by touch-free gestures that control smart home devices.

For gesture recognition, the company claims that its touch-free software is capable of finger tracking, hand tracking, and recognizing hand swipes and universal hand signs (wave, shush, etc.). The computer vision application is also capable of face detection, face identification, face count, gender detection, age estimation, and presence detection.

In home automation, for instance, an individual triggers the computer vision when she walks into a room, causing the system to turn on the lights and adjust the room temperature. According to the company, the application’s facial analysis algorithms map the facial features of specific family members to activate the experiences that each family member likes, based on that person’s behavior history.

For instance, if the family member has turned the room temperature to a certain level in the past, the system will adjust the thermostat when that person enters the room.
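A minimal sketch of that behavior-history idea follows. It is not EyeSight’s implementation: the `RoomPreferences` class, its method names, the person identifiers, and the default temperature are all hypothetical, and the face identification step is assumed to have already produced a `person_id` (or `None` for an unrecognized face).

```python
from typing import Dict, Optional


class RoomPreferences:
    """Remember the last thermostat setting each recognized person chose."""

    def __init__(self, default_celsius: float = 21.0) -> None:
        self.default_celsius = default_celsius
        self._last_setting: Dict[str, float] = {}

    def record_manual_change(self, person_id: str, celsius: float) -> None:
        """Store a setting the person chose; this is the behavior history."""
        self._last_setting[person_id] = celsius

    def on_person_entered(self, person_id: Optional[str]) -> float:
        """Return the target temperature when someone walks into the room."""
        if person_id is None:
            # Presence detected but face not identified: use the default.
            return self.default_celsius
        return self._last_setting.get(person_id, self.default_celsius)


prefs = RoomPreferences()
prefs.record_manual_change("parent_1", 23.5)
print(prefs.on_person_entered("parent_1"))  # remembered setting
print(prefs.on_person_entered(None))        # fallback default
```

Keeping only the most recent setting is the simplest possible history; a production system would likely weigh time of day and repeated choices as well.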

A 2-minute video from the company shows how EyeSight’s gesture recognition software can be used with Lenovo computers.

Aside from home automation, the company also develops gesture-based software for automotive, viewer analytics, and electronics clients.

The company has not made case studies available, but a press statement in March 2018 reports that it has partnered with Sony Mobile to equip the interactive projector Xperia Touch with computer vision sensing technology for both touch and touchless interactions.

EyeSight claims that users will be able to directly control the content projected by the device from afar using hand gestures, with the software embedded into the device’s existing built-in camera.

Other listed clients are Samsung, SEAT, Jabil, and Soling. The company was recognized by Frost & Sullivan for its innovations in embedded computer vision solutions for cars to address the need for distraction-free driving.

We could not find evidence of any C-level executives on the team with robust AI experience, but Tamir Anavi serves as the Core Technologies and Innovation director at EyeSight Technologies, having started as a senior algorithm engineer in 2009. Prior to EyeSight, he was an algorithm developer at Applied Materials and a materials lab instructor at Ruppin Academic Center. He graduated with a Bachelor’s degree in Electronics Engineering, focusing on computer vision.

The company has raised $30.9M in funding and is backed by MaC GP, Mitsui Global Investment, CEVA, and Kuang-Chi Science.


Gestoos

Gestoos offers hand tracking and gesture-recognition applications for consumer electronics used in home automation, digital signage for retail, and automotive use cases. Data from these interactions, such as user behavior and content, is also gathered to continue training the application’s foundational computer vision algorithms.

For external developers, the company offers a software development kit which it claims can be used to create custom gestural applications for Windows, Mac, Linux, Android, and Linux ARM devices. The common features the SDK offers include hand and body gesture recognition and hand tracking.

Gestoos claims that its application works with most depth cameras available on the market, but the company website specifies the Orbbec, Occipital Structure, Asus, and PMD brands.

On the website, the company reports that its home automation application’s gesture recognition technology has the capability to control home lighting systems, adjust the volume or mute the sound of audio systems, and change tracks in a playlist.

The gestures can be created and assigned to the user’s connected tablet or smartphone where the application resides, with each movement equivalent to a command. The company also claims that one gesture can be used to control several devices.

The company did not have a video demo specific to home automation, although one video demonstrated how the Gestoos application worked in digital signage. The system can be programmed with custom gestures such as pointing, waving, swiping, picking up, and dropping, each of which translates to a specific command.

As a user makes gestures, the computer vision-equipped camera collects the gestural data, which is analyzed and interpreted by the algorithms to recognize the gesture and perform the corresponding command.
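The gesture-to-command mapping described above, including one gesture controlling several devices, can be sketched as a simple dispatch table. This does not use the Gestoos SDK: the `GestureRouter` class, the gesture labels, and the device commands are hypothetical names chosen for illustration, and the commands just return strings instead of talking to real devices.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Command = Callable[[], str]  # a device action that reports what it did


class GestureRouter:
    """Map each recognized gesture label to one or more device commands."""

    def __init__(self) -> None:
        self._bindings: DefaultDict[str, List[Command]] = defaultdict(list)

    def bind(self, gesture: str, command: Command) -> None:
        """Assign a command to a gesture; a gesture may have several."""
        self._bindings[gesture].append(command)

    def handle(self, gesture: str) -> List[str]:
        """Run every command bound to the gesture, in binding order."""
        return [command() for command in self._bindings.get(gesture, [])]


router = GestureRouter()
router.bind("swipe_right", lambda: "playlist: next track")
router.bind("wave", lambda: "lights: off")
router.bind("wave", lambda: "audio: mute")  # same gesture, second device

print(router.handle("wave"))
print(router.handle("thumbs_up"))  # unbound gesture does nothing
```

Unbound gestures return an empty list rather than raising an error, which suits a camera that will regularly see movements nobody has assigned to a command.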

In a 2-minute video from the company, the application’s facial recognition algorithms also recognize gender and estimate age so that the signage presents products to the matching demographic.

Marcel Alcoverro is the CTO at Gestoos. He obtained his Doctorate in Telecommunications Engineering from the Universitat Politècnica de Catalunya. Prior to joining Gestoos, he founded his own company, Fezoo Labs, which focused on computer vision, gesture recognition, and machine learning.

Gestoos has raised about $3.3 million in funding but has not yet listed any case studies or marquee clients.

Automotive Control and Safety

Sony DepthSensing Solutions

Sony DepthSensing Solutions, originally SoftKinetic until it was acquired by Sony in 2016, offers a gesture recognition application for the automotive industry.

Sony claims the technology features time-of-flight sensing, which measures the time it takes for light to travel from its source, in this instance the infrared sensor, to the object and back. According to the company, this enables the computer vision technology to recognize the intent of the driver or other car occupants and trigger the corresponding action quickly.
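The time-of-flight principle comes down to one formula: distance equals the speed of light multiplied by the round-trip time, divided by two. The short calculation below illustrates the physics only; the function names are made up for this example and it says nothing about how Sony’s sensors are built.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # metres per second, in a vacuum


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to an object given the light's out-and-back travel time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # Halve the result because the light covers the distance twice.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def round_trip_for_distance(distance_m: float) -> float:
    """Inverse: how long light takes to reach an object and return."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


# A hand roughly half a metre from the sensor.
delay = round_trip_for_distance(0.5)
print(f"round trip: {delay * 1e9:.2f} ns")
print(f"distance:   {distance_from_round_trip(delay):.3f} m")
```

The round trip for a hand half a metre away takes only a few nanoseconds, which is why time-of-flight sensors depend on very fast, specialized imaging circuits.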

In a 4-minute video, the company claims that the application, through hand-tracking sensors and gesture recognition, gives the driver control over the in-car infotainment system, which combines entertainment and information devices such as the audio or video player, in-car phone, and air conditioning.

Sony says the application enables drivers and passengers to interact with the in-car infotainment system using hand gestures to adjust the volume of the audio system, reject or accept a call coming through the in-car phone, as well as regulate the temperature of the car air conditioning system.

The company further claims that the algorithms are trained to recognize the main gestures and are able to disregard unnecessary gestural noise in the interaction area, such as car vibrations. The system can also operate under any lighting condition, the company says. In the video, the system operates with pointing, swiping, and circling gestures, but other customized movements can be programmed as well.

Externally, the company claims that the system also detects the presence and movements of pedestrians, nearby cars, bicycles and other hazards.

Sony’s depth-sensing technology is fitted into all 2017 BMW 5 and BMW 7 series vehicles, as well as Vrvana headsets and ClaXon robots, although the company has not released a case study.

Daniel Van Nieuwenhove, President of Sony DepthSensing Solutions, co-founded SoftKinetic Sensors in 2009. He holds a doctorate in Microelectronics and a Master’s degree in applied sciences and engineering, electronics and IT engineering from Vrije Universiteit Brussel. Earlier in his career, he served as CTO at Optrima.

As a research assistant at Vrije Universiteit Brussel, he worked with CMOS (complementary metal-oxide-semiconductor) circuits and devices for 3D time-of-flight imagers.

Virtual Reality

ManoMotion

ManoMotion has developed a computer vision application that tracks and recognizes hand gestures in 3D using Android and iOS smartphone cameras. The company claims that the application can be used in augmented and mixed-reality environments for games, Internet of Things devices, consumer electronics, robots, and vehicle systems.

A 1-minute video from the company demonstrates how the application, embedded on a smartphone, can recognize finger and hand movements to perform commands that move virtual objects on screen.

According to the company, the software determines depth with an accuracy of up to one centimeter and currently recognizes 2 million hand gestures in real time, such as swiping, clicking, and grabbing.

The company also offers software development kits to other companies that would like to integrate the application into their products.

The company has not made any case study available. In an April 2018 press statement, however, ManoMotion announced the integration of its gesture analysis application into the PMD Pico Flexx virtual reality (VR) cardboard headset. With this integration, the company claims its software can work with any VR or augmented reality hardware.

In 2018, the Sweden-based company plans to set up offices in Palo Alto, Hong Kong, and Shanghai which will conduct sales and marketing initiatives. It also aims to recruit talent from Stanford University. By the end of 2018, the company expects to have hired about 30 people.

Shahrouz Yousefi is the Co-founder and CTO of ManoMotion. He holds a Ph.D. in Media Technology, a Master’s degree in Robotics and Control, and a Licentiate degree in Media Signal Processing. Prior to ManoMotion, he worked at Linnaeus University and KTH Royal Institute of Technology as a researcher. He also worked in 3D motion analysis and interaction design at the Center for Biomedical Engineering, Umeå University.

Takeaways for Business Leaders

In our research, we note that, excluding the global leading companies, a number of computer vision and gesture recognition companies were established only in the past three years. We also noticed that a good number of the companies are based outside of the United States: ManoMotion is based in Sweden, Gestoos in Spain, EyeSight Mobile Technologies in Israel, and SoftKinetic (now Sony DepthSensing Solutions) in Belgium.

The long-established companies offer the technology to a variety of industries including retail, healthcare, defense, and robotics. The startups focus on specific use cases such as consumer electronics for home automation, automotive and mixed reality. Because of the relative youth of these companies, few have case studies. As well, AI experts with robust and long experience were not apparent among the C-level executives.

In terms of features, gestural interfaces enable users to have a more natural interaction with machines. In home automation specifically, the use of gestural interfaces offers convenience and a seamless flow for homeowners as the devices anticipate their needs. Facial recognition – also a subset of computer vision – adds a layer of security in the home.

In automotive, gestural interfaces allow drivers to keep their attention on the road. The addition of external sensing capabilities enables better safety for the car occupants and the people around them.

In general, the companies covered claim that the applications are programmable with gestures that the user is more comfortable performing.

However, it is not clear if the applications can recognize gestures from multiple persons at one time. Recognizing synchronous gestures from multiple persons is one challenge companies face, and it is the subject of some research efforts by Microsoft and the Institute of Electrical and Electronics Engineers. Sony DepthSensing claims to have found a way for algorithms to remove “noise” from the interaction area, but it is not clear if this applies in the same context as synchronous gestures from multiple persons.

It is worth noting that the Microsoft Kinect, a camera equipped with gesture recognition whose first and second generations were discontinued, received interest as a device for healthcare uses such as monitoring, screening, and rehabilitation.

Its third-generation depth-sensing capabilities are now present in the HoloLens mixed reality headset. For the fourth-generation version of Kinect, Microsoft and Intel partnered to make the Kinect technology available through Intel’s RealSense depth cameras.

In the future, gestural interfaces could potentially change the way consumers interact not just with television sets, but with other technologies as well.

Daniel Simpkins, CEO of Hillcrest Labs, which provides motion-sensing technology for LG and Logitech, says that gestural applications enable consumers to be more comfortable with technology because “it gives familiarity to people as they move from a world where they just push buttons on a remote.”


Header image credit: Business Korea
