Machine Learning in Gaming – Building AIs to Conquer Virtual Worlds

Gautam Narula

Gautam Narula is a machine learning enthusiast, computer science student at Georgia Tech, and published author. He covers algorithm applications and AI use-cases at Emerj.

In virtual worlds, AIs are getting smarter. The earliest instance of artificial intelligence in games was in 1952, when a lone graduate student in the UK created a rules-based AI that could play a perfect game of tic-tac-toe. Today, teams of researchers are working on—or have already succeeded in—creating AIs that can defeat humans in increasingly complex games.

There are two distinct categories of games that have attracted the interest of computer scientists, software developers, and machine learning researchers: games of perfect information, where all aspects of the game are known to all players at all times; and games of imperfect information, where some aspects of the game are unknown to players. While complex games exist in both categories, this distinction in player knowledge has significant ramifications in how AIs can tackle the games.

Read on to learn how researchers use machine learning in gaming to create ever more powerful AIs, and how researchers hope to leverage the abilities learned from these virtual worlds to conquer other domains. As a fun preview, consider MarI/O, an artificial neural network created by former Microsoft engineer Seth Hendrickson that learns how to conquer a level of Super Mario World.

Machine Learning in Games of Perfect Information

In chess and Go, both players can see the entire board and the location of every piece on the board. In the Super Smash Bros. video games, combatants see the entire map and the exact location of every other combatant at any given time. Machines excel in these sorts of games and already overpower, or soon will overpower, even the best humans. In less complex games, modern computers can simply “brute force” every possible sequence of moves. For example, in 2007 a computer scientist proved that a game of checkers, if optimally played by both sides, would always end in a draw.

Of more interest are games where the number of possible positions is too vast for even supercomputers to brute force. The specific techniques vary by game, so we’ll look at a few case studies of machine learning in popular board and video games of perfect information.


Chess

In a seminal 1950 paper, computer scientist Claude Shannon gave a conservative estimate of 10^120 for the number of possible games of chess, which greatly exceeds the number of atoms in the known universe (estimated at around 10^80). Even with modern supercomputers, it is impossible to “solve” chess with pure brute force.

In 1997, IBM’s supercomputer Deep Blue defeated reigning world chess champion Garry Kasparov in a six-game match, marking the first time a chess AI had bested the top human player in a match. Since then, chess computers have far surpassed human abilities; Stockfish, one of the top chess AIs, has an estimated Elo rating over 3200, nearly 400 points above human #1 Magnus Carlsen (under the Elo model, a 400-point gap corresponds to 10-to-1 odds, or an expected score of about 91% for the stronger player).
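To see where that figure comes from, here is a minimal sketch of the standard Elo expected-score formula. The ratings 3200 and 2800 are round illustrative numbers rather than exact published ratings, and the function name is our own.

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5, loss = 0) of player A against player B
    under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# Illustrative round numbers: an engine at 3200 against a human at 2800.
engine_vs_human = expected_score(3200, 2800)
print(round(engine_vs_human, 3))  # 0.909, i.e. 10-to-1 odds in the engine's favor
```

A 400-point gap is therefore best read as "the stronger side is expected to score about ten points for every one the weaker side scores," not as a literal ten-fold difference in skill.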

Stockfish uses a combination of brute force and finely-tuned heuristics to compute numeric evaluations for every legal move in a position. To reduce the massive branching options that occur with each move (any given position may have 25 legal moves; each of those may have 25 legal moves, and so on), Stockfish uses a search algorithm called alpha-beta pruning, which skips branches that cannot change the final choice of move.
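The pruning idea can be sketched in a few lines. The following is a textbook version of alpha-beta search over a tiny hand-made game tree, not Stockfish's actual implementation; the tree, the leaf scores, and the `visited` bookkeeping are assumptions made purely for illustration.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), visited=None):
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a number (a leaf evaluation) or a list of child nodes.
    Every leaf that actually gets evaluated is appended to `visited`.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing opponent would never allow this line
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option elsewhere
    return best

# Hypothetical two-ply tree: we pick a move, then the opponent picks a reply.
tree = [[3, 5, 6], [2, 9, 1], [4, 7, 8]]
seen = []
value = alphabeta(tree, True, visited=seen)
print(value, seen)  # 4 [3, 5, 6, 2, 4, 7, 8]
```

In this toy tree the second move is abandoned after a single reply: once the opponent can hold us to 2, which is worse than the 3 already guaranteed by the first move, the remaining replies (9 and 1) are never evaluated. Only 7 of the 9 leaves are examined, and the savings grow rapidly with search depth.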

This playing strength can optionally be augmented with opening databases and endgame tablebases, which are essentially precomputed moves for the beginning and end phases of the game.

While chess has been a fertile ground for artificial intelligence research in recent years, the top engines have eschewed the traditional machine learning techniques in favor of a hybrid of human and machine learning, combining the intuitive, strategic power of the top humans (encoded in the heuristics) with a computer’s raw computational power.

This synergistic approach is perhaps best modeled in Advanced Chess, a form of chess where computer-aided humans compete against each other at a level of play higher than either humans or computers alone could achieve.


Go

Go, known as Baduk in Korea and weiqi in China, is a board game of even greater complexity than chess. According to the American Go Foundation, the number of legal Go positions is on the order of 10^170. The Google DeepMind team paved the way for computer dominance in Go in the past two years with the development of AlphaGo, an AI that uses deep neural networks to learn from its own games (a technique known as deep reinforcement learning) as well as games played by top human players (a classic supervised learning technique). This combination highlights the distinct strengths of human and machine learners.

Humans are fast learners: with a relatively small data set (say, 25 chess games played), they can show significant improvement in accomplishing new tasks. Conversely, computers may require millions, billions, or even trillions of data points before they achieve a comparable skill level, but the ability to play millions of games against themselves and learn from past mistakes overcomes this weakness.
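To make the idea of learning from self-play concrete, here is a minimal sketch using tabular Q-learning on a toy game: a single pile of stones from which players alternately remove one to three, with whoever takes the last stone winning. This is a deliberately simple stand-in chosen for illustration, not AlphaGo's method (which relies on deep neural networks), and every name and parameter below is an assumption.

```python
import random

def legal_moves(pile):
    """A player may remove 1, 2, or 3 stones, but no more than remain."""
    return [m for m in (1, 2, 3) if m <= pile]

def train_self_play(start_pile=10, episodes=20000, alpha=0.5, seed=0):
    """Learn move values purely from games the program plays against itself."""
    rng = random.Random(seed)
    q = {}  # (pile, move) -> estimated outcome for the player making the move
    for _ in range(episodes):
        pile = start_pile
        while pile > 0:
            move = rng.choice(legal_moves(pile))  # explore uniformly at random
            remaining = pile - move
            if remaining == 0:
                target = 1.0  # taking the last stone wins the game
            else:
                # The opponent moves next, so their best outcome is our worst.
                target = -max(q.get((remaining, m), 0.0)
                              for m in legal_moves(remaining))
            old = q.get((pile, move), 0.0)
            q[(pile, move)] = old + alpha * (target - old)
            pile = remaining
    return q

def best_move(q, pile):
    """Pick the move with the highest learned value."""
    return max(legal_moves(pile), key=lambda m: q.get((pile, m), 0.0))

q_table = train_self_play()
print([best_move(q_table, pile) for pile in (5, 6, 7, 10)])
```

After tens of thousands of self-played games, the table recovers the known winning strategy of always leaving the opponent a multiple of four stones, even though no human ever told the program that rule. The volume of play required, for a game a person can solve in a few minutes, illustrates the trade-off described above.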

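The key difference in AlphaGo’s approach compared to Stockfish and other chess engines is how it looks ahead: rather than a broad alpha-beta search scored by hand-tuned heuristics, AlphaGo uses Monte Carlo tree search guided by neural networks that suggest promising moves and evaluate positions. DeepMind’s efforts culminated in a stunning 4-1 victory over top professional Lee Sedol in March 2016.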

Elon Musk, who previously invested in DeepMind, noted AlphaGo’s victory was a feat many experts believed was at least a decade away. In January 2017, Google revealed it had been secretly testing an updated version of AlphaGo against top human players online; the revised AlphaGo did not lose a single game.

While chess engines, with their game-specific heuristics, offer minimal insight into other domains, AlphaGo’s neural network-powered dominance could be applied to many other areas. In an interview with Emerj, DeepMind’s Nando de Freitas describes many different applications of deep learning, the primary AI technique employed by AlphaGo. And, in a blog post shortly before the match with Lee Sedol, Google made their ambitions with AlphaGo clear:

We are thrilled to have mastered Go and thus achieved one of the grand challenges of AI. However, the most significant aspect of all this for us is that AlphaGo isn’t just an “expert” system built with hand-crafted rules; instead it uses general machine learning techniques to figure out for itself how to win at Go. While games are the perfect platform for developing and testing AI algorithms quickly and efficiently, ultimately we want to apply these techniques to important real-world problems. Because the methods we’ve used are general-purpose, our hope is that one day they could be extended to help us address some of society’s toughest and most pressing problems, from climate modelling to complex disease analysis.

Super Smash Bros.

Super Smash Bros. (SSB) is a popular fighting game franchise that features the most popular characters from Nintendo’s empire of games. It has a vibrant competitive scene, including professional players who regularly compete for five-figure prizes in tournaments. SSB differs from chess and Go in one key aspect: SSB is not turn-based. All players act simultaneously and must anticipate their opponents’ actions while making their own.

While there are many AIs created for SSB (including in-game AIs created by Nintendo), we’ll limit our analysis to SmashBot, a popular SSB AI whose open-source codebase can give us deeper insights into how it works. SmashBot is a classic example of a rules-based expert system. Essentially, it’s a more sophisticated version of a chain of if-then statements based on a “knowledge base” built into the AI. The codebase’s Readme document provides a nice summary of how it works:

“SmashBot makes decisions on a 4 tiered hierarchy of objectives: Goals, Strategies, Tactics, and Chains. Each objective inspects the current game state and decides which lower level objective will be best to achieve it.

Goals are the highest level objective, and inform the AI what the intended overall outcome should be. IE: Beating our opponent in a match, or navigating the menu to select our character.

Strategies are the highest level means that the AI will use to accomplish the overall goal. For instance, the SmashBot will typically take the strategy of baiting the opponent into a poor move.

Tactics are lowish level series of predictable circumstances that we can realistically flowchart our way through. For instance, if the enemy is off the stage we may choose to edge guard them to keep them from getting back on.

Chains are the lowest level of objective that consists of a “chain” of button presses that Smashers will recognize, such as Wavedash, Jump-canceled Upsmash, etc…”
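The four-tier hierarchy described in the Readme can be sketched as a cascade of simple rules. The code below is a hypothetical illustration of that structure and is not taken from the SmashBot codebase: the game-state fields, rule conditions, and button names are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class GameState:
    """Hypothetical snapshot of one frame of the game."""
    in_menu: bool = False
    opponent_off_stage: bool = False
    opponent_vulnerable: bool = False

def pick_goal(state):
    # Goals: the intended overall outcome.
    return "navigate_menu" if state.in_menu else "win_match"

def pick_strategy(goal, state):
    # Strategies: the high-level means of reaching the goal.
    return "select_character" if goal == "navigate_menu" else "bait"

def pick_tactic(strategy, state):
    # Tactics: predictable circumstances we can flowchart through.
    if strategy == "select_character":
        return "menu_select"
    if state.opponent_off_stage:
        return "edge_guard"
    if state.opponent_vulnerable:
        return "punish"
    return "defend"

# Chains: concrete sequences of (hypothetical) button presses.
CHAINS = {
    "menu_select": ["MOVE_CURSOR", "PRESS_A"],
    "edge_guard": ["RUN_TO_LEDGE", "GRAB_LEDGE"],
    "punish": ["WAVEDASH", "UPSMASH"],
    "defend": ["SHIELD"],
}

def decide(state):
    """Walk the hierarchy top-down and return the buttons to press this frame."""
    goal = pick_goal(state)
    strategy = pick_strategy(goal, state)
    tactic = pick_tactic(strategy, state)
    return CHAINS[tactic]

print(decide(GameState(opponent_off_stage=True)))  # ['RUN_TO_LEDGE', 'GRAB_LEDGE']
```

Each tier only narrows down the choice for the tier below it, which is why the whole system amounts to an elaborate chain of if-then statements: nothing is learned, and every behavior traces back to a rule a human wrote.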

These rules, which act on a frame-by-frame basis, are enough to make SmashBot virtually impossible for even the top human players to beat.

The SmashBot AI shields so effectively at a frame-by-frame level that the opponent cannot score a single hit.

Image credit: SmashBot code repository Readme

Despite SmashBot’s overwhelming dominance over humans, its rigid, domain-specific approach isn’t particularly interesting to researchers. SmashBot or Stockfish could never be effective at anything else; AlphaGo’s approach much more closely resembles human thinking, and could be much more easily applied to other areas. For example, DeepMind mastered many different classic Atari games at a human level using deep reinforcement learning, demonstrating that this approach can easily apply to multiple games.

Machine Learning in Games of Imperfect Information

With current or anticipated AI dominance in many games of perfect information, research interest has shifted to games of imperfect information, not only because such games are much harder for machines to master, but also because these games more closely mimic the more challenging scenarios AIs will face in the real world.

The games that have captured the most research interest are StarCraft (SC) and StarCraft II (SC2). Developed by Blizzard Entertainment, SC/SC2 are real-time strategy games that brought professional esports to the mainstream, especially in South Korea; top Korean players have made over half a million dollars in tournament winnings alone.

SC2 is particularly challenging for computers because, in AI parlance, it’s partially observable. A player can only see the parts of the map around the areas occupied by friendly units or buildings. Unlike in chess or Go, the game’s current state is determined not merely by the previous state and the AI’s actions, but also by the simultaneous actions of the other player(s). Decisions and actions must be made quickly and without full knowledge of what the opponent is doing.
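Partial observability is easy to picture with a toy "fog of war" model. The sketch below assumes a simple grid map and a circular sight radius around each friendly unit; it is an illustration of the concept only, and does not reflect how SC2 or any research platform actually represents vision.

```python
def visible_cells(width, height, friendly_positions, sight_radius):
    """Return the set of grid cells within sight of at least one friendly unit."""
    visible = set()
    for ux, uy in friendly_positions:
        for x in range(max(0, ux - sight_radius), min(width, ux + sight_radius + 1)):
            for y in range(max(0, uy - sight_radius), min(height, uy + sight_radius + 1)):
                if (x - ux) ** 2 + (y - uy) ** 2 <= sight_radius ** 2:
                    visible.add((x, y))
    return visible

def observe(enemy_positions, visible):
    """The player's observation: only the enemy units standing on visible cells."""
    return [pos for pos in enemy_positions if pos in visible]

# Hypothetical 10x10 map with one friendly unit and two enemy units.
seen_cells = visible_cells(10, 10, friendly_positions=[(2, 2)], sight_radius=2)
print(observe([(3, 3), (8, 8)], seen_cells))  # [(3, 3)]
```

The true game state contains both enemy units, but the agent's observation contains only one. Any decision it makes has to account for what might be hiding in the cells it cannot see, which is exactly what chess and Go never require.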

SC2 gameplay can be divided into three key components: micro-management (“micro”), macro-management (“macro”), and overall strategy. Micro is the mechanical control and command of individual units; macro is the broader task of ensuring a player is harvesting enough resources to maintain an army, investing in production facilities and unit upgrades, and building new units to replenish units lost in combat.

Overall strategy concerns scouting your opponent, anticipating opponent attacks, planning your own attacks, and choosing which units to build to counter the opponent’s units. This multifaceted skill requirement makes SC2 the most representative game of human intelligence that AIs have tackled yet.

DeepMind acknowledges the unique challenges posed by SC/SC2 in its official announcement:

StarCraft is an interesting testing environment for current AI research because it provides a useful bridge to the messiness of the real-world. The skills required for an agent to progress through the environment and play StarCraft well could ultimately transfer to real-world tasks…An agent that can play StarCraft will need to demonstrate effective use of memory, an ability to plan over a long time, and the capacity to adapt plans based on new information.

The blue AI (bottom) destroys the superior army of the red AI (top) because the latter’s units get stuck on the ramp and don’t return fire. Learning strategies for optimal positioning will be one of the most difficult tasks for any AI, because of the many considerations involved: opponent and own unit composition, high ground vs. low ground, and the existing damage (if any) on friendly and enemy units.

Video credit: AI Craft 3 by LifesAGlitchTV

After destroying the red AI’s entire army, the blue AI inexplicably retreats and allows red to rebuild its army. Even novice human players would realize that pressing the attack would immediately win the game; a human’s reasoning may go: I destroyed all the opponent’s units I could see -> at this stage in the game, my opponent could not possibly have enough hidden units to overpower my army -> if I attack my opponent’s base now, I will destroy all the production facilities and win the game.
We don’t know exactly why the blue AI chose to retreat, but this lack of contextual understanding demonstrates the challenges of building an AI that can handle context-rich environments.

Video credit: AI Craft 3 by LifesAGlitchTV

The AI flawlessly splits its individual harvesters for optimal resource harvesting. Such a micro-optimization, while having very little effect on the eventual outcome of the game, is beyond the skills of even top-level players. Little reasoning ability is required for this maneuver, just fine mechanical control and multitasking abilities, areas where computers thrive over humans.

Video credit: AI Craft I by LifesAGlitchTV

The announcement also elaborates on Blizzard and Google’s collaboration to create an open, programmatic SC2 platform for any ML researcher to build upon. To ensure that the AI will win through intelligence, rather than faster mechanical speed (computers can programmatically issue commands instantly; humans must physically move a mouse or hit a key on the keyboard), AIs will be limited to the same number of actions per minute that top human players can achieve—typically in the range of 300 actions per minute.
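One simple way to enforce such a cap is a sliding-window rate limiter. The sketch below is a hypothetical illustration and not part of any Blizzard or DeepMind API; the class name, the 300-action limit, and the 60-second window are assumptions drawn from the figure quoted above.

```python
from collections import deque

class ActionRateLimiter:
    """Allow at most `max_actions` within any sliding window of `window_seconds`."""

    def __init__(self, max_actions, window_seconds=60.0):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of recently accepted actions

    def try_act(self, now):
        """Return True and record the action if the agent is under its cap."""
        # Forget actions that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_actions:
            self._timestamps.append(now)
            return True
        return False

# An agent that tries to act every 0.1 seconds (600 attempts in one minute).
limiter = ActionRateLimiter(max_actions=300)
accepted = sum(limiter.try_act(t * 0.1) for t in range(600))
print(accepted)  # 300
```

With the cap in place, an agent that tries to issue 600 commands in a minute gets only 300 of them through, so it has to choose which actions matter most, just as a human player does.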

As of now, an amateur SC2 player with a modest amount of experience can beat SC2 AIs without much difficulty. Professional SC2 players are confident that AIs won’t reach human-level ability in the game anytime soon, but it’s worth noting that, until 2015, no AI had ever beaten a professional Go player. One year later, the AI was unstoppable.

AI, Games, and the Future of Intelligence

The introduction of machine learning in gaming promises to have profound effects both on the games themselves and in unrelated areas. Professional chess was transformed by the introduction of computers, which could reveal hidden flaws in previously unquestioned analysis; allow players to instantly find and train with databases of millions of competitors’ past games; and provide new tools to allow the next generation of young players to improve even more rapidly than their predecessors (the most recent world chess championship in November 2016 was the first to feature two players who both came of age in the era of computer training and became grandmasters by the age of 13).

Go may soon undergo a similar metamorphosis. And more importantly, the strategies used to conquer these games may soon allow us to conquer even more complex, if unrelated, domains, such as cancer diagnosis, climate change mitigation, and real-time translation. Virtual worlds may be a fertile training ground for AIs to learn before being released into the real world.

Perhaps the real allure of machine learning in sophisticated games is that such games, to quote Shannon’s paper, are “considered to require ‘thinking’ for skillful play; a solution of this problem will force us either to admit the possibility of a mechanized thinking or to further restrict our concept of ‘thinking’.” It’s clear that the machines can’t brute force their way to victory in messy, imperfect games. In a game as thoroughly human as StarCraft II, will an AI not only play better than a human, but also like a human?

Perhaps, by building machines capable of conquering SC2, we may move closer to the ultimate benchmark of machine intelligence: a machine that passes the Turing Test.


Image credit: Chess Daily News
