AI spends 7,000 hours beating Pokemon Red's first gym, but still can't find the second one after 50,000 hours

 Pokemon Red.

One programmer has given an AI model 50,000 hours' worth of training in how to play Pokemon Red. The result is an algorithm that can explore the game and build a team to defeat the first gym leader - but one that can't find its way through Mt. Moon and doesn't know better than to keep buying Magikarp. Most of all, the exercise is a fascinating way to get an idea of how machine learning actually works.

As outlined in an extensive video by Peter Whidden, the AI interacts with the game through the usual control inputs on an emulator. It hits a button and looks at the screen to see what happened, the same as a human player. Whidden set learning sessions at two hours' worth of game time apiece, though with emulation sped up those sessions could be completed in around six minutes of real time - and the process was further sped up by running 40 testing sessions simultaneously.
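To put those numbers in perspective, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes every session ran at the same speed and that all 40 ran in parallel the whole time, which the video summary doesn't confirm, so treat the result as a rough estimate rather than Whidden's actual training time.

```python
# Figures quoted in the article; the "six minutes" is approximate.
session_game_hours = 2.0      # game time covered by one learning session
session_real_minutes = 6.0    # real time one session takes with emulation sped up
parallel_sessions = 40        # sessions running at the same time

# How much faster than real time a single emulator runs.
speedup = session_game_hours * 60 / session_real_minutes

# Game hours accumulated per real hour across all parallel sessions.
game_hours_per_real_hour = speedup * parallel_sessions

# Real hours needed to accumulate 50,000 game hours at that rate (an assumption).
real_hours_for_50k = 50_000 / game_hours_per_real_hour

print(speedup, game_hours_per_real_hour, real_hours_for_50k)
```

Under those assumptions, each emulator runs at roughly 20 times normal speed, and the 50,000 hours of play would fit into a few days of real time.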

Since a machine learning algorithm doesn't inherently care about beating a video game, Whidden set up particular goals for the AI to be rewarded for. To encourage curious exploration, the AI got a reward point whenever it saw something new, as measured by noticeably different pixels appearing on-screen. That had some unintended consequences - the AI would just stare, fascinated, at the slight animation of water, for example - but it broadly served to get the computer motivated to make it from Pallet Town through Viridian Forest and up to Pewter City, where the first gym battle against Brock takes place.
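For a rough idea of how a "seen something new" reward could work, here is a minimal sketch. It is not Whidden's code: the `NoveltyReward` class, the flat list-of-pixels frame format, and the `min_changed_pixels` threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[int]  # hypothetical: a screen flattened into a list of pixel values


def pixel_difference(a: Frame, b: Frame) -> int:
    """Count the pixel positions at which two frames differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


class NoveltyReward:
    """Give one point for each frame that differs enough from every frame seen so far."""

    def __init__(self, min_changed_pixels: int) -> None:
        self.min_changed_pixels = min_changed_pixels
        self.seen: List[Tuple[int, ...]] = []

    def score(self, frame: Frame) -> int:
        for old in self.seen:
            if pixel_difference(frame, old) < self.min_changed_pixels:
                return 0  # too close to something already seen
        self.seen.append(tuple(frame))
        return 1


# Toy 16-pixel "screens"; the two water frames differ by a single animated pixel.
town = [0] * 16
forest = [1] * 16
water_a = [2] * 16
water_b = [2] * 15 + [3]
screens = (town, forest, water_a, water_b, town)

strict = NoveltyReward(min_changed_pixels=4)
strict_rewards = [strict.score(f) for f in screens]

loose = NoveltyReward(min_changed_pixels=1)
loose_rewards = [loose.score(f) for f in screens]
```

With the stricter threshold, the second water frame earns nothing. With the looser one, that single animated pixel counts as a new sight, which is the kind of setup that could leave an AI staring at water.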

The AI needed further rewards and punishments, too. With rewards all tied up in seeing new things, the AI just wanted to keep moving forward. It didn't care about fighting battles or catching Pokemon, so it initially ran away from every encounter. To fix that, Whidden added a system where the AI is rewarded based on the total level of its active Pokemon party.

That worked to keep the AI fighting for XP and catching Pokemon, but it had an unintended consequence, too. When the AI went to a Pokemon Center, it interacted with the PC there and deposited a few Pokemon. That dramatically dropped the total level of the party, ripping away a mass of reward points all at once. That was roughly equivalent to a traumatic experience for the AI, causing it to avoid Pokemon Centers altogether - thus refusing to heal the party until Whidden tweaked the reward systems again.
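The Pokemon Center problem is easy to reproduce in a few lines. The sketch below compares a reward that tracks the raw change in total party level with one that simply ignores decreases. The second version is only one plausible tweak, assumed here for illustration; the function names are hypothetical, and nothing above specifies exactly what Whidden changed.

```python
from typing import Sequence


def total_level_change(before: Sequence[int], after: Sequence[int]) -> int:
    """Reward equal to the change in the party's summed levels (can go negative)."""
    return sum(after) - sum(before)


def clipped_level_change(before: Sequence[int], after: Sequence[int]) -> int:
    """Assumed fix: the same reward, but a drop in total level costs nothing."""
    return max(0, sum(after) - sum(before))


# Hypothetical party of three, then the same party after depositing two at the PC.
party = [12, 6, 5]
after_deposit = [12]

penalty = total_level_change(party, after_deposit)
no_penalty = clipped_level_change(party, after_deposit)

# Adding a level 5 Pokemon to the party is worth five points under either scheme.
bonus = clipped_level_change([12], [12, 5])

print(penalty, no_penalty, bonus)
```

Under the first scheme, the deposit wipes out 11 points in a single step. Under the second, it costs nothing, so a trip to the Pokemon Center is no longer something to fear.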

The AI essentially keeps doing things at random until it manages to figure out something that'll get it reward points. That made the fight against Brock a particular issue, since you need to take advantage of his rock-type Pokemon's elemental weaknesses to do any real damage against them. It was only by virtue of one particular iteration, where the AI's Squirtle happened to be out of PP for every move except the water-type Bubble, that the algorithm managed to pick up on how to beat the gym.

Yet while the AI is bad at figuring out things that might come naturally to human players, it quickly learns other, much more esoteric things. Whidden realized at a certain point that the algorithm would always plot a very specific, seemingly nonsensical path from Pallet Town up until the first encounter with a wild Pokemon. That seemed weird until it became clear that this precise series of inputs guaranteed that the wild Pokemon could be captured with a single throw of a Pokeball. Yes, the AI spontaneously learned the very art of RNG manipulation that speedrunners spend years developing.

Beating Brock made for a pretty natural end goal for the project, but Whidden did let the AI run longer to see what would happen, and it did make it deep into Mt. Moon. The dungeon's dank, samey passages were so off-putting to the AI, however, that it never found its way to the other side, and so never reached the second gym at Cerulean City.

One thing the AI did love, however, was buying Magikarp. The shady guy who sells you the worst Pokemon of all time at a ridiculous markup is pretty much a joke at this point, but for the AI, buying that Magikarp is a quick way to get five more levels' worth of Pokemon in its party - the best deal in the game! Apparently, the AI bought that Magikarp over 10,000 times.

Oh, and for one last anecdote about the magic of a computer doing random things: at one point, the AI captured a Rattata and named the Pokemon 'AI.' Sometimes these things work out just a little too perfectly.
