Teaching AI to Play Atari Will Help Robots Make Sense of Our World
Google is teaching machines to play Atari games like Space Invaders, Video Pinball, and Breakout. And they’re getting pretty good.
At DeepMind, a Google subsidiary based in Cambridge, England, researchers have built artificial intelligence software that’s so adept at these classic games, it can sometimes beat a human player—and a professional, at that. This may seem like a frivolous, if intriguing, pursuit. But it’s a step toward something bigger. If a machine can learn to navigate the digital world of a video game, Google says, it eventually could learn to navigate the real world, too. Today, this AI can play Space Invaders. Tomorrow, it could control the robots that will build our gadgets and toys, and the autonomous cars that will drive from place to place entirely on their own.
Google isn’t the only one with this vision of AI leaping from games to reality. Backed by $3.3 million in funding from big names like Peter Thiel and Jerry Yang, a new startup called Osaro is pushing in the same direction. In an echo of DeepMind, Osaro has built an AI engine that can play classic games. But the company’s ultimate aim is to offer this technology as a way of driving the next generation of robots used in warehouses and factories. Much like humans, it gets better through practice. “Think about kids. They learn a lot through trial and error,” says Osaro founder and CEO Itamar Arel. “They come to understand what maximizes pleasure and minimizes pain.”
First Games, Then the World
Like DeepMind’s technology, Osaro’s AI engine is based on deep neural networks, the same basic tech that helps identify photos, recognize speech, and translate from one language to another inside Google, Facebook, Microsoft, and other tech giants. And like DeepMind, Osaro applies a second breed of AI called reinforcement learning—algorithms that help machines conquer tasks through repeated trial and error. Deep learning has proven remarkably adept at tasks of perception. If you feed enough photos into a neural net—a network of machines that approximate the web of neurons in the brain—it can learn to identify what appears in a new photo. In much the same way, it can grasp the current “state” of a video game. But reinforcement learning can take things further still. It lets machines take actions based on what they’ve perceived.
After a neural net grasps the state of a video game, reinforcement learning can use this information to help a machine decide what move to make next. Similarly, after a neural net provides a “picture” of the world around a robot, reinforcement algorithms can help it perform a particular task in that environment. Chris Nicholson, founder of AI startup Skymind, says the combination of these two technologies will push AI beyond online services like Google and into the real world. “Navigating a game space is the first step towards navigating the real world,” Nicholson says.
That’s certainly the plan at Osaro. Led by Arel, a former computer science professor who helped build a company that applied deep neural nets to financial trading, Osaro is testing its tech with robot simulators such as Gazebo, a tool overseen by the nonprofit Open Source Robotics Foundation. Such simulators are another stepping stone toward a time when AI drives factories and warehouses. First games. Then game-like robotic simulators. Then robots.
A System of Rewards
To help machines understand the state of a game—“where’s my player, where’s the ball, where’s the other player,” Arel says—Osaro is using recurrent neural networks. These are, essentially, neural nets that exhibit a kind of short-term memory. They can better understand the state of a game based on how it looked in the recent past. “You can’t really tell what’s going on in a game just by looking at a single frame,” Arel says. “You need to look at a sequence of frames to know if, say, a ball is going left or right, if it’s accelerating or decelerating.”
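Arel's point about single frames can be illustrated with a few lines of Python. This is only a sketch of the underlying idea, not Osaro's recurrent network: it assumes the ball's horizontal position has already been extracted from each frame, and the function names `motion_from_frames` and `describe` are hypothetical.

```python
def motion_from_frames(xs):
    """Estimate velocity and acceleration from the ball's x position
    in the three most recent frames (units: pixels per frame)."""
    if len(xs) < 3:
        raise ValueError("need at least three frames")
    x0, x1, x2 = xs[-3:]
    velocity = x2 - x1                      # sign gives the direction of travel
    acceleration = (x2 - x1) - (x1 - x0)    # change in velocity between frames
    return velocity, acceleration


def describe(xs):
    """Turn the raw numbers into the questions Arel asks of a game."""
    v, a = motion_from_frames(xs)
    if v > 0:
        direction = "right"
    elif v < 0:
        direction = "left"
    else:
        direction = "still"
    if a * v > 0:
        trend = "accelerating"    # velocity is growing in its own direction
    elif a * v < 0:
        trend = "decelerating"    # velocity is shrinking toward zero
    else:
        trend = "steady"
    return direction, trend


print(describe([10, 12, 16]))   # ball speeding up toward the right
print(describe([30, 24, 21]))   # ball slowing down while moving left
```

A single position, such as 16, says nothing about direction or speed. Only the sequence does, which is the kind of short-term memory a recurrent network is meant to supply.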
Then Osaro’s reinforcement algorithms can act on what the neural nets perceive. If neural nets mimic the web of neurons in the cerebral cortex—the portion of the brain that builds our view of the world—reinforcement algorithms mimic the neurons in the basal ganglia, which help control our movements and learn our habits. Just as these neurons release dopamine when you do something positive—something that works—reinforcement learning operates on a similar reward system. “Dopamine is a signal that indicates whether something is good. It helps you move from one state to another based on what works,” Arel says. “The signals involved in reinforcement are similar.”
In other words, if a machine’s move results in a higher score—the digital dopamine—it will adjust its behavior accordingly. “Each decision—whether to take action one versus action two—is driven by rewards,” Arel explains. “In a game environment, the rewards are points. The system tries to maximize points.” If it attempts enough moves, processing them across tens or even hundreds of machines, the system can learn to play the game on par with a human. The name Osaro is a nod to this process. It’s short for Observation, State inference, Action, Reward, and—as the loop continues—Observation.
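The observe, act, and collect-reward loop can be sketched in a few dozen lines. The example below uses tabular Q-learning, a standard textbook reinforcement algorithm, on a made-up five-cell corridor "game" where the only points come from reaching the last cell. It is an assumption-laden illustration, not Osaro's or DeepMind's system: the environment, the function names, and the parameter values are all hypothetical, and the real systems pair the learning rule with deep neural networks rather than a small table.

```python
import random

N_CELLS = 5          # hypothetical toy game: a corridor with the prize in the last cell
ACTIONS = (-1, +1)   # action 0 moves left, action 1 moves right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    reward = 1.0 if done else 0.0    # points play the role of the reward signal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by repeated trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]
    for _ in range(episodes):
        state, done = 0, False       # observation; state inference is trivial here
        while not done:
            # Action: mostly exploit the best-known move, sometimes explore.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            # Reward: nudge the estimate toward reward plus discounted future value.
            next_state, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal cell."""
    return [max(range(2), key=lambda a: q[s][a]) for s in range(N_CELLS - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)   # after training, every cell should prefer moving right
```

Nothing in the code tells the agent that the prize sits at the right-hand end. The preference for moving right emerges only because those moves, repeated over many attempts, led to points.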
These systems are a long way from real human thought. As OSRF’s Nate Koenig points out, navigating a robot through the real world is significantly more difficult than navigating a bunch of bits through Space Invaders. “Games live in a very strict world. There are rules that define a very small space,” he says. “If you’re going to teach a robot something, you might have to take into account that a bird might fly in front of it or a baby will get in its way.”
Still, the ideas at the heart of Osaro are promising. Though the real world is more complex than a game, we often tackle its challenges in similar ways. With Osaro’s reinforcement algorithms, the rewards may come when a robot picks up an object and puts it in the right place. And those rewards might be taken away when it drops the thing. It’s not an exact reproduction of the human brain. But as Arel says: “It’s bio-inspired.”