MIT Researchers Want to Teach Robots How to Wash Dishes
The robots arrived years ago. They help build stuff in factories. They shuttle packages and products across the massive warehouses that drive Amazon’s worldwide retail operation. And so much more. But Ilker Yildirim envisions a robot that can operate with a bit more subtlety, a bot that needn’t operate according to pre-programmed movements. This machine could respond to changes in its environment, much like humans do, and predict what will happen when one action is chosen over another. He envisions a robot that can do your dishes.
That’s a harder task than you might think. It involves predicting what will happen when you stack one dish on top of another, when you put it under the kitchen faucet, and when you place it in your dishwasher. We humans do this intuitively, and Yildirim aims to duplicate this kind of intuition with hardware and software.
Yildirim is a postdoc associated with MIT’s Department of Brain and Cognitive Sciences and its Computer Science and Artificial Intelligence Lab, or CSAIL. Together with others at MIT, he recently published a research paper describing an artificially intelligent system that can predict how objects will move in certain situations. Will an object fall when placed on another? Will it slide when placed on a ramp? In some cases, the system can predict these movements as well as humans. Yildirim sees this as a stepping stone to a new breed of robot, including machines that could do your dishes.
“These won’t be manufacturing robots, which have a pretty finely defined set of actions that they need to take over and over again,” he says. “These are robots that must deal with uncertainty. If a robot places dishes in a dishwasher, it must understand the subtleties of how they stack on top of each other. It must know if it will topple them if it takes a certain action. It must deeply understand its physical environments.”
This work is part of a broader effort to give machines this kind of understanding. In the fall, during an event with a small group of reporters at the company’s headquarters in Menlo Park, California, Facebook Chief Technology Officer Mike Schroepfer showed off a similar system built by the company’s AI researchers. Given an image of several stacked blocks, the system could predict whether the stack would fall or not. As Schroepfer quipped: Facebook is teaching its machines to play Jenga. But this is more than mere game playing. It’s a step towards not only the future of Internet services like Facebook, but, as Yildirim explains, a new kind of robot.
Both the Facebook and MIT experiments rely on deep neural networks, networks of hardware and software that approximate the web of neurons in the human brain. If you feed enough photos of a car into these neural nets, they can learn to identify a car. If you feed them enough spoken words, they can learn to recognize what you say. If you feed them a bunch of computer malware, they can learn to identify a virus. But there are so many other possibilities.
Yildirim and his colleagues start with videos that show all kinds of objects moving and colliding in various ways. But the researchers also use a 3D physics engine—called Bullet—which lets them build digital simulations of such events, simulations that model the physics of the objects. These models can determine how the objects will behave, right down to the speed they will travel. The researchers then feed both of these datasets—the videos and the simulations—into a deep neural net. After analyzing enough data, it can learn to recognize objects, infer their physical makeup, and then predict how they will behave.
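To make the idea of simulated training data concrete, here is a minimal Python sketch. It is not the researchers' code and it does not use Bullet. It is a hand-written stand-in for a single scenario, a block resting on a ramp, and the function names (`simulate_ramp`, `make_dataset`), the parameter ranges, and the use of one friction coefficient for both static and kinetic friction are all assumptions made for illustration.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2


def simulate_ramp(angle_deg, friction):
    """Return (slides, acceleration) for a block placed at rest on a ramp.

    Assumption for this sketch: one friction coefficient stands in for
    both static and kinetic friction.
    """
    theta = math.radians(angle_deg)
    # Net acceleration down the slope if the block were moving.
    accel = G * (math.sin(theta) - friction * math.cos(theta))
    slides = accel > 0
    # A block that does not slide has zero acceleration.
    return slides, max(accel, 0.0)


def make_dataset(n, seed=0):
    """Generate n labeled scenarios of the form ((angle_deg, friction), slides)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        angle = rng.uniform(0.0, 60.0)   # hypothetical range of ramp angles
        mu = rng.uniform(0.05, 1.0)      # hypothetical range of friction values
        slides, _ = simulate_ramp(angle, mu)
        data.append(((angle, mu), slides))
    return data
```

Each entry pairs a scenario with its outcome, which is the kind of labeled example a learning system could train on. A full physics engine plays the same role for far richer scenes, with many objects, collisions, and three dimensions.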
This system combines two types of AI, physics simulation and deep learning, and both are necessary. Sure, on its own, a physics simulation can predict movements without fail. But you must program it for each particular scenario. The trick here is that if you feed many scenarios into a deep neural net, providing both the visual imagery and the physics, the system can learn to analyze situations it has never seen before. Even if shown just a few static frames of a scene, Yildirim says, the system can estimate the mass of the objects and their friction and reliably predict what will happen.
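The two-step pattern described here, first infer a hidden physical property and then use it to predict a new outcome, can be illustrated with a toy Python sketch. This is not how the MIT system works internally: the real system learns from images with a neural net, while this example uses the textbook formula for a block on a ramp. The function names and the sample numbers are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def infer_friction(angle_deg, observed_accel):
    """Estimate the friction coefficient of a block seen sliding down a ramp.

    Rearranges a = g * (sin(theta) - mu * cos(theta)) to solve for mu.
    """
    theta = math.radians(angle_deg)
    return (G * math.sin(theta) - observed_accel) / (G * math.cos(theta))


def will_slide(angle_deg, friction):
    """Predict whether a block at rest starts to slide on a ramp."""
    return math.tan(math.radians(angle_deg)) > friction


# Step 1: infer the hidden property from one observation
# (a block accelerating at 2.0 m/s^2 on a 30-degree ramp).
mu = infer_friction(30.0, 2.0)

# Step 2: predict behavior on ramps that were never observed.
print(round(mu, 3))          # 0.342
print(will_slide(15.0, mu))  # False: the ramp is too shallow
print(will_slide(25.0, mu))  # True: the ramp is steep enough
```

The point of the sketch is the division of labor: an estimate of friction made from one observation carries over to ramps that were never observed.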
Among other things, the project shows that AI often involves a combination of various technologies. At the moment, the press has heaped a huge amount of attention on deep learning. But there are so many other forms of AI, and they can often achieve new results by working in tandem. Yildirim and his team have pitted their system against real humans, having each predict the outcome of certain events, and the AI can hold its own. “The system is similar to humans, in terms of average performance and the kinds of errors we’re making,” he says. You’re still a long way from your own dish-washing robotic house servant. But you’re not as far away as you were.