Yelp’s Using Image Recognition to Change How It Finds You a Bar
Frances Haugen was part of the first wave of people to use Google back in 1996. Her mother, a faculty member at the University of Iowa¹, showed her the search engine, which was still a research project at Stanford University. Haugen was blown away by what Larry Page and Sergey Brin had built. “The idea that you could actually peer into a giant mountain of data was amazing,” she says.
Haugen has been obsessed with search technology ever since. She landed a job at Google after college and spent several years working there, first as an engineer and later as a product manager. Now she works for Yelp. You might not think of the sprawling review site as a search company, but search really is at the core of what Yelp does. You don’t just want a list of the top-ranked restaurants nearby; you want restaurants near you that serve cronuts, or have a great view, or allow dogs, or are good for birthday parties—or maybe all four.
But the written reviews and descriptions that users post about a place may not, on their own, contain all those details. Much of that useful information is likely locked up in the millions of photos that users have uploaded. A picture of a dog eating a cronut with birthday candles in it and the Manhattan skyline behind it is a pretty good indication that the restaurant it was taken in meets your requirements. There are other, less trivial examples of why this would be useful, as well.
“My neighbor’s wife is in a wheelchair,” Haugen says. “He used to look through hundreds of photos trying to see what the inside looks like so he could find out if it would be wheelchair friendly.”
If Yelp’s computers could index which photos have wheelchairs in them, the company could offer a more educated guess as to which businesses are the most accessible. Yelp is still a long way from being able to do something like that. The hard part, obviously, is teaching computers to actually recognize what’s in those photos. But Haugen and her team have started building the foundations of an image recognition system that could completely change the way the company does search.
Finding the Best
Yelp’s first image recognition project isn’t actually focused on search but on surfacing the best photos taken at each business. You see a handful of photos at the top of every Yelp entry, and those photos form your first impression of a place. Haugen and her team have set out to find a way to automatically select the images that give users the best sense of what a business has to offer.
“We’re trying to figure out how we can bubble up the best photos, the photo that’s going to make you take that risk,” she says. “The photo that’s going to let you go to that new hair stylist, or let you pick that wedding venue, or pick the restaurant to take a friend out to their birthday dinner.”
That means Yelp needs a way to tell the difference between a photo of a mouthwatering steak and a blurry, drunken selfie. The obvious way to deal with this would be to rely on captions, but many photos uploaded to the site either don’t have captions or have a caption that simply says something like “amazing.”
Alternatively, Yelp could have users rate photos and display only the three best. But that might not provide enough diversity. When you’re looking at the entry for Jim Bob’s Steak House, you probably don’t want to see three different photos of steak, no matter how well composed those shots are. You’ll also want to see a photo of the fully loaded baked potato with Gummy Bears on it, and the mechanical bull out front. Short of hiring people to sift through every single photo on the site and decide which ones to use, Yelp needs a way to teach computers to recognize what’s actually in those pictures.
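The diversity idea above can be sketched as a simple selection rule: rather than taking the three highest-rated photos overall (which might all be steak), keep only the best-rated photo in each recognized category. This is an illustrative sketch, not Yelp’s actual algorithm; the field names, captions, and ratings are all invented.

```python
# Hypothetical photo records: in a real system the "category" field would
# come from an image classifier and "rating" from user engagement signals.
photos = [
    {"caption": "ribeye, medium rare", "category": "food",     "rating": 4.9},
    {"caption": "another steak",       "category": "food",     "rating": 4.8},
    {"caption": "loaded baked potato", "category": "food",     "rating": 4.7},
    {"caption": "mechanical bull",     "category": "exterior", "rating": 4.5},
    {"caption": "dining room",         "category": "interior", "rating": 4.2},
]

def diverse_top_photos(photos, limit=3):
    """Keep the best-rated photo per category, then rank those by rating."""
    best = {}
    # Iterate from highest to lowest rating; setdefault keeps only the
    # first (i.e. best) photo seen for each category.
    for p in sorted(photos, key=lambda p: p["rating"], reverse=True):
        best.setdefault(p["category"], p)
    return sorted(best.values(), key=lambda p: p["rating"], reverse=True)[:limit]

for p in diverse_top_photos(photos):
    print(p["category"], "-", p["caption"])
```

With the sample data, this surfaces one steak shot, the mechanical bull, and the dining room instead of three near-identical steaks.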
Yelp obviously isn’t the first company to deal with this challenge. Google and Facebook—not to mention law enforcement and spy agencies—have been working on facial recognition for years. A startup called Orbital Insight has been working to estimate the amount of oil left in reserves and to spot illegal deforestation by analyzing photos taken from space. Much like Yelp, a travel guide startup called Jetpac, which was acquired by Google last year, had the idea of analyzing photos to determine which bars and restaurants were, say, dog-friendly. What almost all of these efforts have in common is a branch of artificial intelligence called “deep learning,” which aims to make machines smarter by drawing inspiration from the structure of the human brain.
The tech giants have dominated the deep learning field in recent years. Google and Facebook have hired some of the field’s academic pioneers and acquired several startups to bring their expertise in-house. Microsoft, meanwhile, turned to deep learning to build Skype Translate. But the giants don’t have a monopoly on artificial intelligence. Because so much of the foundational research is public, companies like Yelp can take advantage of deep learning, too.
To get its system up and running, the Yelp engineering team used an open source framework called Caffe to build a neural network—a piece of software inspired by the connections between neurons in the human brain—based on a paper by some of the pioneers of deep learning. But software can’t do all the work. In order to recognize an object, whether that’s a cat or a cupcake or a Volkswagen Bug, the algorithms must be “trained” by humans. For that, Yelp hired people through the crowdsourcing site CrowdFlower to label a large number of photos.
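The train-on-human-labels loop is easier to see in miniature. The toy sketch below stands in for the real pipeline: where Yelp’s system learns a deep convolutional network in Caffe from raw pixels, this sketch classifies made-up feature vectors with a simple nearest-centroid rule, purely to illustrate how labeled examples become a classifier. All names, features, and numbers are invented.

```python
import math
import random

random.seed(0)
CATEGORIES = ["food", "interior", "exterior", "menu"]

# Made-up feature centers standing in for the activations a real CNN would
# produce; each category clusters around its own point in feature space.
CENTERS = {
    "food":     [5.0, 0.0, 0.0, 0.0],
    "interior": [0.0, 5.0, 0.0, 0.0],
    "exterior": [0.0, 0.0, 5.0, 0.0],
    "menu":     [0.0, 0.0, 0.0, 5.0],
}

def fake_features(category):
    """Synthetic photo features: the category's center plus Gaussian noise."""
    return [c + random.gauss(0, 0.5) for c in CENTERS[category]]

# "Training": average the human-labeled examples into one centroid per class.
# This is the step the CrowdFlower labels make possible.
labeled = [(cat, fake_features(cat)) for cat in CATEGORIES for _ in range(30)]
centroids = {
    cat: [sum(col) / 30 for col in zip(*(f for c, f in labeled if c == cat))]
    for cat in CATEGORIES
}

def classify(features):
    """Predict the category whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda cat: math.dist(features, centroids[cat]))

correct = sum(classify(fake_features(cat)) == cat
              for cat in CATEGORIES for _ in range(25))
print(f"accuracy on fresh samples: {correct}/100")
```

The structure, not the math, is the point: humans supply labeled examples, a training step summarizes them into a model, and the model then labels photos no human has looked at.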
Yelp’s initial deep learning efforts have focused on grouping restaurant photos into four categories: pictures of food, interior views, exterior views, and pictures of menus. Haugen hopes that the data generated through this process will eventually find its way into Yelp’s search functionality. In the meantime, she is learning a lot about which types of photos people click on most often. Linearity works well, such as a photo of three cups of coffee lined up in a row. Smiles are always good, as is the color blue. And low angles are best. “If you’re going to take a picture,” she says, “you should get down on your dinner’s level.”
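Those click signals could, in principle, feed a simple weighted appeal score for ranking photos. The feature names and weights below are invented for illustration; Yelp has not published how, or whether, it combines these signals.

```python
# Hypothetical weights over the signals Haugen cites: lined-up composition,
# smiles, the color blue, and a low camera angle. Each signal is assumed
# to be pre-computed and normalized to [0, 1] by upstream image analysis.
WEIGHTS = {"linearity": 0.3, "smiles": 0.3, "blueness": 0.2, "low_angle": 0.2}

def appeal_score(signals):
    """Weighted sum of per-photo signals; missing signals count as zero."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)

# Two invented photos: three coffees in a row shot from low down,
# versus a blurry selfie.
coffee_row = {"linearity": 0.9, "blueness": 0.4, "low_angle": 0.8}
blurry_selfie = {"linearity": 0.1, "smiles": 0.5, "blueness": 0.2, "low_angle": 0.2}
print(round(appeal_score(coffee_row), 2), round(appeal_score(blurry_selfie), 2))
```

Under these made-up weights the coffee shot scores roughly twice the selfie, which matches the click behavior the article describes.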
¹Correction 10/19/2015 at 1:11 PM ET: An earlier version of this story said Haugen’s mother was a professor at Stanford University. She was in fact a faculty member at the University of Iowa.