New Tool Analyzes a Video’s Sound for Better Search Results
How do you find videos on the Internet? You type in some words. The trouble is that videos aren’t words. They’re moving pictures. Or rather, they’re a blend of moving pictures and sound.
David Luan, the co-founder of a company called Dextro, is among those working to improve online video search by building systems that find videos not just by looking for embedded keyword tags, but by automatically analyzing both pictures and sound. “This moves us closer to making web video easily searchable and discoverable,” Luan says. “That depends on going beyond metatags and really understanding what a video is all about.”
He calls this system “Sight, Sound, and Motion,” and you can see a demo on the company website. Basically, it sucks in videos that ordinary users have posted to Twitter and gives you a way of searching through them. You can, say, search for all the Donald Trump videos, and it will bring up a video in which Trump comes up only because someone asks Lady Gaga whether she’s a Donald fan.
The tool is not meant as a consumer service. Instead, Luan and Dextro will offer the technology to other businesses that want to build video search into their own apps and sites. Dextro already offers similar services: earlier this year, it showed off a tool for finding feeds streaming across Periscope, Twitter’s real-time video broadcasting tool. The difference with the new tool is that it analyzes sound as well as images.
“We’re handling what’s spoken on screen as well as the motion,” Luan says, “putting them into one model that shows what a video is all about.”
The tool is part of a widespread movement to automatically identify images, recognize sound, and even understand natural language using a breed of artificial intelligence called deep learning. With deep learning, large neural networks, systems loosely modeled on the web of neurons in the brain, learn to perform tasks by analyzing enormous amounts of data. Dextro’s system learns by analyzing large numbers of videos.
But it also uses other techniques to identify sound in videos. It aims not just to recognize speech but, to a certain extent, to understand the ideas behind what is said. “We try to extract the most interesting concepts and topics that are coming out of everything happening on screen,” Luan says, though he declined to explain the particulars. The upshot is that the system doesn’t analyze sound alone, and it doesn’t analyze images alone. It analyzes both together to extract as much meaning as possible.