It’s Really, Really Hard to Turn Speech Into GIFs
The perfect GIF is a difficult thing to find. On an Internet rife with search tools, there is a seemingly infinite number of options. But choosing a GIF that animates exactly what you want it to? More difficult. A new web app called Giftawk, though, will quite literally translate your requests into GIF form.
The browser-based app simply needs access to your microphone, and then asks you to speak into it. Say anything you want, and boom: it serves up a GIF or GIFs that act out your speech. (For the record, my own GIF maker turned these GIFs into weird pop art; they don’t actually look like that.)
Things get more complicated, though, when you try full sentences. Developer Adam Lusted created the Chrome web app, which uses Chrome’s speech recognition API. “I then split the phrase up and fetched relevant GIFs for each word from the Giphy API,” he says. “It was relatively easy to make. I was unsure how well it would work because the speech recognition API is a bleeding edge technology.”
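The split-and-fetch step Lusted describes can be sketched roughly as follows. This is a minimal illustration, not Giftawk's actual code: the function name `phrase_to_gif_queries`, the punctuation stripping, and the `YOUR_API_KEY` placeholder are assumptions, and the sketch only builds one Giphy search URL per word rather than performing the speech recognition or the network requests.

```python
from urllib.parse import urlencode

# Giphy's public GIF search endpoint.
GIPHY_SEARCH_URL = "https://api.giphy.com/v1/gifs/search"


def phrase_to_gif_queries(phrase: str, api_key: str = "YOUR_API_KEY") -> list[str]:
    """Split a recognized phrase into words and build one search URL per word.

    Hypothetical helper: the real app's word handling is not documented here.
    """
    # Lowercase each word and strip surrounding punctuation (an assumption).
    words = [w.strip(".,!?;:\"'").lower() for w in phrase.split()]
    words = [w for w in words if w]
    return [
        f"{GIPHY_SEARCH_URL}?{urlencode({'api_key': api_key, 'q': word, 'limit': 1})}"
        for word in words
    ]


if __name__ == "__main__":
    # Each URL would return the top GIF match for one word of the phrase.
    for url in phrase_to_gif_queries("I love carrots"):
        print(url)
```

Fetching one GIF per word is what makes full sentences tricky: each word is looked up in isolation, so the result reflects the individual words rather than the meaning of the phrase as a whole.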
While Giftawk is one of the first speech-to-GIF translators, it isn’t the only app trying to link language and animated images. Last year, the MIT Media Lab introduced GIFGIF, a multi-purpose GIF service that offers a variety of ways to take common human communication and turn it into GIFs. There’s Text to GIF (type something in, get a GIF out) and Face to GIF (the camera uses facial analysis to register a GIF). In addition to being objectively fun, GIFGIF wanted to use the data it received to help assign emotional assessment to GIFs, furthering how we might universally understand them (or not).
While the team wanted to see how different countries assigned emotion to GIFs, there were too many Americans using the voting system to make such a determination. Still, Travis Rich, who works at MIT Media Lab on the project, says they were able to make some observations.
“Anecdotally, the thing I noticed most frequently was the difference in demographics,” Rich says. “Many older people I talked to or demoed the work to just couldn’t fathom that people were really, truly, honestly using GIFs in their emails or messaging clients. They understood emoji just fine, but couldn’t see how GIFs carried the same ambiguity.”
For example, he says, these people read a dancing Will Smith GIF as meaning “Will Smith,” not “generic sense of happiness.”
GIFGIF also turned to the Giphy API to power its apps, and the team was more than happy with the decision.
“We were always concerned they’d throttle or cut off our API key due to high loads, but everything kept chugging along smoothly,” says Rich. “The cost of hosting, serving, and managing GIFs at the scale they do is astounding, and the scales they offer are really unmatched right now.” Giftawk also relies on the Giphy API.
And this is exactly what Giphy wants to see. “Our goal with the Giphy API is to make it dead simple for developers to incorporate the power of our search and sharing features right inside of their app,” says Nam Nguyen, who’s in charge of Giphy API integration. He points out a few similar projects using the Giphy API, like Gifline (which translates GIFs inside Gmail) and ZZZine (which turns your top tweets into GIFs). (There’s plenty more where that came from at Giphy Labs.)
Heart Emoji vs Head Emoji
The business of translating text and speech into Internet-preferred mediums isn’t a GIF-only operation. Emoji have also dominated this landscape: there are myriad emoji translation services available. Maybe there’s something about emoji that just makes them easier to decipher; they are a more compact language format, with less going on. Perhaps written and spoken language is more ably identified by Unicode.
“Generally speaking, I think emoji are more composable, meaning they can be composed into more complex meanings than a single GIF,” says Fred Benenson, who is responsible for emoji translations including part of a White House big data report and Moby Dick (his version is called Emoji Dick). He doesn’t discount GIFs, though: “Good GIFs have the advantage of being incredibly specific while being very versatile,” offering this strange moment in television history as being able to “convey a funny, weird reaction.”
“In terms of translation purposes, if Google were to extend its image recognition and tagging system to identify GIFs, it could pair well with language,” Benenson says. “That’s a lot of hard work, however, and emoji can probably do better approximating words or concepts in a more straightforward way.” While emoji might be more easily interpreted, that doesn’t mean we’ve reached universal understanding.
A recent study from One Hour Translation found that depending on location and language, a person will read strings of emoji text very differently. One person’s “they found him and he ran away” (finger point emoji followed by wind blowing emoji) is another person’s “you farted!”
A Problem for Machines
Rich thinks that creating a capable GIF translator has less to do with developing the app than with furthering our understanding of how we interpret the images themselves.
“‘I love carrots,’ ‘I love Jesus,’ and ‘I love Kate’ carry such incredibly different meanings despite the fact that the three sentences are nearly identical,” he says. “Parsing even this trivially simple sentence is the real hard part.”
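Rich's point can be illustrated with a toy sketch. The keyword lists and the `keyword_sentiment` function below are hypothetical and far cruder than any real sentiment engine (and are not how GIFGIF works); they only show that a scorer which looks at words in isolation cannot tell the three sentences apart.

```python
# Hypothetical, deliberately tiny keyword lists for illustration only.
POSITIVE = {"love", "like", "adore"}
NEGATIVE = {"hate", "dislike"}


def keyword_sentiment(sentence: str) -> int:
    """Score a sentence by counting positive words minus negative words."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


if __name__ == "__main__":
    for s in ("I love carrots", "I love Jesus", "I love Kate"):
        # Every sentence gets the same score, because only "love" is counted.
        print(s, keyword_sentiment(s))
```

All three sentences receive an identical score, even though a person reads a food preference, a statement of faith, and a romantic attachment. Capturing that difference requires knowing something about the object of the sentence, which is the hard part Rich describes.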
GIFGIF has a demo that shows off its ability to identify sentiment, and if eventually someone is able to make an engine that converts text to sentiment, then it could be plugged into the GIFGIF backend and produce a GIF that more accurately represents your feelings.
Question is: Who will build it? Certainly Google, with its trove of natural language processing data, coupled with its database of animated GIFs, could make inroads. Giphy easily has a leg up on any competition: Its GIF search engine almost functions as a translator. For now, we’ll have to be content with the variety of entertaining GIF-making apps at our disposal. At least in the search for function, we’re having fun… looped over and over and over.