In a Public Policy Polling (PPP) survey, quite a few Texans say they’ll vote for Harambe for president in November. If you haven’t looked at the Internet in a while, Harambe was a gorilla fatally shot by a zookeeper after a toddler fell into his pen, but he’s more than that. He’s a meme, and his candidacy in Texas represents the voice of the Internet insinuating its way into polling. It’s silly, but it’s actually a sign of positive change.

Traditional polling methods aren’t working the way they used to. Upstart analytics firms like Civis and conventional pollsters like PPP, Ipsos, and Pew Research Center have all been hunting for new, more data-centric ways to uncover the will of the whole public, rather than just the tiny slice willing to answer a random call on their landline. The trending solution is to incorporate data mined from the Internet, especially from social media. It’s a crucial, overdue shift. Even though the Internet is a cesspool of trolls, it’s also where millions of Americans go to express opinions that pollsters might not even think to ask about.

How It Works

People have tweeted about Donald Trump over 22 million times since the Republican National Convention. Data-wise, that’s an analyst’s dream. “Twitter data is relatively easy to get, and quantity has a quality all of its own,” says Seth Redmore, CMO at sentiment analysis company Lexalytics, which has conducted political polls for The Boston Globe.

To put the data to use, analysts pull together all of the relevant mentions (names and handles are the obvious ones, but hashtags and memes can get parsed, too) using Boolean search queries, which are basically just keyword searches that use the operators “or,” “and,” and “not” to refine results. Then they filter for sentiment using highly accurate natural language processing algorithms. “Machines are much better at doing sentiment analysis than humans now, especially on a large scale,” says Apoorv Agarwal, a computer scientist at Columbia University. That is why there are whole companies devoted to mining social media text for clues of rising trends, often for marketing and stock research.
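
The two steps described above, a Boolean keyword filter followed by a sentiment pass, can be sketched in a few lines of Python. This is only a toy illustration, not how any of the firms quoted here actually work: the sample posts are invented, the function names `tokens`, `matches`, and `sentiment` are hypothetical, and the four-word lexicon stands in for the trained natural language processing models a real analyst would use.

```python
import re

# Invented sample posts, for illustration only (not real tweets).
POSTS = [
    "Great rally tonight, Trump was on fire",
    "Clinton's debate answers were terrible",
    "I love this weather",
    "Not a fan of Trump or Clinton, awful choices",
]

# Toy sentiment lexicon (an assumption; real systems use trained models).
POSITIVE = {"great", "love"}
NEGATIVE = {"terrible", "awful"}


def tokens(text):
    """Lowercase a post and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9#@]+", text.lower()))


def matches(text, any_of=(), all_of=(), none_of=()):
    """Boolean filter: OR over any_of, AND over all_of, NOT over none_of."""
    words = tokens(text)
    if any_of and not any(w in words for w in any_of):
        return False
    if not all(w in words for w in all_of):
        return False
    return not any(w in words for w in none_of)


def sentiment(text):
    """Positive-word count minus negative-word count for one post."""
    words = tokens(text)
    return len(words & POSITIVE) - len(words & NEGATIVE)


# Query: "trump" OR "clinton"
relevant = [p for p in POSTS if matches(p, any_of=("trump", "clinton"))]

# Query: "trump" AND NOT "clinton"
trump_only = [p for p in POSTS
              if matches(p, all_of=("trump",), none_of=("clinton",))]

for post in relevant:
    print(sentiment(post), post)
```

A word-counting lexicon like this one misses sarcasm and context entirely, which is exactly the gap the trained models are meant to close.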

According to Fabio Rojas, a sociologist at Indiana University who conducted a study correlating Twitter mentions and candidate success, “More tweets equals more votes.”

Measuring Political Reach

So political polling is another natural fit for sentiment analysis. “We saw from the very beginning that Trump’s social mentions really took off,” says Kellan Terry, lead analyst at sentiment analysis company Brandwatch. “We considered him to be a legitimate candidate when pundits were still saying he was a joke.” For Terry, the turning point was the fifth Republican debate in December, when Trump drew over 100,000 more Twitter mentions than any other candidate, a gap Terry calls “very rare indeed.”

And though mainstream media cast Hillary Clinton as uninspiring and Bernie Sanders as a maverick who had the nation frothing with support, the number of impressions (the unique accounts interacting with a candidate on Twitter), as opposed to mentions, told a different story. During the second Democratic debate, Sanders had 209,000 mentions to Clinton’s 168,000, but Clinton trounced him in impressions, 1 billion to 531 million. Even five months later, during the eighth debate, the scenario was the same: in mentions, Sanders had Clinton beat 134,000 to 111,000, but her impressions outweighed his by almost 3 million. So while Sanders’ supporters were more individually tweet-happy, Clinton’s reach was wider. “The polls and the social media inputs were saying the same thing,” says Clifford Young, President of US Public Affairs at Ipsos, which includes data from online sources like Facebook and even Xbox Live in its polling numbers. “The errors made by the pundits were not errors in the data stream.”
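
To make the mentions-versus-impressions gap concrete, here is a rough Python sketch using only the second-debate figures quoted above. The "impressions per mention" ratio is not a metric anyone in this story cites; it is an assumption introduced here purely to show how differently the two counts rank the candidates, and the names `DEBATE_2`, `impressions_per_mention`, and `leader` are hypothetical.

```python
# Figures reported above for the second Democratic debate.
DEBATE_2 = {
    "Sanders": {"mentions": 209_000, "impressions": 531_000_000},
    "Clinton": {"mentions": 168_000, "impressions": 1_000_000_000},
}


def impressions_per_mention(stats):
    """Illustrative ratio: how far each mention traveled, on average."""
    return stats["impressions"] / stats["mentions"]


def leader(data, metric):
    """Return the candidate with the highest value for the given metric."""
    return max(data, key=lambda name: data[name][metric])


print("Most mentions:   ", leader(DEBATE_2, "mentions"))
print("Most impressions:", leader(DEBATE_2, "impressions"))

for name, stats in DEBATE_2.items():
    print(f"{name}: {impressions_per_mention(stats):,.0f} impressions per mention")
```

By this rough measure, each Clinton mention reached more than twice as many accounts as each Sanders mention, which is the "wider reach" described above.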


Still, Twitter’s not a perfect medium for polls. Natural language processing algorithms can tell if you’re being sarcastic, but not necessarily if you’re tweeting something you don’t believe because you’ve got your professional hat on. And it’s not an unbiased sample of the electorate, either. “Only 23 percent of people who use the Internet use Twitter, so it’s not a representative sample: it skews younger, urban, and educated,” says Courtney Kennedy, Director of Survey Research at Pew Research Center. “But it brings something to the table as a supplement to a rigorous public opinion poll.”

Asking Different Questions

Social media data gives you a sense of the zeitgeist in a way that multiple choice questions never will. “Say I wanted to learn about what music people are listening to,” says Rojas. “I would have to sit down beforehand and come up with the list. But what if I don’t know about Taylor Swift or Justin Bieber?” Polls are generated by a small group of people, and they can’t know everything. Social media is a sample of what people actually talk about, what actually draws their attention, and the issues that really matter to them.

That sentiment matters, and pollsters can (and in PPP’s case, do) use it to direct their questioning. “People clue us in on stuff online all the time,” says Jim Williams, a polling analyst at PPP. The firm’s pollsters even ask the Internet where and on what they should poll next, hence Harambe’s presence in their poll. But, Williams says, joke suggestions aside, Twitter’s input also helps pollsters include the finer points of local and national politics. And even the Harambe question itself actually tells the pollsters something interesting.

From that question, Williams says PPP can tell that “there’s a certain number of people who don’t like Hillary Clinton or Donald Trump, and they will vote for anyone else, no matter who and what it is.” Even a dead gorilla. Because in polling, as with Twitter as a whole, you’ve got to wade through the memes to find the truth.


Want More Accurate Polls? Maybe Ask Twitter