Everyone expected a contentious and heated presidential debate on Sunday, and the candidates delivered. Like many people, I wanted a more substantive discussion of real issues, so I went on a quest, sifting through the emotion-filled tangents and creating a data-driven view of the debate between Hillary Clinton and Donald Trump.



Edward Lee is a marketing scientist for Google and lives in Ann Arbor, Michigan, with his wife.

I obtained the annotated transcript of the second presidential debate from The Washington Post website. Using the tm package for the R programming language, I extracted data from the transcript, a semi-structured text file.
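The transcript's exact layout is not described here, so the following is only a minimal sketch in Python, standing in for the R workflow rather than reproducing it. It assumes each speaker turn begins with an uppercase label and a colon (for example `TRUMP:`); the function name `split_by_speaker` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout: a turn begins with an uppercase label and a colon, e.g. "CLINTON: ...".
SPEAKER_RE = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")

def split_by_speaker(transcript):
    """Return {speaker: [turn, ...]}; unlabeled lines continue the current turn."""
    turns = defaultdict(list)
    current = None
    for line in transcript.splitlines():
        line = line.strip()
        if not line:
            continue
        match = SPEAKER_RE.match(line)
        if match:
            current = match.group(1)
            turns[current].append(match.group(2))
        elif current is not None:
            turns[current][-1] += " " + line
    return dict(turns)

# Made-up sample text, not taken from the actual transcript.
sample = """CLINTON: We are going to work together.
TRUMP: It is a disaster.
I will fix it.
CLINTON: I have a plan."""

turns = split_by_speaker(sample)
```

Once the text is grouped by speaker, each candidate's turns can be joined into one string and analyzed separately.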

First, Trump went on the offensive against Clinton—as he promised to after the first debate.

Comparing the Pronouns Used in the First and Second Debates

[Image: pronouns.jpg. Credit: Edward Lee]

These graphs show the percentage of commonly used pronouns in order to understand who each candidate addressed while speaking. The pronouns are grouped to encompass variations (“we” includes we’re, we’ll, we’d, and us, for example). In addition, Trump’s “name” group includes the various ways he addressed Clinton by name, including her first and last names and “secretary of state.” Clinton’s “name” group includes the various ways she addressed Donald Trump by name.

Trump consistently used “I” in both debates, but in the second debate, he dramatically increased his use of “she” in his attacks against Clinton. On the other hand, Clinton most often used “we” during the first debate, but said “I” more during the second debate as Trump forced her to defend her 30-plus years of public service.
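The pronoun grouping described above could be sketched roughly as follows in Python. This is an illustration under assumptions, not the original analysis: the groups in `PRONOUN_GROUPS` are hypothetical, the "name" group is left out, and each percentage is taken as a share of all grouped pronouns, which may differ from the denominator used in the graphs.

```python
import re
from collections import Counter

# Hypothetical groups: each label maps to the variants counted under it.
PRONOUN_GROUPS = {
    "I": {"i", "i'm", "i'll", "i'd", "i've", "me"},
    "we": {"we", "we're", "we'll", "we'd", "we've", "us"},
    "she": {"she", "she's", "she'll", "she'd", "her"},
    "he": {"he", "he's", "he'll", "he'd", "him"},
}

def pronoun_shares(text):
    """Return each group's share (in percent) of all grouped pronouns in text."""
    # Normalize curly apostrophes so contractions match the variants above.
    tokens = re.findall(r"[a-z']+", text.lower().replace("’", "'"))
    counts = Counter()
    for token in tokens:
        for label, variants in PRONOUN_GROUPS.items():
            if token in variants:
                counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}

# Made-up sentence for illustration only.
shares = pronoun_shares("We’re ready. I think she knows us, and I know her.")
```

Running the same function on each candidate's text from each debate would give the figures to compare side by side.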

Second, Trump may have said more words, but he said less with them than Clinton did. I compared the number of words each candidate said and analyzed the degree to which those words were unique, that is, the extent to which each candidate repeated the same language. Clinton varied her language more.

The table below examines the frequency of all words versus the frequency of unique words each candidate said. I then divided the count of all words by the count of unique words to gauge how much each candidate repeated words.

Frequency of Unique Words

[Image: Screen-Shot-2016-10-10-at-12.35.10-PM.jpg. Credit: Edward Lee]

The frequency table indicates that Trump said more words than Clinton, but not more unique words, implying that Trump repeated himself more than Clinton. The more prepared a candidate is, the more likely he or she is to use a varied vocabulary.
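As a rough illustration of the ratio behind that table, the sketch below counts all words and unique words in a piece of text and divides one by the other. It is written in Python rather than R, the function name `repetition_ratio` is hypothetical, and the tokenization rule (lowercased runs of letters and apostrophes) is an assumption.

```python
import re

def repetition_ratio(text):
    """Return (total_words, unique_words, total / unique) for text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    unique = len(set(words))
    # A ratio of 1.0 means no word was repeated; higher means more repetition.
    ratio = total / unique if unique else 0.0
    return total, unique, ratio

# Made-up sentence for illustration only.
total, unique, ratio = repetition_ratio("a disaster, a total disaster, believe me")
```

A higher ratio means a speaker reused the same words more often relative to the size of the vocabulary he or she drew on.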

Third, “disaster” was Trump’s favorite word in the second debate, after common words like “people,” “state,” etc.

Unique Words by Debate

[Image: disparity3.jpg. Credit: Edward Lee]

Words Used in Both Debates

[Image: commanlity2.jpg. Credit: Edward Lee]

These word clouds highlight the most frequently used stemmed words (excluding pronouns, prepositions, and other stopwords), comparing (1) words unique to each debate and (2) words common to both debates. The idea is to show how each candidate shifted strategy in the second debate.
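The split between words unique to one debate and words shared by both can be sketched with simple set logic. The Python below is a simplified illustration, not the original method: the `STOPWORDS` list is a tiny hypothetical stand-in for a real stopword list, stemming is omitted, and the sample sentences are invented.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "i", "our", "it"}

def word_counts(text):
    """Count lowercase words in text, skipping stopwords (no stemming)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def compare_debates(first, second):
    """Return (words only in first, words only in second, shared words) with counts."""
    a, b = word_counts(first), word_counts(second)
    only_first = {w: n for w, n in a.items() if w not in b}
    only_second = {w: n for w, n in b.items() if w not in a}
    shared = {w: a[w] + b[w] for w in a if w in b}
    return only_first, only_second, shared

only_first, only_second, shared = compare_debates(
    "Jobs and companies leave the country",
    "Insurance is a disaster in our country",
)
```

The counts attached to each word are what a word cloud would use to size the words.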

In the first debate, after the common words (people, country, state, etc.), Trump relied most heavily on “companies” and “job,” in the context of outsourcing work, while Clinton favored “policy” and “jobs” in the context of an evolving job market. During the second debate, Trump used “disaster” most frequently, while Clinton most often said “insurance” and “children.”

Fourth, Trump showed up to defend his reputation and divert attention away from himself.

I analyzed the candidates’ statements on nine topic segments—serving as role models, inclusion, taxes, healthcare, foreign affairs, the Supreme Court, energy, and Clinton and Trump themselves.

Unique Words by Topic

[Image: unique_words.jpg. Credit: Edward Lee]

This graph shows the number of unique words in each debate segment. The innermost boundary shows the absolute minimum number of unique words across the segments, and the outermost boundary shows the absolute maximum.
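A minimal Python sketch of the numbers behind such a graph follows. It counts distinct words per segment and takes the smallest and largest counts as the inner and outer boundaries. The function names, the tokenization rule, and the sample segment text are all hypothetical.

```python
import re

def unique_word_counts(segments):
    """Map each segment name to its number of distinct words."""
    return {
        name: len(set(re.findall(r"[a-z']+", text.lower())))
        for name, text in segments.items()
    }

def boundary_range(counts):
    """Return the (minimum, maximum) counts used as inner and outer boundaries."""
    values = list(counts.values())
    return min(values), max(values)

# Made-up segment text for illustration only.
segments = {
    "taxes": "cut taxes cut taxes now",
    "energy": "clean energy and coal and gas",
}
counts = unique_word_counts(segments)
inner, outer = boundary_range(counts)
```

Computing the counts once per candidate would allow the two to be drawn against the same boundaries.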

This visual suggests which topics each candidate was more or less prepared for. Trump over-indexed his unique words on defending himself, while Clinton over-indexed hers on diversity and inclusion. By comparison, Trump seems to have taken more opportunities to speak during the segments in which he was most comfortable being critical (Trump, Clinton, healthcare, and foreign affairs). Based on my analysis, Trump seemed to have prepared most for defending his waning candidacy, while Clinton seemed to have prepared most for the topic of diversity and inclusion in this town-hall format.

Previously, Trump solicited votes by saying he is going to be the voice for his supporters. See for yourself which topics the candidates discussed most in each debate segment.

Topic: Being a Role Model

[Image: Apt_Behavior3.jpg. Credit: Edward Lee]

Topic: Trump

[Image: Trump_Mistake3.jpg. Credit: Edward Lee]

Topic: Clinton

[Image: Clinton_Mistake3.jpg. Credit: Edward Lee]

Topic: Healthcare

[Image: Healthcare3.jpg. Credit: Edward Lee]

Topic: Inclusion

[Image: Inclusion3.jpg. Credit: Edward Lee]

Topic: Taxes

[Image: tax3.jpg. Credit: Edward Lee]

Topic: Foreign Affairs

[Image: International3.jpg. Credit: Edward Lee]

Topic: Energy

[Image: energy3.jpg. Credit: Edward Lee]

Topic: Supreme Court

[Image: Supreme_Court3.jpg. Credit: Edward Lee]

People elect the country’s leaders to represent and empower them. Based on the words used on the biggest stage in politics, which candidate would you feel most comfortable having represent you?
