The math behind Trump’s tweets

Written by Anthony Bonato and Lyndsay Roach, Ryerson University. Photo credit AP Photo/Evan Vucci. Originally published in The Conversation.

President Donald Trump delivers a lot of information through Twitter. Here he speaks in the Oval Office of the White House, March 2018.

United States President Donald Trump has a preoccupation with Twitter. Since his account @realDonaldTrump became active in March 2009, it has amassed 53.2 million followers, making it the 18th most popular account on the social media site.

While Trump has tweeted more than 38,000 times, his tweets during and after the 2016 presidential election made his Twitter account a lightning rod for the media and the public. Major news outlets like CNN, CBC, and BBC routinely embed tweets from @realDonaldTrump in their online stories. The Daily Show even turned Trump’s tweets into a mock presidential museum.

In a controversial and unparalleled fashion, Trump uses Twitter as a vehicle for his political announcements. On high-impact issues such as the U.S. travel ban, transgender military recruits and immigration, to name a few, Trump used Twitter to communicate policy decisions.

Alec Baldwin on ‘Saturday Night Live’ in a 2016 sketch on how Trump, then the president-elect, couldn’t stop tweeting. Photo credit NBC.

Given the volume of Trump’s tweets and their potential political relevance, we thought it would be revealing and novel to use mathematical methods to analyze the web of interactions formed by his most frequently used keywords.

Network analysis

One of our primary goals was to uncover communities, which represent groupings of thematically related keywords. We formed co-occurrence networks based on Trump’s tweets, in which nodes are keywords and two keywords are linked if they appear in the same tweet. For example, if the keywords “bad” and “media” appear in the same tweet, they receive a link.
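As a rough illustration of how such a co-occurrence network can be assembled, here is a minimal Python sketch. It is not the code used for the analysis: the function name `cooccurrence_links`, the crude tokenizer and the example tweets are all assumptions made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(tweets, keywords):
    """Count how often each pair of keywords appears in the same tweet."""
    keywords = set(keywords)
    links = Counter()
    for tweet in tweets:
        # Crude tokenizer (an assumption): split on whitespace, strip
        # surrounding punctuation, and lowercase each word.
        tokens = {word.strip(".,!?:;\"'()#@").lower() for word in tweet.split()}
        present = sorted(tokens & keywords)
        # Every unordered pair of keywords found in this tweet gains one link.
        for pair in combinations(present, 2):
            links[pair] += 1
    return links


# Made-up example tweets, not taken from the archive.
tweets = [
    "The media is so bad!",
    "Bad media, fake news.",
    "Great rally in Ohio tonight.",
]
links = cooccurrence_links(
    tweets, ["bad", "media", "fake", "news", "rally", "ohio"]
)
print(links[("bad", "media")])
```

Each key in `links` is an alphabetically ordered pair of keywords, and its count can serve as the weight of the link between the two nodes.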

Using an online archive of the president’s tweets on GitHub, we extracted the top 100 keywords from Trump’s Twitter account from each of the last four years. We removed retweets and common words like “it” and “the.”
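The extraction step can be sketched along the following lines. This is a simplified illustration, not our actual pipeline: the stop-word list is a tiny stand-in, the rule that a retweet starts with “RT @” is an assumption about how retweets are marked, and the function name `top_keywords` is hypothetical.

```python
from collections import Counter
import re

# A tiny illustrative stop-word list (an assumption, not the list used
# in the analysis).
STOP_WORDS = {"it", "the", "a", "an", "and", "is", "to", "of", "in", "so"}


def top_keywords(tweets, n=100):
    """Return the n most frequent words after dropping retweets and stop words."""
    counts = Counter()
    for text in tweets:
        # Assumption: a retweet is any tweet whose text starts with "RT @".
        if text.startswith("RT @"):
            continue
        # Keep letters, digits, apostrophes and hyphens, so that a word
        # such as "e-mail" stays in one piece.
        words = re.findall(r"[a-z0-9'-]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


# Made-up example tweets, not taken from the archive.
tweets = [
    "The media is so bad",
    "RT @someone: the media is great",
    "Fake news and bad media",
]
keywords = top_keywords(tweets, n=2)
print(keywords)
```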

Some nodes were combined if the keyword was made up of two words; for example, “white” and “house” became “white house.” Others, such as “e-mail” and “e-mails,” were kept separate because Trump used them in different contexts. Labels containing more than one word without spaces are hashtags that frequently appear in the tweets.

We visualized networks of keywords in @realDonaldTrump using the open source software Gephi with the ForceAtlas2 layout algorithm. Communities are groups of nodes that are more likely linked to each other than to other nodes in the network. Gephi uses the Louvain method on network modularity to identify communities, where modularity measures the strength of the division into communities. The Louvain method is an algorithm that optimizes the modularity of a network, so the higher the modularity, the better the division into communities.
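To make the idea of modularity concrete, the sketch below computes the standard modularity score of a given division of a small network: for each community, the fraction of links that fall inside it, minus the fraction expected if links were placed at random while keeping each node's number of links fixed. It only evaluates the score that the Louvain method tries to maximize; it is not the Louvain algorithm itself and not Gephi's implementation. The function name and the toy network are made up for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted network.

    edges: list of (u, v) pairs.
    community_of: dict mapping each node to its community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # links with both ends in the community
    degree_sum = defaultdict(int)  # total number of link ends in the community
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network (made up): two tightly knit triangles joined by a single link.
edges = [
    ("fake", "news"), ("news", "cnn"), ("fake", "cnn"),
    ("tax", "jobs"), ("jobs", "cuts"), ("tax", "cuts"),
    ("cnn", "tax"),
]
two_groups = {"fake": 0, "news": 0, "cnn": 0, "tax": 1, "jobs": 1, "cuts": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two > q_one)
```

Splitting the toy network into its two triangles scores higher than lumping every node into a single community, which is the sense in which higher modularity indicates a better division.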

The communities were uncovered as a byproduct of the overall network structure, and not by any manual manipulation on our part. The Gephi software randomly assigned colours to each community: keywords with the same colour are thematically related.


The following network visualizations represent keywords from Trump’s Twitter account taken in 2015 and 2016, leading up to the 2016 U.S. presidential election. Links and nodes were resized based on their relative frequency.

The keyword network from Trump’s 2015 tweets.

In the 2015 network, the two nodes with the most links are “trump” and “realdonaldtrump,” which both appear in the purple community. The likely reason why Trump’s name came up so often as a keyword in 2015 was that he was campaigning for the Republican primary, and his tweets often included compliments made about or by him.

The purple community containing “cruz,” “rubio,” and “carson,” and the green community containing “kasich” and “bush” correspond to his Republican primary opponents.

In the 2016 network, the communities reflect his race against the Democratic nominee Hillary Clinton. The purple community appears to focus on Clinton and the Democratic Party, containing “crooked,” “fbi,” “emails,” and his hashtag “draintheswamp.”

The keyword network from Trump’s 2016 tweets.

In the orange community, there are keywords “rally,” “new hampshire” and “michigan,” along with his hashtag “makeamericagreatagain.” In the blue community, we observe the swing states “ohio” and “florida,” and his shortened hashtag “maga,” which stands for “Make America Great Again.”

Next, we looked at the 2017 and 2018 networks, which correspond to the first and second years of Trump’s presidency.

In the 2017 network, the blue community corresponds to Trump’s dislike of the media, and it contains “fake,” “news,” “cnn,” “bad” and “media.” The orange community contains “hillary clinton,” “fbi” and “crooked.”

The keyword network from Trump’s 2017 tweets.

The green community corresponds to domestic policy issues such as “healthcare,” “economy,” “jobs,” “tax,” “reform,” and “cut,” while the purple community has a cluster related to foreign policy issues such as “security,” “china,” and “north korea.”

The keyword network from Trump’s 2018 tweets.

In the 2018 network, communities emerged related to trade (in orange) and borders and immigration (in purple). Trump’s focus on the media and Clinton continues unabated and moves into the blue community. He frequently tweeted about “tax,” “cuts,” and “jobs” in the green community.

Five communities revealed

While Trump’s words spoken in the traditional media may at times appear unpredictable, our analysis suggests a long-term trend with his tweets.

Considering that Trump tweets on average ten times a day and on a range of issues, it is remarkable that in each of the four years, his Twitter networks consistently split up into precisely five communities. In other words, by accident or design, his tweets have tended to focus on five broad topics each year since 2015. Some of the issues morph over time, and this is evident when comparing the networks from before and after he took office.

The content in the communities sometimes raises further questions. For example, in the 2018 network, the green community contains the keywords “russia,” “comey,” and “collusion.” These refer to the ongoing Russia investigation. The green community, however, also includes “crooked” and “hillary,” and we leave it to pundits to explain how all these keywords are related.

Our take is that when Trump repeats keywords together, his sizable Twitter audience comes to view them as more likely linked in real life.

Trump is unlikely to stop or even reduce his tweeting anytime soon. Twitter represents a vital aspect of Trump’s media engagement.

Our analysis used network science to map out Trump’s keywords on Twitter and their interactions over the timescale of years. From this approach, we obtain a historical view of the topics that matter to him. A potential future research plan would be to map Trump’s Twitter networks over shorter time periods such as months, weeks or even days.

Every politician and public figure on Twitter has an evolving web of keywords associated with them. These networks are not always evident in our break-neck 24-hour news cycle, and our approach holds the potential to make these hidden networks more visible. We need only look to network science to uncover them.