As our understanding of big data grows, can we harness its power to predict the future? At the Warwick Institute for the Science of Cities (WISC), Adam Tsakalidis is researching how to best unlock the secrets of our political future using the ever-growing big data source, social media.
In the run-up to the May 2015 general election, politicians marched the campaign trail and polls churned out updates on the voting preferences of the UK electorate. For Adam Tsakalidis, a WISC student and keen follower of politics, it was a great opportunity to put his research to the test.
“My research focusses on social media analysis in general and I’m trying to extract knowledge from this huge amount of social data in order to predict future events, for example, in politics,” says Adam.
“Data is becoming more and more available and we can learn more and more from it. Currently I want to analyse this data using sentiment and network analysis methods to see if the outcome of elections can be predicted before they happen. Trying to extract the sentiment out of a tweet is a difficult task by itself; thus, using this information in order to predict which party will be victorious is a big challenge.”
That’s where Adam’s research comes in. Focussing on the top cities in Europe, he used publicly available data from the European Commission to monitor online discussions in different countries and different languages to see how political opinions were changing over time.
While Twitter may cause 21st century problems such as trolling and spamming, it’s incredibly useful for big data researchers like Adam: its data is publicly available in most cases (unlike that of many other social media platforms), it is streamlined, and it encourages users to share news. Studies have shown that Twitter has been used in the past for stock market prediction and has even alerted the world to earthquakes before traditional media.
Adam explains, “When researchers first tackled this problem it was proposed that we could find out what Jane Bloggs was saying about David Cameron on Twitter and this might have revealed something about her intention to vote. So, if Ed Miliband was mentioned in 30 out of 100 political tweets in the last week before the elections, this would mean that Labour would get 30% of the votes. However, this was a very naïve way of measuring and was unsuccessfully applied to different elections.”
“A number of papers were written: "How (not) to predict elections", "I wanted to predict elections using Twitter and all I got was this lousy paper" and a response to the researchers who first pioneered this field entitled "Why the Pirate Party Won the German Election of 2009". They argued, along with others, that these researchers were basing their results on a lot of assumptions, normalising the data and fitting their method to the results. However, this started a research field in which quite a lot of work has been done to predict the election results.”
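To make the early mention-counting idea concrete, here is a minimal sketch of it in Python. It is an illustration only, not the method of any of the papers mentioned: the keyword lists, the example tweets and the function name `mention_shares` are all hypothetical, and each party's share is simply its number of mentioning tweets divided by the total across parties.

```python
from collections import Counter

# Hypothetical keyword lists; a real study would need far more care here.
PARTY_KEYWORDS = {
    "Conservative": ["david cameron", "conservatives"],
    "Labour": ["ed miliband", "labour"],
}

def mention_shares(tweets, party_keywords):
    """Return each party's share (in %) of all party-mentioning tweets."""
    counts = Counter({party: 0 for party in party_keywords})
    for tweet in tweets:
        text = tweet.lower()
        for party, keywords in party_keywords.items():
            # A tweet counts once per party, however many keywords it matches.
            if any(keyword in text for keyword in keywords):
                counts[party] += 1
    total = sum(counts.values())
    if total == 0:
        return {party: 0.0 for party in party_keywords}
    return {party: 100.0 * n / total for party, n in counts.items()}

# Invented example tweets, for illustration only.
tweets = [
    "Ed Miliband was strong tonight",
    "David Cameron on the economy again",
    "Not sure Labour can win this",
    "Conservatives ahead in my town",
]
shares = mention_shares(tweets, PARTY_KEYWORDS)
print(shares)
```

The sketch treats every mention as a vote, whatever the tweet actually says about the party, which is one reason the approach was criticised as naïve.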
Adam and his team have built on these simplistic analysis techniques. Adam first worked on the Euro elections in 2014, analysing political chatter from Greece, Germany and the Netherlands. To avoid accusations of cheating, the team published their vote percentage predictions for each Greek political party before the results were called.
“Our predictions were quite close to the election results and we applied the exact same approach to Germany and the Netherlands after the announcement with similar outcomes,” explains Adam. “We used data from Twitter and data from the official polls, and found that taking both into consideration yielded far better results than using only polls. When we combined the two results from polls and Twitter, we collected the most accurate predictions for each party’s voting share.
“That worked in all countries with different parameters as well. We didn’t use any sophisticated language processing algorithms; our process was very naïve and simplistic, and we didn’t expect good results, yet we saw that even such simple methods could work effectively.”
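The article does not say how the poll and Twitter signals were combined, so the following is only one possible reading: a weighted average of the two sets of vote shares, renormalised to sum to 100. The function `blend_shares`, the weight `alpha` and all of the numbers are assumptions made for illustration, not the team's model.

```python
def blend_shares(poll_shares, twitter_shares, alpha=0.8):
    """Blend two vote-share estimates (in %), putting weight alpha on the polls."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if set(poll_shares) != set(twitter_shares):
        raise ValueError("both estimates must cover the same parties")
    blended = {
        party: alpha * poll_shares[party] + (1.0 - alpha) * twitter_shares[party]
        for party in poll_shares
    }
    total = sum(blended.values())
    # Renormalise so the blended shares sum to 100.
    return {party: 100.0 * share / total for party, share in blended.items()}

# Invented figures for three placeholder parties.
polls = {"Party A": 40.0, "Party B": 35.0, "Party C": 25.0}
twitter = {"Party A": 50.0, "Party B": 30.0, "Party C": 20.0}
combined = blend_shares(polls, twitter, alpha=0.8)
print(combined)
```

In a sketch like this the weight would have to be chosen on past elections rather than on the one being predicted, otherwise the method is open to the "fitting to the results" criticism described earlier.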
Adam’s second phase of research analysed tweets relevant to the Greek general elections of 2015. The team perfected the model used in the Euro elections, adding more variables and improving the parameters using mathematical models. The result was published at 8:30am local time, just after the ballots had opened:
“There were 31 recent polls leading up to the elections and I added in my results. My mean absolute error was just 0.5, better than all 31 of these polls. Then we compared our predictions with those of the exit polls, as announced once the ballots closed, and we were closer to the mark than all three exit polls.”
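Mean absolute error, the measure Adam quotes, is the average of the absolute differences between the predicted and the actual vote share of each party. A small sketch of the calculation follows; the party names and figures are made up for illustration and are not the Greek results.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual vote shares."""
    if not actual:
        raise ValueError("need at least one party")
    return sum(abs(predicted[party] - actual[party]) for party in actual) / len(actual)

# Invented shares (in %) for three placeholder parties.
predicted = {"Party A": 36.0, "Party B": 28.5, "Party C": 6.0}
actual = {"Party A": 36.5, "Party B": 28.0, "Party C": 6.5}
mae = mean_absolute_error(predicted, actual)
print(mae)
```

Because every party counts equally, a large miss on a small party weighs as much as the same miss on the winner.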
There is a lot of research in the field of natural language processing, including the task of sentiment analysis. While there have been significant improvements in this area in recent years, new challenges continue to arise.
“Sentiment analysis is a very open task, and to examine data at this level requires us to build complex mathematical tools. Analysis has to be contextual. For example, if you trained a model on sports tweets and applied it to politics, the model would not work as well. Likewise, a one-year-old politics model would probably have huge errors. It’s also important to be able to understand which particular entities in a tweet – people or policies for example – the sentiment is directed towards.
“Future developments in artificial intelligence may mean that computers could learn languages and create rules that work for analysing data - the computer may actually think in order to analyse the data. The approach we use is about training a model on some hand-labelled data and making it work. They won’t be “intelligent” in this sense, but they will work, which is the important thing for now.”
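As a rough picture of what "training a model on some hand-labelled data" can look like, here is a tiny Naive Bayes sentiment classifier written with the Python standard library only. Naive Bayes is swapped in here as a simple, well-known technique; the article does not say which model the team uses, and the four labelled tweets are invented.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(labelled_tweets):
    """Count words per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_tweets:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocabulary

def predict(text, model):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented hand-labelled examples.
labelled = [
    ("great speech tonight", "positive"),
    ("strong and clear answers", "positive"),
    ("terrible answer on tax", "negative"),
    ("weak and evasive tonight", "negative"),
]
model = train(labelled)
print(predict("great and clear speech", model))
print(predict("weak terrible answer", model))
```

The classifier only knows the words it was trained on, which echoes the point above about context: a model trained on tweets from one domain or one year has little to go on when the vocabulary shifts.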
Adam is currently working with his former research team in Greece (Multimedia Knowledge and Social Media Analytics Laboratory, ITI, CERTH) on models for the upcoming UK elections, together with researchers in natural language processing and social media analysis from the Department of Computer Science at the University of Warwick and the Department of Journalism at City University London. Predictions for the success of each party will be released before the ballots open on May 7th.
But just how important is it to take on board social media when predicting voting results?
“It’s simply another important form of opinion polling,” says Adam. “In several cases we have found some interesting patterns. For example, in the current study of the UK elections, we noticed that the sentiment during the seven-leader debate became highly polarised. It’s correlations like this that we’re trying to unearth, correlations that are indicating that both the politicians and commentators on social media have a part to play. It’s not just about the parties themselves, it’s about the users.”
Adam Tsakalidis is a PhD candidate studying at the Warwick Institute for the Science of Cities (WISC). WISC researches how to make cities smarter by taking into account the big data available to us from sources such as transport, governmental organisations or social media. Adam's thesis is supervised by Alexandra Cristea and Stephen Jarvis. You can read more about his work with the European 2014 elections here.
Terms for republishing
The text in this article is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).