Can big data reveal the mood of the electorate?

18 April 2015

Can social media really show how people will vote?

By Adam Fleming

BBC News Political reporter

It feels as if every day I get emails from companies with names like TheySay, TalkWalker, and emoSense telling me which party is winning the election based on social media buzz. There is a technical label for what they do: sentiment analysis.

But is it accurate, and what does it really tell us?

"Some of the commercial companies do it brilliantly, some do it terribly," says Carl Miller of the left-leaning think-tank Demos, which has set up the Centre for the Analysis of Social Media to examine this booming business.

"It is a way of analysing hundreds of thousands of online conversations that we could never read ourselves but it should never be confused with an opinion poll."

(L-R): Green Party leader Natalie Bennett, Liberal Democrat leader Nick Clegg, UKIP leader Nigel Farage, Labour leader Ed Miliband, Plaid Cymru leader Leanne Wood, Scottish National Party leader Nicola Sturgeon and British Prime Minister and Conservative leader David Cameron

The leaders' debate took place in front of an audience of about 200 "real" people

While the nation was glued to its screens for the televised general election debates, Carl and his team at Demos monitored Twitter's "firehose" - the real-time feed of every tweet in the world.

During the clash between the seven main party leaders on 2 April, their algorithm identified 420,000 relevant tweets. They were classified as positive or negative - "cheers" or "boos".

David Cameron, Conservative: 32% cheers v 68% boos
Nigel Farage, UKIP: 40% cheers v 60% boos
Ed Miliband, Labour: 47% cheers v 53% boos
Nick Clegg, Liberal Democrat: 48% cheers v 52% boos
Natalie Bennett, Green: 64% cheers v 36% boos
Leanne Wood, Plaid Cymru: 66% cheers v 34% boos
Nicola Sturgeon, SNP: 83% cheers v 17% boos

The Demos model is based on technology developed by the Text Analytics Group at the University of Sussex.

"Computers are really good pattern recognition machines, and what you're trying to do is get the computer to connect the patterns in the tweets with the categories you are assigning tweets to," explains Dr Jeremy Reffin.

Computers struggle to understand sarcasm, explains Dr Reffin

First, a human being chooses the hashtags that are likely to be most relevant.

Then the algorithm is taught how to classify each tweet, using technology called Natural Language Processing. It has to learn how to distinguish between an opinion and a statement of fact.

The computer throws up examples and asks whether it has made the right decision, a process known as assisted machine learning.
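The article does not say how the Sussex system decides which examples to put to a human. One common approach is uncertainty sampling, where the computer asks about the tweet it is least sure of. The sketch below is a minimal illustration of that idea, not the Demos or Sussex implementation; the word-counting classifier, the function names and the sample tweets are all assumptions made for the example.

```python
from collections import Counter
import math

def train(labelled):
    """Count words per label from (text, label) pairs; label is 'cheer' or 'boo'."""
    counts = {"cheer": Counter(), "boo": Counter()}
    for text, label in labelled:
        counts[label].update(text.lower().split())
    return counts

def p_cheer(counts, text):
    """Naive Bayes-style probability that a tweet is a cheer (add-one smoothing)."""
    vocab = set(counts["cheer"]) | set(counts["boo"])
    totals = {k: sum(c.values()) + len(vocab) + 1 for k, c in counts.items()}
    log_odds = 0.0
    for word in text.lower().split():
        log_odds += math.log((counts["cheer"][word] + 1) / totals["cheer"])
        log_odds -= math.log((counts["boo"][word] + 1) / totals["boo"])
    return 1 / (1 + math.exp(-log_odds))

def most_uncertain(counts, unlabelled):
    """Pick the tweet whose cheer probability is closest to 0.5."""
    return min(unlabelled, key=lambda t: abs(p_cheer(counts, t) - 0.5))

# Hypothetical tweets, invented for illustration only.
labelled = [("great answer brilliant", "cheer"), ("awful answer terrible", "boo")]
unlabelled = ["brilliant great", "terrible awful", "answer"]
counts = train(labelled)
ask_human_about = most_uncertain(counts, unlabelled)
```

Here the tweet "answer" uses a word seen equally in cheers and boos, so it is the one the sketch would hand to a person. The human's label would then be added to `labelled` and the model retrained.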

The system was honed using data from reality TV shows like X Factor, which are effectively elections that are held every week.

But some of the big challenges in this area became clear when doctoral student Simon Wibberley showed me a spreadsheet listing every tweet from the leaders' debate.

One said: "Ad-break. Time for a kitten in a hat. #leadersdebate". But the algorithm classified this as a cheer.

There are other tweets that say one thing but that are classified as the opposite.

"It's slightly unfair to challenge it on a case-by-case basis," argues Mr Wibberley.

He claims the system can make errors on a tweet-by-tweet basis, but it tends to make the right decisions on a larger scale.
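A little arithmetic shows why that can hold. The sketch below assumes, purely for illustration, that the classifier is right 80% of the time and is equally likely to mislabel a cheer as a boo or the other way round; neither figure comes from Demos. Under those assumptions the measured share of cheers is pulled towards 50%, but a leader with more genuine cheers still scores higher than one with fewer.

```python
def observed_share(true_share, accuracy):
    """Share of tweets labelled 'cheer' when errors are symmetric.

    true_share: real fraction of cheers (0 to 1)
    accuracy:   assumed chance any single tweet is labelled correctly
    """
    correctly_labelled_cheers = true_share * accuracy
    boos_mislabelled_as_cheers = (1 - true_share) * (1 - accuracy)
    return correctly_labelled_cheers + boos_mislabelled_as_cheers

# Illustrative inputs: two true shares and an assumed 80% accuracy.
high = observed_share(0.83, 0.80)
low = observed_share(0.32, 0.80)
```

With these inputs `high` comes out at about 0.70 and `low` at about 0.39. The individual mistakes blur the numbers but keep the order, which is true whenever the assumed accuracy is above 50%.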

The team also has to employ a technique called network analysis to separate out clusters of journalists and political professionals who are tweeting each other.
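The article does not describe the team's network analysis in detail. As a rough sketch of the general idea, the code below treats each mention as a link between two accounts and groups together accounts that are linked, directly or through others. This is a plain connected-components search standing in for whatever clustering method Demos actually uses, and the account names are invented.

```python
from collections import defaultdict, deque

def clusters(mentions):
    """Group accounts into clusters linked by mentions (treated as two-way)."""
    graph = defaultdict(set)
    for a, b in mentions:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from this account.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups

# Hypothetical accounts: a cluster of insiders and a separate pair of voters.
mentions = [("@hack1", "@hack2"), ("@hack2", "@spad1"), ("@voter1", "@voter2")]
found = clusters(mentions)
```

An analyst could then inspect each cluster and set aside any that consist mostly of journalists and political professionals before counting cheers and boos.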

Yet I cannot escape the feeling that the audience on Twitter is not as balanced as the sample for an opinion poll.

Then there is one particularly British issue.

"Sarcasm," says Dr Reffin. "At this stage computers have a real problem with sarcasm."

The number of Twitter accounts in the UK is dwarfed by the 35 million users of Facebook in Britain.

The social network has published details of the number of interactions - which include likes, comments and shares - for each political party between 1 January and 7 April.

UKIP: 9.7 million interactions
Conservatives: 8.2 million interactions
Labour: 6.6 million interactions
Liberal Democrats: 1.3 million interactions
SNP: 1.3 million interactions

But Facebook's politics specialist Elizabeth Linder warns about over-interpreting the data.

People might not post their personal political views on Facebook, says Elizabeth Linder

"I think it's difficult… because a lot of people are sharing content that they maybe don't agree with, or they're sharing content because they're saying 'I'm a little bit confused by all of this, what do you all think?'," she says.

"I think instead what we are seeing is the potential to reach people and that they care about politics on Facebook."

She adds that many users may comment publicly on a political party's page but limit their personal views to private conversations with family and friends so the rest of us cannot see them.

Facebook has been able to make some connections between users' likes - such as music and films - and their political views, though.

As with all big data, social scientists would ask whether those are direct relationships or just coincidences.

"It'll be quite some time before [big data] can stand shoulder to shoulder with the social sciences in terms of how rigorous it is," says Carl Miller of Demos.

As a political journalist, I will definitely soak up all this new information, but I will still be reading the polls. And spending too much time reading Twitter.

Watch more reports on BBC Click on the BBC News Channel and BBC World News. Find out more at Click's website and @BBCClick.
