Political barometer methodology


Objectives

The political barometer ranks politicians by the number of public tweets that mention each of them (from a list of more than 140 politicians). For the 20 most-mentioned politicians, we also determine how many tweets express approval, neutrality, criticism or rejection.

Tweets about politicians do not necessarily mention their full name or their Twitter account. Users often refer to politicians by initials, nicknames or even joke hashtags, and wordplay is frequent. In fact, very few tweets mentioning politicians explicitly state both their first and last names.

While top politicians, and especially the incumbent President Nicolas Sarkozy, are regularly mentioned in other-language tweets from foreign news sources, conversations about French politicians mostly take place in French. As a result, the political barometer's ranking is based only on messages in French.

The occurrence of a given keyword, such as a politician's name, does not necessarily mean that the message is about that politician. Indeed, all top politicians are referred to by names or initials that are subject to polysemy, the property of a word having multiple meanings. Disambiguation and polysemy filtering are therefore required to count only the tweets that are actually about each politician.

Process description

The diagram below summarizes the process flow to produce political barometer figures.


Message harvesting

This first phase consists of gathering messages from Twitter, a process that runs permanently, 24/7. For each politician, a set of keywords is defined and submitted to Twitter, which returns matching messages.
The keyword set may be updated to include new words relating to a politician (current affairs, neologisms, new nicknames…).

During this phase, we also filter out spam and automated messages by the thousands. For example, we filter out all messages posted from the toushollande.fr website. We constantly watch for suspicious activity such as large-scale tweet-stuffing.
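As an illustration of this filtering step, the sketch below drops messages posted from a blocked website and messages whose text is repeated suspiciously often. It is a minimal sketch, not Semiocast's implementation: the `Tweet` record, its `source_url` field, the blocklist and the `max_duplicates` threshold are all hypothetical, and counting identical texts is only one crude way to spot tweet-stuffing.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Tweet:
    """Hypothetical minimal tweet record; real harvested tweets have many more fields."""
    text: str
    source_url: str  # hypothetical: URL of the website or app that posted the tweet


# Hypothetical blocklist, seeded with the example from the text.
BLOCKED_SOURCE_DOMAINS = {"toushollande.fr"}


def source_domain(url: str) -> str:
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def filter_spam(tweets, max_duplicates=50):
    """Drop tweets from blocked sources, then drop texts repeated more than
    max_duplicates times (a crude, hypothetical tweet-stuffing heuristic)."""
    kept = [t for t in tweets if source_domain(t.source_url) not in BLOCKED_SOURCE_DOMAINS]
    counts = Counter(t.text.strip().lower() for t in kept)
    return [t for t in kept if counts[t.text.strip().lower()] <= max_duplicates]
```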

Examples of keywords used for message harvesting:

Politician name | Keywords used to retrieve messages that may be about the politician
Nicolas Sarkozy
  • sarkozy;
  • sarko;
  • sarko_2012;
Ségolène Royal
  • royal;
  • ségo;
  • ségolène;
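The keywords in the table above could be matched against a message as in the following sketch. In the barometer itself Twitter performs the matching; this local matcher only illustrates the idea and assumes (hypothetically) case-insensitive, accent-insensitive, whole-token matching in which hashtags behave like plain words. Matches are only candidates: "Royal Canin" still matches Ségolène Royal at this stage.

```python
import re
import unicodedata

# Keyword sets from the table above; the full configuration covers 140+ politicians.
KEYWORDS = {
    "Nicolas Sarkozy": {"sarkozy", "sarko", "sarko_2012"},
    "Ségolène Royal": {"royal", "ségo", "ségolène"},
}

TOKEN = re.compile(r"\w+")  # '#' is not a word character, so hashtags yield plain tokens


def normalize(text: str) -> str:
    """Lowercase and strip accents (an assumption) so 'Ségo' and 'sego' compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


# Normalize the keyword sets once, up front.
NORMALIZED = {name: {normalize(k) for k in kws} for name, kws in KEYWORDS.items()}


def candidate_politicians(text: str) -> set:
    """Politicians with at least one keyword appearing as a whole token in text."""
    tokens = set(TOKEN.findall(normalize(text)))
    return {name for name, kws in NORMALIZED.items() if tokens & kws}
```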

Language filtering

Messages returned by Twitter may be in any language. We keep only messages in French and filter out tweets with no content other than URLs.

Language filtering is performed with Semiocast's language identification algorithm.
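The content check can be sketched as follows: a tweet is dropped if nothing but whitespace remains once its URLs are removed. Semiocast's language identification algorithm is not public, so the sketch takes the language test as a parameter (`is_french`, a hypothetical callable) rather than guessing at it; the URL pattern is also a simplifying assumption.

```python
import re

# Simplifying assumption: a URL is any http(s):// run of non-space characters.
URL = re.compile(r"https?://\S+")


def has_content(text: str) -> bool:
    """True if something other than URLs and whitespace remains."""
    return bool(URL.sub("", text).strip())


def keep_french(texts, is_french):
    """Keep tweets that have content and that is_french accepts.

    is_french is a hypothetical placeholder for a language identifier;
    no real identifier is implemented here."""
    return [t for t in texts if has_content(t) and is_french(t)]
```

Any language identifier exposing a text-to-boolean test can be plugged in as `is_french`; keeping it separate makes the URL-only rule testable on its own.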


Disambiguation and polysemy filtering

Each tweet containing keywords for a given politician is analyzed to determine whether it is actually about this politician. This process is based on Semiocast's polysemy filtering technology. Quality is ensured by manually verifying results on samples. When the volumes of tweets about two politicians are extremely close, precision is increased to establish the ranking reliably.

During this phase, messages are also attributed to the proper politicians. In particular, when several figures share a family name, each message must be attributed to the right member of the family. As politicians are frequently referred to by their last names, only the context can distinguish, say, Frédéric Mitterrand from François Mitterrand. In a few cases, even human experts cannot determine who the message is about. The ambiguity can even be the subject of a joke, as shown below:

“If Ségolène Royal wants to succeed Mitterrand, no problem: I'll save the Ministry of Culture for her.” (Tweet from a fake Martine Aubry account.)
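A deliberately naive way to attribute a "Mitterrand" message is to count context cues for each candidate and abstain when there is no cue or a tie, as sketched below. The cue lists are hypothetical examples; the methodology does not describe how Semiocast's polysemy filtering technology performs this attribution.

```python
import re

# Hypothetical context cues; not the cues used by the barometer.
CUES = {
    "Frédéric Mitterrand": {"ministre", "culture", "neveu"},
    "François Mitterrand": {"1981", "tonton", "président"},
}


def attribute(text: str):
    """Return the better-supported Mitterrand, or None when the context does not decide."""
    tokens = set(re.findall(r"\w+", text.lower()))
    scores = {name: len(tokens & cues) for name, cues in CUES.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    # No cue at all, or a tie between candidates: leave the message unattributed.
    return winners[0] if best > 0 and len(winners) == 1 else None
```

Given the French original of the joke above, this scorer returns "Frédéric Mitterrand" on the strength of the single word "Culture", even though the tweet plays on both men. Cases like this are why results are checked manually on samples.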

At the end of this stage, we compute the ranking and establish the list of the 20 most-mentioned politicians over the period.
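Once messages are attributed, the ranking amounts to counting distinct tweets per politician and keeping the top 20. A minimal sketch, assuming the input is a list of hypothetical `(tweet_id, politician)` pairs and that a tweet matched by several keywords for the same politician counts once:

```python
from collections import Counter


def rank(attributed, top_n=20):
    """Rank politicians by number of distinct tweets attributed to them.

    attributed: iterable of (tweet_id, politician) pairs (hypothetical format).
    Duplicate pairs are collapsed so each tweet counts once per politician."""
    counts = Counter(politician for _, politician in set(attributed))
    return counts.most_common(top_n)
```

Here `most_common` does not break ties in any meaningful way, since the pairs pass through a set; near-ties are exactly where the extra precision described above matters.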

Examples of polysemic names

Sign or word | Possible meanings
Sarkozy
  • Nicolas Sarkozy, the French politician and current French president;
  • Jean Sarkozy, the son of Nicolas Sarkozy.
Royal
  • Ségolène Royal, the French politician;
  • Royal Canin, the dog and cat food brand;
  • Royal, the adjective in French;
  • Port Royal, a street or train station in Paris, France.
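For the "Royal" entry in the table above, a rough rule-based filter could reject the known non-political senses and keep messages with a political cue. This is a hedged sketch with hypothetical patterns, not Semiocast's polysemy filtering technology; in particular, relying on capitalization to rule out the French adjective misfires when "royal" starts a sentence.

```python
import re

# Hypothetical patterns for the non-political senses listed in the table above.
EXCLUSIONS = [
    re.compile(r"\broyal\s+canin\b", re.IGNORECASE),    # pet food brand
    re.compile(r"\bport[\s-]+royal\b", re.IGNORECASE),  # street / station in Paris
]
POLITICAL_CUES = re.compile(r"\b(ségolène|ségo)\b", re.IGNORECASE)
SURNAME = re.compile(r"\bRoyal\b")  # case-sensitive: the capitalized surname form


def keep_royal(text: str) -> bool:
    """Rough guess at whether a 'royal' match refers to Ségolène Royal."""
    if any(p.search(text) for p in EXCLUSIONS):
        return False
    if POLITICAL_CUES.search(text):
        return True
    # The French adjective is usually lowercase mid-sentence, so require a capital R.
    return SURNAME.search(text) is not None
```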

Mood evaluation

Mood evaluation is performed using Semiocast's semantic analysis and machine learning algorithms. While we do not manually tag each message individually, our mood evaluation models are recalibrated monthly from thousands of hand-coded messages.
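The methodology does not detail Semiocast's models, but the idea of recalibrating a classifier from hand-coded messages can be illustrated with a standard technique swapped in for the purpose: multinomial Naive Bayes with add-one smoothing. In this sketch a monthly recalibration would simply call `fit` again on the latest hand-coded set; the category labels are assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


class NaiveBayesMood:
    """Multinomial Naive Bayes with add-one smoothing: a swapped-in standard
    technique, not Semiocast's model."""

    def fit(self, labeled):
        """labeled: (text, category) pairs coded by hand. Refitting replaces the
        model, which is how a monthly recalibration would look in this sketch."""
        self.class_counts = Counter(label for _, label in labeled)
        self.word_counts = defaultdict(Counter)
        for text, label in labeled:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = len(labeled)
        return self

    def _log_score(self, label, tokens):
        words = self.word_counts[label]
        denom = sum(words.values()) + len(self.vocab)
        score = math.log(self.class_counts[label] / self.total)  # log prior
        for w in tokens:
            if w in self.vocab:  # words never seen in training carry no evidence
                score += math.log((words[w] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda label: self._log_score(label, tokens))
```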

Tweets are sorted into four categories, defined as follows:

  • Approvals: For most politicians, very few messages are positive. As a consequence, we count as an approval any message in which the author acknowledges a positive quality in the politician or their platform, agrees with one or more of their ideas, or expresses support for them. If a message contains both such a remark and a criticism or a rejection, it is nevertheless counted as an approval.
  • Rejections: We count as rejections tweets in which the author clearly expresses that they will never support this politician. This category also includes insults.
  • Criticisms: We count as criticisms statements of disagreement, mockery, derision and disdain.
  • Neutral: A message is considered neutral if it simply shares a newspaper article without comment, or if the sender expresses neither approval nor criticism.
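The precedence rule in these definitions can be made explicit: once a tweet's signals have been detected, it receives a single category, with approval winning over criticism and rejection. Ranking rejection above criticism is an assumption, since the definitions do not say which prevails; the signal labels are hypothetical.

```python
from collections import Counter

# Approval first comes from the definitions; rejection before criticism is an assumption.
PRECEDENCE = ["approval", "rejection", "criticism"]


def categorize(signals):
    """signals: set of detected signals for one tweet, e.g. {"criticism", "approval"}."""
    for category in PRECEDENCE:
        if category in signals:
            return category
    return "neutral"  # a bare article share, or no approval/criticism expressed


def mood_breakdown(per_tweet_signals):
    """Count tweets per final category for one politician."""
    return Counter(categorize(s) for s in per_tweet_signals)
```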
