Opinions posted at online stores, social networking posts, reviews written on specialist sites, blog posts, forum messages, answers on forms, etc. Today the web contains a huge amount of text data that can be analyzed to find out the opinions and sentiments of internet users.
This is the aim of sentiment analysis, a subfield of text mining that extracts subjective information from text sources using natural language processing.
Some researchers have reported that box-office revenue predictions for movies based on sentiment analysis of Twitter data are more accurate than those based on market analyses.
Sentiment analysis combines a set of techniques that determine the polarity (positive, negative, or neutral sentiment) expressed by a sentence or an entire document. Some tools place the message on a spectrum describing varying degrees of positivity or negativity, detect emotions (joy, sadness, anger, hope, worry, etc.), or identify intentions (for example: interested, uninterested).
These techniques rely on sentiment lexicons (dictionaries of words labelled according to whether they express positive or negative sentiment), on various machine learning methods, or on a combination of both.
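To make the lexicon-based approach concrete, here is a minimal sketch in Python. The lexicon, its weights, and the one-word negation rule are all hypothetical choices made for illustration; real lexicons are far larger, and real tools handle intensifiers, punctuation, emojis, and longer-range negation.

```python
import re

# Toy lexicon: hypothetical words and weights, chosen for illustration only.
LEXICON = {
    "great": 2.0, "good": 1.0, "love": 2.0,
    "bad": -1.0, "boring": -1.5, "awful": -2.0,
}
# Assumption: a negation only affects the word that immediately follows it.
NEGATIONS = {"not", "never", "no"}


def polarity_score(text: str) -> float:
    """Sum the lexicon weights of the words in `text`.

    The weight of a word is flipped when the previous word is a negation.
    Words missing from the lexicon contribute nothing.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, token in enumerate(tokens):
        weight = LEXICON.get(token, 0.0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    return score


def polarity_label(text: str) -> str:
    """Map the numeric score to positive, negative, or neutral."""
    score = polarity_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A call such as `polarity_label("Not good, frankly boring")` goes through the same steps a lexicon-based tool would: tokenize, look up each word, adjust for context, then aggregate. A machine learning approach would instead learn the weights from labelled examples.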
Predicting the success of a movie
Sentiment analysis can be used to predict the success and box-office revenue of a movie from data gathered on social media. The buzz around a movie, along with public opinions and expectations, can be measured from comments posted under YouTube trailers, tweets and their hashtags, likes and emojis on Facebook, reviews on IMDb or Rotten Tomatoes, and blog posts. Several research teams have attempted to prove that box-office revenue predictions for movies based on sentiment analysis of these sources, Twitter in particular, are more accurate than those based on market analyses.
In Ireland, researchers at the National College of Ireland School of Computing carried out a project to predict and classify the box-office success of around fifteen movies (will the movie be a “flop”, have “average” success, be a “hit” or a “superhit”?) by extracting sentiments and emotions from close to 85,000 tweets, combined with machine learning algorithms. They used the Syuzhet package, which classifies tweets as positive or negative and assigns them an emotion score. Of the four machine learning models tested on the data, two achieved an accuracy of over 60%, a better result than the models used in other studies.
Anticipating stock market prices
Investors and traders have always sought to anticipate the evolution of financial markets using different types of financial analysis. Artificial intelligence and machine learning have provided them with new tools: predictive models built from large quantities of financial data. Finance is now turning to sentiment analysis to round out its toolbox.
Behavioral finance has shown that emotions can influence investment decisions and therefore affect stock market prices. Capturing collective emotions – and in particular those of investors – could therefore make it possible to predict stock market movements.
In 2011, researchers were already analyzing the correlation between emotions expressed on Twitter and stock market indices (Dow Jones, NASDAQ, S&P 500). Their preliminary work concluded that when the emotions (hope, fear, worry) expressed on the microblogging platform were strong on a given day, the Dow Jones dropped the next day. “[…] Just checking on Twitter for emotional outbursts of any kind gives a predictor of how the stock market will be doing the next day.”
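The kind of relationship described here can be explored with a lagged correlation: the emotion level on day t is compared with the index return on day t + 1. The sketch below is an illustration only, not the researchers' actual method. The function names and the `lag` parameter are hypothetical, and it assumes that a daily emotion score has already been computed from tweets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(emotion, returns, lag=1):
    """Correlate the emotion score on day t with the index return on day t + lag.

    `emotion` and `returns` are daily series aligned on the same dates.
    """
    if lag < 1 or lag >= len(emotion):
        raise ValueError("lag must be between 1 and len(emotion) - 1")
    # Drop the last `lag` emotion values and the first `lag` returns so that
    # each emotion value is paired with the return observed `lag` days later.
    return pearson(emotion[:-lag], returns[lag:])
```

A strongly negative value of `lagged_correlation(emotion, returns)` would match the pattern reported above, where emotional days are followed by a drop. A correlation on historical data is only a starting point and does not by itself make a reliable predictor.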
Researchers in India sought to demonstrate the value of using the sentiments and opinions expressed about companies on social networks and economic news websites to refine predictions of variations in a company’s stock price.
Sentiment analysis can exploit other data sources such as biometric data (voice, facial expressions) and even musical data. An international research team proposed measuring investors’ state of mind based not on text data from comments posted on social networks but on data from the music streaming platform Spotify.
Their paper, published in the “Journal of Financial Economics”, shows that it is possible to use the musical choices of Spotify users to measure the average mood of individuals in a country and link this to stock-market movements to predict the national index.
Measuring public opinion
The importance of social networks in politics is well known. People use them to share their points of view, to discuss current affairs, and to interact with elected representatives and politicians. This makes them a rich field of study for analysts wishing to measure public opinion on a specific subject, a political party or a politician, during the drafting of legislation or prior to an election.
During an election campaign, for example, sentiment analysis can reveal whether a candidate is popular in a particular region, or whether they are perceived as credible on a given topic. Several studies have even attempted to show that election results can be predicted from the analysis of Twitter data alone. With over 436 million monthly active users at the beginning of 2022, the social network has become a major influence in the political arena.
One Medium user attempted to determine the political mood of each US state several weeks before the 2020 presidential election, using Twitter data. He collected tweets from the previous week using Twitter’s API and performed sentiment analysis with VADER (Valence Aware Dictionary and sEntiment Reasoner). This open-source, lexicon-based tool is specifically tuned to sentiments expressed on social media, and it determines both their polarity and their intensity.
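Once each tweet has a sentiment score, the scores still need to be aggregated by state. The sketch below is not the Medium author's code. It assumes each tweet has already been scored with a compound value between -1 and 1, as VADER produces, and tagged with a state and a party. The function names, the data layout, and the `min_tweets` cut-off are hypothetical. The ±0.05 thresholds are the ones conventionally suggested for VADER's compound score.

```python
from collections import defaultdict
from statistics import mean

# Conventional cut-offs for a VADER-style compound score in [-1, 1].
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def polarity(compound: float) -> str:
    """Turn a compound score into a positive, negative, or neutral label."""
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"


def mean_sentiment(scored_tweets, min_tweets=3):
    """Average the compound scores for each (state, party) pair.

    `scored_tweets` is an iterable of (state, party, compound) tuples.
    Pairs with fewer than `min_tweets` tweets map to None, meaning the
    data is considered insufficient (hypothetical cut-off).
    """
    grouped = defaultdict(list)
    for state, party, compound in scored_tweets:
        grouped[(state, party)].append(compound)
    return {
        key: mean(scores) if len(scores) >= min_tweets else None
        for key, scores in grouped.items()
    }
```

Comparing the per-party averages within a state, for example `mean_sentiment(tweets)[("TX", "R")]` against `mean_sentiment(tweets)[("TX", "D")]`, is one simple way to label that state's leaning.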
The author categorized the US states as follows: Strongly Republican, Strongly Democratic, Somewhat Republican, Somewhat Democratic, and states in which data was insufficient. Based on the sentiments expressed on Twitter, his analysis counted 21 states as Republican and 16 as Democratic. The opinion of 13 states remained undetermined due to insufficient data. Although several of these predictions turned out to be wrong, the experiment is very likely to be followed by others, as conversation, including political conversation, takes up more and more space on social networks.
Sentiment analysis, which makes it possible to measure the opinions and expectations, hopes and fears of internet users, is a rapidly growing field, driven by advances in natural language processing and by its many applications.