Authors: Petra Kralj Novak, Jasmina Smailović, Borut Sluban, Igor Mozetič
ArXiv: 1509.07761
Abstract URL: http://arxiv.org/abs/1509.07761v2
There is a new generation of emoticons, called emojis, that is increasingly
being used in mobile communications and social media. In the past two years,
over ten billion emojis were used on Twitter. Emojis are Unicode graphic
symbols, used as a shorthand to express concepts and ideas. In contrast to the
small number of well-known emoticons that carry clear emotional contents, there
are hundreds of emojis. But what are their emotional contents? We provide the
first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a
sentiment map of the 751 most frequently used emojis. The sentiment of the
emojis is computed from the sentiment of the tweets in which they occur. We
engaged 83 human annotators to label over 1.6 million tweets in 13 European
languages by the sentiment polarity (negative, neutral, or positive). About 4%
of the annotated tweets contain emojis. The sentiment analysis of the emojis
allows us to draw several interesting conclusions. It turns out that most of
the emojis are positive, especially the most popular ones. The sentiment
distribution of the tweets with and without emojis is significantly different.
The inter-annotator agreement on the tweets with emojis is higher. Emojis tend
to occur at the end of the tweets, and their sentiment polarity increases with
their distance from the beginning of the tweet. We observe no significant
differences in the emoji rankings
between the 13 languages and the Emoji Sentiment Ranking. Consequently, we
propose our Emoji Sentiment Ranking as a European language-independent resource
for automated sentiment analysis. Finally, the paper provides a formalization
of sentiment and a novel visualization in the form of a sentiment bar.
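
The abstract states that the sentiment of an emoji is computed from the sentiment of the tweets in which it occurs. A minimal sketch of that idea follows, under stated assumptions: labels are encoded as -1 (negative), 0 (neutral), and +1 (positive), and the score of an emoji is taken to be the mean label over its occurrences, that is, the share of positive occurrences minus the share of negative ones. The exact formula used in the paper may differ (for instance, it may smooth the counts). The function name `emoji_sentiment`, the input layout, and the sample data are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

# Assumed label encoding (not taken from the abstract):
# -1 = negative, 0 = neutral, +1 = positive.


def emoji_sentiment(labelled_tweets):
    """Return {emoji: score} where score is the mean label over occurrences.

    labelled_tweets: iterable of (emojis, label) pairs, where `emojis` is the
    list of emojis found in one tweet and `label` is that tweet's annotation.
    """
    counts = defaultdict(Counter)
    for emojis, label in labelled_tweets:
        for emoji in emojis:
            # Every occurrence inherits the label of its tweet.
            counts[emoji][label] += 1

    scores = {}
    for emoji, by_label in counts.items():
        total = sum(by_label.values())
        # Mean of the labels: (positives - negatives) / occurrences.
        scores[emoji] = (by_label[1] - by_label[-1]) / total
    return scores


# Made-up sample data, for illustration only.
sample = [
    (["😂"], 1),
    (["😂", "😍"], 1),
    (["😂"], -1),
    (["😭"], -1),
    (["😭"], 0),
    (["😍"], 1),
]
scores = emoji_sentiment(sample)
for emoji, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(emoji, round(score, 3))
```

Scores produced this way lie in the range [-1, 1], so sorting by score gives a simple ranking from most positive to most negative. With real data, emojis that occur only a handful of times would yield unreliable scores, so a minimum occurrence threshold would be a sensible addition.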