Irony and Sarcasm Detection in Twitter:
The Role of Affective Content
PhD Candidate
Delia Irazú Hernández Farías
Thesis Advisors
Paolo Rosso
Universitat Politècnica de València, Spain
Viviana Patti
Università degli Studi di Torino, Italy
Valencia, September 2017
Department of Computer Science
PhD Programme in Computer Science
Cycle XXX
Irony and Sarcasm Detection in Twitter:
The Role of Affective Content
Thesis presented by:
Delia Irazú Hernández Farías
Tutors:
Paolo Rosso
Universitat Politècnica de València, Spain
Viviana Patti
Università degli Studi di Torino, Italy
PhD programme coordinator:
Marco Grangetto
September 2017
Scientific-disciplinary sector: INF/01
Acknowledgments
I would like to express my most sincere gratitude to all of those who
have made this work possible.
Firstly, to my advisors, Paolo Rosso and Viviana Patti: without their help,
it would not have been possible to conclude this thesis. Thanks a lot for all the
time dedicated to our interesting and fascinating research topic #withoutsarcasm
:D.
Paolo, thank you for all the opportunities you have given me over more
than five years. Thanks a lot for encouraging me to be a better PhD student
and also for all your advice and patience. I am very thankful for your help and
support during these years. I just want to say this in all the languages I speak:
thank you! grazie! gracias!
Viviana, thank you so much for all the support and help that you’ve given
me. I really appreciate that you involved me in different projects.
Thank you for inviting me to spend part of my PhD in a beautiful city such
as Torino (giving me the opportunity to learn a new language: Italian).
Sinceramente, Grazie mille!
I’m really thankful to the reviewers of this thesis: Rachel Giora, Horacio
Saggion, and Pavel Braslavski; thanks for your valuable comments about my
thesis. Thank you very much to the members of the evaluation tribunal of this
thesis: Horacio Saggion, Elisabetta Fersini, and Roberto Basili.
Thank you so much to Universitat Politècnica de València (UPV) and
Università degli Studi di Torino (UniTo) for all the facilities and support
provided to me. And also to the people in the Pattern Recognition and Human
Language Technology (PRHLT) research center.
Thanks to all the people from different countries and cultures that shared
some time in the laboratory at UPV with me. A special mention is for Maite:
thank you so much for the time and experiences we shared during this period:
moltes gràcies!
I also want to say GRAZIE to the people at UniTo, especially to Emilio
Sulis, Cristina Bosco, and Mirko Lai (who learned to speak his own version of
Spanish with me).
Thank you to all the people who have shared not only good (and also bad and
stressful) moments but also their lives with me in Valencia and Torino.
Thanks to my grandfather, aunts, cousins, and friends in Mexico for always
having words of encouragement for me.
Last but not least, I would like to say thank you to the most important people in
my life: my mom and my brother. Thank you for always being there supporting,
helping, and encouraging me no matter the distance. Mami: Thank you so
much for taking care of us and also for always having a smile even in rough
times.
Delia Irazú Hernández Farías
València, July 2017.
Funding
This work has been funded by the National Council for Science and
Technology (CONACyT - Mexico) under Grant No. 218109/313683.
Part of the research was carried out in the framework of the SomEMBED
TIN2015-71147-C2-1-P MINECO project.
Abstract
Investigating how people express themselves in social media has
attracted the attention of several disciplines due to the great potential
for research that it represents. Social media platforms, like Twitter, offer
a face-saving ability that allows users to express themselves employing
figurative language devices such as irony to achieve different
communication purposes. Ironic utterances on such platforms are generated by
users who most of the time have only an intuitive definition of what
irony is. Dealing with this kind of content represents a big challenge for
computational linguistics. Irony is closely associated with the indirect
expression of feelings, emotions and evaluations, intended as the writer’s
attitude or stance towards a particular target entity involved in the
ironic utterance. Thus, interest in detecting the presence of irony in
social media texts has grown significantly in recent years, partly because of its
impact on natural language processing (NLP) areas related to sentiment
analysis, where irony detection is important to avoid misinterpreting
ironic statements as literal.
In this thesis, we introduce the problem of detecting irony in social
media from a computational linguistics perspective. We propose to
address this task by focusing, in particular, on the role of affective
information for detecting the presence of this figurative language device.
Attempting to take advantage of the subjective intrinsic value enclosed in
ironic expressions, we present a novel model, called emotIDM, for detecting
irony relying on a wide range of affective features. For characterising
an ironic utterance, we used an extensive set of resources covering different
facets of affect from sentiment to finer-grained emotions. We address
irony detection by casting it as a binary classification problem. To evaluate
our model, we collected a set of Twitter corpora used by scholars in
previous research, to be used as benchmarks with a two-fold purpose: to
compare the performance of our model against other approaches in the
state of the art, and to evaluate its robustness across several different
aspects related to the characteristics of the corpora, such as collection
mode, size and imbalance degree. Results show that emotIDM has a
competitive performance across the experiments carried out, validating
the effectiveness of the proposed approach. In most cases, our outcomes
outperform those from the related work, confirming that affective
information helps in distinguishing between ironic and non-ironic tweets.
Another objective of the thesis is to investigate the differences among
tweets labeled with #irony and #sarcasm. Our aim is to contribute to
a less investigated topic in computational linguistics, the separation
between irony and sarcasm in social media, again with a special focus on
affective features. We also studied a less explored hashtag that has been
used by scholars for collecting samples of sarcastic intention: #not. We
find data-driven arguments on the differences among tweets containing
these hashtags, suggesting that the above-mentioned hashtags are used
to refer to different figurative language devices. We identify promising
features based on affect-related phenomena for discriminating among
different kinds of figurative language devices, and our classification results
outperform the state of the art. We also analyse the role of polarity
reversal in tweets containing ironic hashtags, observing that the impact
of this phenomenon varies. In the case of tweets labeled with #sarcasm,
there is often a full reversal (from one polarity to its opposite,
almost always from positive to negative polarity), whereas in the case of
those tagged with #irony there is an attenuation of the polarity (mostly
from negative to neutral).
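As a minimal illustration of the binary-classification framing described above (a toy sketch, not the emotIDM model), the following Python snippet turns a tweet into a small vector of affect-based features and classifies it with a nearest-centroid rule. The word lists, the three features, the training tweets, and the function names are all hypothetical and chosen only for illustration; the thesis relies on a much wider set of affective resources.

```python
from typing import Dict, List, Tuple

# Toy lexicons: illustrative assumptions, not the resources used in the thesis.
POSITIVE = {"love", "great", "wonderful", "perfect"}
NEGATIVE = {"rain", "stuck", "broken", "late", "monday"}


def affective_features(tweet: str) -> List[float]:
    """Map a tweet to [positive count, negative count, contrast flag]."""
    tokens = [t.strip(".,!?#").lower() for t in tweet.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    # Contrast flag: positive and negative words co-occur in the same tweet.
    contrast = float(pos > 0 and neg > 0)
    return [float(pos), float(neg), contrast]


def centroid(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(samples: List[Tuple[str, int]]) -> Dict[int, List[float]]:
    """Compute one centroid per class (1 = ironic, 0 = non-ironic)."""
    by_label: Dict[int, List[List[float]]] = {0: [], 1: []}
    for text, label in samples:
        by_label[label].append(affective_features(text))
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model: Dict[int, List[float]], tweet: str) -> int:
    """Return the label whose centroid is closest (squared Euclidean)."""
    x = affective_features(tweet)
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


# Hypothetical training tweets, invented for this sketch.
TRAIN = [
    ("I love being stuck in traffic", 1),
    ("Great, my phone is broken again", 1),
    ("I love this wonderful song", 0),
    ("My train is late", 0),
]
model = train(TRAIN)
print(predict(model, "What a perfect Monday, rain all day"))
print(predict(model, "The concert was great"))
```

Any standard binary classifier could replace the nearest-centroid rule; the point of the sketch is only that each tweet is reduced to affect-related features before classification.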
Detecting irony in user-generated content could have a broad range
of applications. Undoubtedly, one of the areas that can benefit most
from irony detection is sentiment analysis. We analyse the impact
of irony and sarcasm on sentiment analysis, observing a drop in the
performance of NLP systems developed for this task when irony is
present. Therefore, we explored the possible use of our findings in
irony detection for the development of an irony-aware sentiment analysis
system, assuming that the identification of ironic content could help to
improve the correct identification of sentiment polarity. To this aim, we
incorporated emotIDM into a pipeline for determining the polarity of a
given Twitter message. We compared our results with the state of the
art determined by the ‘SemEval-2015 Task 11: Sentiment Analysis of
Figurative Language in Twitter’ shared task, demonstrating the relevance
of considering affective information together with features signalling the
presence of irony for performing sentiment analysis of figurative language
in this kind of social media text. To summarize, we demonstrated
the usefulness of exploiting different facets of affective information for
dealing with the presence of irony in Twitter.
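The idea of an irony-aware polarity step can be sketched as follows. This is a hedged illustration only, not the pipeline evaluated in the thesis: it assumes a literal polarity score in [-1, 1] and a detected device label are already available, flips the sign for sarcasm (full reversal), and shrinks the score towards neutral for irony (attenuation). The function name, the label strings, and the attenuation factor of 0.5 are assumptions made for this example.

```python
# Assumed attenuation factor for ironic tweets; not a value from the thesis.
ATTENUATION = 0.5


def adjust_polarity(literal_score: float, device: str) -> float:
    """Adjust a literal polarity score in [-1, 1] given a detected device.

    device is one of "none", "irony", "sarcasm" (hypothetical labels).
    """
    if device == "sarcasm":
        # Full reversal: the polarity becomes its opposite.
        return -literal_score
    if device == "irony":
        # Attenuation: the polarity moves towards neutral (0).
        return literal_score * ATTENUATION
    return literal_score


print(adjust_polarity(0.8, "sarcasm"))
print(adjust_polarity(-0.6, "irony"))
print(adjust_polarity(0.4, "none"))
```

In a real system the device label would come from an irony detector and the literal score from a sentiment classifier; both are taken as given here.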