Introduction
Sentiment analysis, also known as opinion mining, is a relatively new field of
study that analyses subjective information in texts using Natural Language Processing
(NLP) techniques. Researchers’ growing attention to this area is due to its many
possible applications in several domains. One of these domains is tourism, since tourists
tend to share their ideas and experiences on travel websites, such as TripAdvisor, and
their opinions can be valuable for both peers and facilities.
Interest in this topic arose from the desire to find a common ground on
which Computational Linguistics and English for Specific Purposes (ESP) could
interact, as research is flourishing in both subjects because of their
applicability in numerous contexts. Hence, sentiment analysis, which is a branch of
Computational Linguistics, and Tourism English, which is a branch of English for
Specific Purposes, were combined in this project.
Therefore, this dissertation will focus on sentiment analysis in the domain of
tourism. More specifically, its goal is to illustrate an analysis carried out by the author
on TripAdvisor hotel reviews using the live demo of a specific
computational tool.
The first chapter of this work will provide a theoretical background by giving an
overview of the literature in sentiment analysis. After defining opinion mining and
outlining its most common uses, some terminology issues will be discussed. Moreover,
some popular tasks will be examined, namely polarity classification, which aims at
determining whether a text is positive or negative and which can be performed at
document, sentence or word level; affective state classification, which deals with
emotion recognition; and sarcasm detection, whose objective is to uncover
irony in texts. Furthermore, this first chapter will illustrate how sentiment
lexicons are generated and used, and it will explain how computer-mediated
communication can influence sentiment analysis. Finally, it will focus on the
importance of review mining in the tourism domain.
The second chapter of this dissertation will present the OpeNER Project, an
analysis system funded by the European Union and implemented by researchers from
Italy, Spain and the Netherlands. Among other tasks, it performs sentiment analysis, and it
implements components that were trained on the tourism domain and, more precisely,
on accommodation reviews. Moreover, it is freely available online through some web
services and a live demo. Firstly, the objectives and tasks of this project will be illustrated.
Secondly, the three layers of its architecture will be described: pipelines, which
are chains of components; components, which are pieces of software that perform specific tasks;
and cores, which are the working parts of components. In addition, some
examples will be provided in order to explain how each component works. Thirdly, the
OpeNER live demo will be described, as it is the actual tool that was employed for the
analysis of the reviews.
Finally, the third chapter will focus on the actual analysis of TripAdvisor hotel
reviews, which was carried out with the OpeNER demo and aimed at evaluating the demo’s
performance in polarity detection at word level and document level. The first
part of the chapter will describe TripAdvisor, a travel web platform on which users can
publish travel-related content, and its reviews, which are structured in a specific, consistent
way. The second part will then show how the experiment was carried
out, illustrating how reviews were sampled to create a corpus, how they were
analysed and how results were evaluated using specific metrics. The third and
last section will present the results of the quantitative analysis and will provide some
general qualitative observations about the main issues in the demo’s performance as far
as polarity detection is concerned.
Chapter 1
Sentiment Analysis: taxonomy, objectives and tasks
Sentiment analysis, also known as opinion mining, is the field of study that
“deals with the computational treatment of […] opinion, sentiment, and subjectivity in
text” (Pang et al. 2008: 9) and that classifies “language which carries an evaluative or
affective stance” (Brooke, 2009: 15).
The raw textual data available in natural language, however, are usually
unstructured: they are easily understood by humans but not easily processed by
machines (Cambria et al. 2013). The first necessary step in their
analysis is hence to transform unstructured texts into structured data through natural
language processing (NLP) techniques (Bisio et al. 2017). Therefore, the primary
objective of sentiment analysis is the creation of “automatic tools able to extract
subjective information from texts in natural languages […], to create structured and
actionable knowledge” (Pozzi et al. 2017: 1-2).
Although little research had been carried out before 2001, the situation
has changed since then, mostly thanks to the huge quantity of opinionated data
available on the internet (Pang et al. 2008). Web 2.0 has become a broad discussion
space that has generated greater interconnection among people, who can now
share ideas even with those who live in faraway countries by building virtual online
communities in which they can collaborate (Kontopoulos et al.
2013). Social networks, in particular, open windows on individuals’ lives and ideas and
are therefore boosting this trend (Pozzi et al. 2017). Nevertheless, even before the
advent of social networking sites, bloggers wrote about their personal experiences with
products and services and invited readers to comment, as they still do. In addition to
this, in customer review sites or forums, people can discuss, provide their own opinion
or ask for other people’s opinions (Boiy et al. 2009).
Due to users’ reliance upon online advice and recommendations, analysing this
kind of valuable, up-to-date information has become of paramount importance and is
useful in numerous applications (Pang et al. 2008). Consequently, sentiment analysis is
a very active research area in natural language processing: it poses challenging
research problems and therefore offers researchers an opportunity to make significant
progress on all fronts of NLP (Pang et al. 2008; Cambria et al. 2013).
Furthermore, sentiment analysis is used in several domains: it is employed by
businesses and organizations that want to gain insight into their customers’ satisfaction
and needs and that have realized that “consumer voices can wield enormous influence in
shaping the opinions of other consumers” (Zabin et al. 2008 in Pang et al. 2008: 2); it is
employed by politicians who want to monitor their electorate’s beliefs in order to sway
citizens’ votes or to modify their campaigns accordingly; it is employed by
individuals who seek peers’ opinions to help them in decision making; it has been
employed in medicine by Cobb et al. (2013) to determine the influence of online
messages on smokers’ choices to use the drug varenicline during smoking-cessation
treatments; finally, automated classifiers could correct rating errors when users accidentally
select a low rating although the text of their review is positive (Cabral et al. in Pang et
al. 2008).
This chapter will first describe the terminology used in sentiment analysis and
the main concepts involved, and then illustrate its major tasks and the development of
the research in the field.
1.1 Taxonomy and terminology issues
A first clarification must be made about the expression sentiment analysis itself.
Sentiment analysis and opinion mining are usually employed as synonyms,
even though, according to Cambria et al. (2013), they focus respectively on emotion
recognition and polarity detection. The phrases review mining and appraisal extraction
have been used, too (Pang et al. 2008). Other expressions, such as opinion extraction,
subjectivity analysis, affect analysis and emotion analysis, however, cannot be used as
equivalents of sentiment analysis, because each actually refers to a specific task. In
particular, Pozzi et al. (2017) argue that the phrase polarity classification is often used
when talking about sentiment analysis in general, even though polarity classification is
just one of the tasks of sentiment analysis, namely the one that aims at extracting positive,
negative and neutral orientations (called polarities) from texts.
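As a purely illustrative sketch of polarity classification, the Python fragment below assigns a polarity to individual words and then to a whole document by summing word scores. The lexicon TOY_LEXICON and both function names are hypothetical and were made up for this example; they are not taken from any tool or resource discussed in this work, and real systems rely on much larger lexicons and also handle negation, intensifiers and context.

```python
import re

# Hypothetical toy lexicon (illustration only): +1 = positive, -1 = negative.
TOY_LEXICON = {
    "clean": 1, "friendly": 1, "comfortable": 1, "excellent": 1,
    "dirty": -1, "noisy": -1, "rude": -1, "terrible": -1,
}


def word_polarities(text):
    """Word-level step: return (word, score) pairs for words found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(token, TOY_LEXICON[token]) for token in tokens if token in TOY_LEXICON]


def document_polarity(text):
    """Document-level step: sum the word scores and map the total to a label."""
    score = sum(value for _, value in word_polarities(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(document_polarity("The room was clean and the staff were friendly."))
```

Under these assumptions, a text containing no lexicon word, or an equal number of positive and negative words, is labelled neutral, which already hints at the limits of a purely lexicon-based approach.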
This work will focus on polarity classification, even though this task has
already largely been addressed and the research focus has shifted to more complex
tasks such as affective state identification (e.g. the uncovering of emotions such
as anger or happiness) or sarcasm detection (Hart, 2013).
As already pointed out, the goal of sentiment analysis is to extract subjective
information from texts (Pozzi et al. 2017). Consequently, a major step towards an
understanding of this subject is to explain what we mean by subjective information.
Under this label, we can find “people’s opinions, sentiments, evaluations, appraisals,
attitudes, and emotions towards entities such as products, services, organizations,
individuals, issues, events, topics, and their attributes” (Liu, 2012: 1).
Thus, the first obvious aim of sentiment analysis must be to determine whether a
text is objective or subjective. This task is called subjectivity classification and
distinguishes texts that convey factual information from those that convey personal
views and beliefs (Pozzi et al. 2017). For instance, a sentence such as “Hotel Danieli is
in Venice” is an objective one because it simply states where that particular hotel is
located, whereas “Hotel Danieli is expensive” is subjective because it provides a
personal point of view. The distinction, however, is not always so straightforward and
may also depend on the readers’ knowledge and perspective (Liu, 2012).
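The two hotel sentences above can be used in a minimal sketch of subjectivity classification. The cue list SUBJECTIVE_CUES and the function is_subjective are hypothetical names invented for this illustration, and the rule is deliberately naive: a sentence counts as subjective as soon as it contains one evaluative cue word. Real classifiers are usually trained on annotated data and, as noted, the boundary is not always clear-cut.

```python
import re

# Hypothetical list of evaluative cue words (illustration only).
SUBJECTIVE_CUES = {
    "expensive", "cheap", "beautiful", "ugly",
    "wonderful", "awful", "comfortable", "disappointing",
}


def is_subjective(sentence):
    """Return True if the sentence contains at least one evaluative cue word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(token in SUBJECTIVE_CUES for token in tokens)


if __name__ == "__main__":
    for sentence in ("Hotel Danieli is in Venice", "Hotel Danieli is expensive"):
        label = "subjective" if is_subjective(sentence) else "objective"
        print(f"{sentence!r}: {label}")
```

A keyword rule of this kind cannot capture the dependence on the reader’s knowledge and perspective; it only shows the basic objective versus subjective split.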
Another terminological distinction must be made between opinion and
sentiment, even though they are generally used interchangeably. Whereas an opinion
can be considered a person’s concrete idea about something, a sentiment is
more closely linked to emotions.
Nonetheless, the two concepts are closely connected, as “identification of
sentiment is often exploited for detecting [opinion] polarity” (Cambria et al. 2013: 15)
and a sentiment can be described as a feeling caused by the opinion itself (Pozzi et al.