Cross-lingual sentiment analysis for under-resourced languages

Barnes, Jeremy Claude

Supervised by:

Toni Badia Cardús Director
Patrik Lambert Co-director

Defence university: Universitat Pompeu Fabra

Defence date: 28 January 2019

Committee:

Sebastian Padó Chair
Horacio Saggion Secretary
Alexandra Balahur Dobrescu Committee member

Type: Thesis

Teseo: 580818

Abstract

The amount of user-generated content available on the internet is constantly growing, and with it the opportunity to develop methods that infer valuable information from this content. Sentiment analysis is the task of automatically determining the polarity of this content. While some languages, such as English, have a vast array of resources for sentiment analysis, most under-resourced languages lack them. Cross-lingual Sentiment Analysis (CLSA) attempts to make use of resource-rich languages in order to create or improve sentiment analysis systems in an under-resourced language. Machine translation is the most common way of transferring these resources, yet it is not always available, nor is it always the optimal solution. The objective of this thesis is to explore approaches that enable sentiment analysis in under-resourced languages, while moving from coarse-grained to fine-grained sentiment. Until now, there has been little investigation into CLSA for languages that lack large amounts of parallel data. Here, we propose cross-lingual sentiment approaches that have minimal parallel data requirements, while making the best use of available monolingual data. We start by determining which characteristics of state-of-the-art monolingual sentiment models are relevant for this task, and by comparing machine translation with cross-lingual distributional representations. We then propose a model that incorporates sentiment information into bilingual distributional representations by jointly optimizing them for semantics and sentiment, and show that it achieves state-of-the-art performance when combined with machine translation. Next, we extend these approaches to the aspect level and test them on a variety of language families and domains. Finally, we show that this approach can also be suitable for domain adaptation.