Cross-lingual sentiment analysis for under-resourced languages
- Toni Badia Cardús (Director)
- Patrik Lambert (Co-director)
Defence university: Universitat Pompeu Fabra
Defence date: 28 January 2019
- Sebastian Padó (Chair)
- Horacio Saggion (Secretary)
- Alexandra Balahur Dobrescu (Committee member)
Type: Thesis
Abstract
The amount of user-generated content available on the internet is constantly growing, and with it the opportunity to find methods that allow us to infer valuable information from this content. Sentiment analysis is a task that allows us to calculate the polarity of this content automatically. While some languages, such as English, have a vast array of resources to enable sentiment analysis, most under-resourced languages lack them. Cross-lingual Sentiment Analysis (CLSA) attempts to make use of resource-rich languages in order to create or improve sentiment analysis systems in an under-resourced language. Machine translation is the most common way of transferring these resources, yet it is neither always available nor always the optimal solution. The objective of this thesis is to explore approaches that enable sentiment analysis in under-resourced languages, while moving from coarse- to fine-grained sentiment. Until now, there has been little investigation into CLSA for languages that lack large amounts of parallel data. Here, we propose cross-lingual sentiment approaches that have minimal parallel data requirements, while making the best use of available monolingual data. We start by determining which characteristics of state-of-the-art monolingual sentiment models are relevant to this task and by comparing machine translation with cross-lingual distributional representations. We propose a model that incorporates sentiment information into bilingual distributional representations by jointly optimizing them for semantics and sentiment, showing state-of-the-art performance when combined with machine translation. We then extend these approaches to the aspect level and subsequently test them on a variety of language families and domains. Finally, we show that this approach can also be suitable for domain adaptation.