On the effect of word order on cross-lingual sentiment analysis

  1. Barnes, Jeremy
  2. Atrio, Àlex R.
  3. Badia Cardús, Toni
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2019

Issue: 63

Pages: 23-30

Type: Article

Abstract

Current state-of-the-art sentiment analysis models make use of word order, either explicitly, by pre-training on a language modeling objective, or implicitly, by relying on recurrent neural networks (RNNs) or convolutional neural networks (CNNs). This is a problem for cross-lingual approaches that train on bilingual embeddings, because the difference in word order between the source and target languages is left unresolved. In this work, we explore word reordering as a pre-processing step for sentence-level cross-lingual sentiment classification, with two language pairs (English-Spanish, English-Catalan). We find that although reordering helps both models, CNNs are more sensitive to local reordering, whereas global reordering benefits RNNs.

References

  • Agerri, R., M. Cuadros, S. Gaines, and G. Rigau. 2013. OpeNER: Open polarity enhanced named entity recognition. Sociedad Española para el Procesamiento del Lenguaje Natural, 51(Septiembre):215–218.
  • Artetxe, M., G. Labaka, and E. Agirre. 2016. Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2289–2294.
  • Artetxe, M., G. Labaka, and E. Agirre. 2017. Learning bilingual word embeddings with (almost) no bilingual data. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 451–462.
  • Artetxe, M., G. Labaka, and E. Agirre. 2018a. A robust self-learning method for fully unsupervised cross-lingual mappings of word embeddings. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 789–798. Association for Computational Linguistics.
  • Artetxe, M., G. Labaka, and E. Agirre. 2018b. Unsupervised statistical machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, November.
  • Artetxe, M., G. Labaka, E. Agirre, and K. Cho. 2018. Unsupervised neural machine translation. In Proceedings of the Sixth International Conference on Learning Representations, April.
  • Balahur, A. and M. Turchi. 2014. Comparative experiments using supervised learning and machine translation for multilingual sentiment analysis. Computer Speech & Language, 28(1):56–75.
  • Banea, C., R. Mihalcea, J. Wiebe, and S. Hassan. 2008. Multilingual subjectivity analysis using machine translation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 127–135.
  • Barnes, J., T. Badia, and P. Lambert. 2018. MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 7-12, 2018.
  • Barnes, J., R. Klinger, and S. Schulte im Walde. 2018. Bilingual sentiment embeddings: Joint projection of sentiment across languages. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2483–2493.
  • Bisazza, A. and M. Federico. 2016. A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena. Computational Linguistics, 42:163–205.
  • Chen, X., B. Athiwaratkun, Y. Sun, K. Q. Weinberger, and C. Cardie. 2016. Adversarial deep averaging networks for crosslingual sentiment classification. CoRR, abs/1606.01614.
  • Collins, M., P. Koehn, and I. Kucerova. 2005. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 531–540.
  • Crego, J. M. and J. B. Mariño. 2006a. Improving statistical MT by coupling reordering and decoding. Machine Translation, 20(3):199–215.
  • Crego, J. M. and J. B. Mariño. 2006b. Integration of POS tag-based source reordering into SMT decoding by an extended search graph. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA), pages 29–36. Cambridge.
  • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. CoRR, abs/1810.04805.
  • dos Santos, C. N. and M. Gatti. 2014. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69–78, Dublin, Ireland, August.
  • Gojun, A. and A. Fraser. 2012. Determining the placement of German verbs in English-to-German SMT. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 726–735.
  • Hangya, V., F. Braune, A. Fraser, and H. Schütze. 2018. Two methods for domain adaptation of bilingual tasks: Delightfully simple and broadly applicable. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 810–820.
  • Howard, J. and S. Ruder. 2018. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 328–339.
  • Hu, M. and B. Liu. 2004. Mining opinion features in customer reviews. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), pages 168–177.
  • Iyyer, M., V. Manjunatha, J. Boyd-Graber, and H. Daumé III. 2015. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1681–1691, Beijing, China.
  • Kiritchenko, S., X. Zhu, C. Cherry, and S. M. Mohammad. 2014. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. Proceedings of the 8th International Workshop on Semantic Evaluation, pages 437–442.
  • Klinger, R. and P. Cimiano. 2015. Instance selection improves cross-lingual model training for fine-grained sentiment analysis. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 153–163, Beijing, China, July.
  • Lample, G., A. Conneau, L. Denoyer, and M. Ranzato. 2018a. Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations.
  • Lample, G., A. Conneau, M. Ranzato, L. Denoyer, and H. Jégou. 2018b. Word translation without parallel data. In International Conference on Learning Representations.
  • Lample, G., M. Ott, A. Conneau, L. Denoyer, and M. Ranzato. 2018c. Phrase-based & neural unsupervised machine translation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 5039–5049, Brussels, Belgium, October-November.
  • Luong, T., H. Pham, and C. D. Manning. 2015. Bilingual word representations with monolingual quality in mind. In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pages 151–159.
  • Meng, X., F. Wei, X. Liu, M. Zhou, G. Xu, and H. Wang. 2012. Cross-lingual mixture model for sentiment classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 572–581, Jeju Island, Korea, July.
  • Nakagawa, T. 2015. Efficient top-down BTG parsing for machine translation preordering. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 208–218.
  • Neubig, G., T. Watanabe, and S. Mori. 2012. Inducing a discriminative parser to optimize machine translation reordering. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 843–853.
  • Peters, M., M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227–2237.
  • Severyn, A. and A. Moschitti. 2015. UNITN: Training deep convolutional neural network for Twitter sentiment classification. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pages 464–469.
  • Socher, R., A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642.