A study of politeness in neural machine translation: fine-tuned and multi-register models for Castilian Spanish

  1. Aranberri, Nora
  2. Soler Uguet, Celia
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2023

Issue: 70

Pages: 199-212

Type: Article

Abstract

Neural machine translation can now generate high-quality translations in terms of grammatical accuracy and fluency. It is therefore time to broaden research objectives and consider aspects of language beyond these attributes, in order to keep pushing the limits of the technology. In this work, we focus on politeness. Specifically, we adapt and explore, for Castilian Spanish, two different domain adaptation approaches: fine-tuned models and multi-register models. The results of the automatic and manual evaluations seem to indicate that the latter may be better at achieving a balance in quality across all registers (formal, informal and neutral). Fine-tuning appears to suffer from catastrophic forgetting, which leads to worse overall performance of the engines.

References

  • Aharoni, R., M. Johnson, and O. Firat. 2019. Massively multilingual neural machine translation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3874–3884, Minneapolis, Minnesota, June. Association for Computational Linguistics.
  • Bane, F. and A. Zaretskaya. 2021. Selecting the best data filtering method for NMT training. In Proceedings of Machine Translation Summit XVIII: Users and Providers Track, pages 89–97, Virtual, August. Association for Machine Translation in the Americas.
  • Bapna, A., N. Arivazhagan, and O. Firat. 2019. Simple, scalable adaptation for neural machine translation. arXiv preprint arXiv:1909.08478.
  • Briz, A. 2010. Lo coloquial y lo formal, el eje de la variedad lingüística. De moneda nunca usada: Estudios dedicados a José Mª Enguita Utrilla, 125:133.
  • Brown, P. 2015. Politeness and language. The International Encyclopedia of the Social and Behavioural Sciences (IESBS), 2nd ed., pages 326–330.
  • Bulte, B. and A. Tezcan. 2019. Neural fuzzy repair: Integrating fuzzy matches into neural machine translation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1800–1809, Florence, Italy, July. Association for Computational Linguistics.
  • Chu, C., R. Dabre, and S. Kurohashi. 2017. An empirical comparison of domain adaptation methods for neural machine translation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 385–391, Vancouver, Canada, July. Association for Computational Linguistics.
  • Chu, C. and R. Wang. 2018. A survey of domain adaptation for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1304–1319, Santa Fe, New Mexico, USA, August. Association for Computational Linguistics.
  • Dinu, G., P. Mathur, M. Federico, and Y. Al-Onaizan. 2019. Training neural machine translation to apply terminology constraints. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3063–3068, Florence, Italy, July. Association for Computational Linguistics.
  • Etchegoyhen, T., E. Martínez Garcia, A. Azpeitia, G. Labaka, I. Alegria, I. Cortes Etxabe, A. Jauregi Carrera, I. Ellakuria Santos, M. Martin, and E. Calonge. 2018. Neural machine translation of Basque. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 139–148, Alacant, Spain, May. European Association for Machine Translation.
  • Feely, W., E. Hasler, and A. de Gispert. 2019. Controlling Japanese honorifics in English-to-Japanese neural machine translation. In Proceedings of the 6th Workshop on Asian Translation, pages 45–53.
  • Halliday, M., A. McIntosh, and P. Strevens. 1964. The Linguistic Sciences and Language Teaching. London: Longman.
  • Haugh, M. 2005. The importance of “place” in Japanese politeness: Implications for cross-cultural and intercultural analyses. Japanese Politeness: Implications for Cross-Cultural and Intercultural Analyses, 2(1):41–68.
  • Junczys-Dowmunt, M., R. Grundkiewicz, T. Dwojak, H. Hoang, K. Heafield, T. Neckermann, F. Seide, U. Germann, A. F. Aji, N. Bogoychev, A. F. T. Martins, and A. Birch. 2018. Marian: Fast neural machine translation in C++. In Proceedings of ACL 2018, System Demonstrations, pages 116–121, Melbourne, Australia, July. Association for Computational Linguistics.
  • Kell, G. 2018. Overcoming catastrophic forgetting in neural machine translation. Ph.D. thesis, MPhil dissertation, University of Cambridge.
  • Kirkpatrick, J., R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13):3521–3526.
  • Kobus, C., J. Crego, and J. Senellart. 2017. Domain control for neural machine translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 372–378, Varna, Bulgaria, September. INCOMA Ltd.
  • Koehn, P. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pages 388–395, Barcelona, Spain, July. Association for Computational Linguistics.
  • Koehn, P. and R. Knowles. 2017. Six challenges for neural machine translation. In Proceedings of the First Workshop on Neural Machine Translation, pages 28–39, Vancouver, August. Association for Computational Linguistics.
  • Luong, M.-T. and C. Manning. 2015. Stanford neural machine translation systems for spoken language domains. In Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign, pages 76–79, Da Nang, Vietnam, December 3-4.
  • Matthiessen, C. and M. Halliday. 1997. Systemic functional grammar. Amsterdam and London: Benjamins & Whurr.
  • Ott, M., S. Edunov, A. Baevski, A. Fan, S. Gross, N. Ng, D. Grangier, and M. Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48–53, Minneapolis, Minnesota, June. Association for Computational Linguistics.
  • Popović, M. 2015. chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395, Lisbon, Portugal, September. Association for Computational Linguistics.
  • Post, M. 2018. A call for clarity in reporting BLEU scores. In Proceedings of the Third Conference on Machine Translation: Research Papers, pages 186–191, Brussels, Belgium, October. Association for Computational Linguistics.
  • Post, M. and D. Vilar. 2018. Fast lexically constrained decoding with dynamic beam allocation for neural machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1314–1324, New Orleans, Louisiana, June. Association for Computational Linguistics.
  • Rei, R., A. C. Farinha, C. Stewart, L. Coheur, and A. Lavie. 2021a. MT-Telescope: An interactive platform for contrastive evaluation of MT systems. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations, pages 73–80, Online, August. Association for Computational Linguistics.
  • Rei, R., A. C. Farinha, C. Zerva, D. van Stigt, C. Stewart, P. Ramos, T. Glushkova, A. F. T. Martins, and A. Lavie. 2021b. Are references really needed? Unbabel-IST 2021 submission for the metrics shared task. In Proceedings of the Sixth Conference on Machine Translation, pages 1030–1040, Online, November. Association for Computational Linguistics.
  • Sennrich, R., B. Haddow, and A. Birch. 2016a. Controlling politeness in neural machine translation via side constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 35–40, San Diego, California, June. Association for Computational Linguistics.
  • Sennrich, R., B. Haddow, and A. Birch. 2016b. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany, August. Association for Computational Linguistics.
  • Sennrich, R., M. Volk, and G. Schneider. 2013. Exploiting synergies between open resources for German dependency parsing, POS-tagging, and morphological analysis. In Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013, pages 601–609, Hissar, Bulgaria, September. INCOMA Ltd. Shoumen, BULGARIA.
  • Tiedemann, J. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 2214–2218, Istanbul, Turkey, May. European Language Resources Association (ELRA).
  • Tiedemann, J. and S. Thottingal. 2020. OPUS-MT – building open translation services for the world. In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, pages 479–480, Lisboa, Portugal, November. European Association for Machine Translation.
  • Van Merriënboer, B., D. Bahdanau, V. Dumoulin, D. Serdyuk, D. Warde-Farley, J. Chorowski, and Y. Bengio. 2015. Blocks and fuel: Frameworks for deep learning. arXiv preprint arXiv:1506.00619.
  • Vanmassenhove, E., D. Shterionov, and M. Gwilliam. 2021. Machine translationese: Effects of algorithmic bias on linguistic complexity in machine translation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2203–2213, Online, April. Association for Computational Linguistics.
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. 2017. Attention is all you need. In 31st Conference on Neural Information Processing Systems (NIPS 2017), pages 1–11, Long Beach, CA, USA, December.