Tectogrammar-based machine translation for English-Spanish and English-Basque

Nora Aranberri; Gorka Labaka; Oneka Jauregi; Arantza Díaz de Ilarraza; Iñaki Alegría; Eneko Agirre

Journal:

Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2016

Issue: 56

Pages: 73-80

Type: Article

Abstract

We present the first tectogrammar-based machine translation systems for English-Spanish and English-Basque. Building on the existing English-Czech model, we describe the tools used for analysis and synthesis and the resources used for transfer. The evaluation shows the potential of these systems to adapt to new languages and domains.

References

Agerri, R., J. Bermudez, and G. Rigau. 2014. IXA pipeline: Efficient and ready to use multilingual NLP tools. In Conference on Language Resources and Evaluation, Reykjavik.
Aranberri, N., G. Labaka, A. Díaz de Ilarraza, and K. Sarasola. 2015. Exploiting portability to build an RBMT prototype for a new source language. In Proceedings of EAMT 2015, Antalya.
Berger, A., V. Della Pietra, and S. Della Pietra. 1996. A maximum entropy approach to natural language processing. Computational linguistics, 22(1):39–71.
Brandt, M., H. Loftsson, H. Sigurthórsson, and F. Tyers. 2011. Apertium-icenlp: A rule-based Icelandic to English machine translation system. In Proceedings of EAMT 2011, Leuven, Belgium.
Crouse, M., R. Nowak, and R. Baraniuk. 1998. Wavelet-based statistical signal processing using hidden Markov models. IEEE Transactions on Signal Processing, 46(4):886–902.
Dušek, O. and F. Jurčíček. 2013. Robust multilingual statistical morphological generation models. ACL 2013, page 158.
Dušek, O., Z. Žabokrtský, M. Popel, M. Majliš, M. Novák, and D. Mareček. 2012. Formemes in English-Czech deep syntactic MT. In Proceedings of WMT7, pages 267–274.
Hajič, J., J. Panevová, E. Hajičová, P. Sgall, P. Pajas, J. Štěpánek, J. Havelka, M. Mikulová, Z. Žabokrtský, and M. Ševčíková Razímová. 2006. Prague dependency treebank 2.0. CD-ROM, Linguistic Data Consortium, LDC Catalog No.: LDC2006T01, Philadelphia, 98.
Hajičová, E. 2000. Dependency-based underlying-structure tagging of a very large Czech corpus. TAL. Traitement automatique des langues, 41(1):57–78.
Mareček, D., M. Popel, and Z. Žabokrtský. 2010. Maximum entropy translation model in dependency-based MT framework. In Proceedings of WMT5 and MetricsMATR, pages 201–206. ACL.
Mayor, A., I. Alegria, A. Díaz de Ilarraza, G. Labaka, M. Lersundi, and K. Sarasola. 2011. Matxin, an open-source rule-based machine translation system for Basque. Machine translation, 25(1):53–82.
Popel, M. 2009. Ways to improve the quality of English-Czech machine translation. Master’s thesis, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic.
Popel, M. and Z. Žabokrtský. 2010. TectoMT: modular NLP framework. In Advances in natural language processing. Springer, pages 293–304.
Sgall, P. 1967. Functional sentence perspective in a generative description. Prague studies in mathematical linguistics, 2:203–225.
Žabokrtský, Z. 2010. From treebanking to machine translation. Habilitation thesis, Charles University, Prague, Czech Republic.
Zeman, D. 2008. Reusable tagset conversion using tagset drivers. In Proceedings of LREC, pages 213–218.
Zeman, D., O. Dušek, D. Mareček, M. Popel, L. Ramasamy, J. Štěpánek, Z. Žabokrtský, and J. Hajič. 2014. HamleDT: Harmonized multi-language dependency treebank. Language Resources and Evaluation, 48(4):601–637.

Data source: Dialnet