Learning to map variation-standard forms in Basque using a limited parallel corpus and the standard morphology

  1. Izaskun Etxeberria
  2. Iñaki Alegria
  3. Mans Hulden
  4. Larraitz Uria
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2014

Issue: 52

Pages: 13-20

Type: Article


Abstract

This article explores three different methods for learning the variants of a language (dialectal or diachronic forms) from a small parallel corpus, assuming that the standard morphology is available.

References

  • Alegria, I., Aranzabe, M., Ezeiza, N., Ezeiza, A., and Urizar, R. (2002). Using finite state technology in natural language processing of Basque. In LNCS: Implementation and Application of Automata, volume 2494, pages 1-12. Springer.
  • Almeida, J. J., Santos, A., and Simões, A. (2010). Bigorna: a toolkit for orthography migration challenges. In Seventh International Conference on Language Resources and Evaluation (LREC2010), Valletta, Malta.
  • Beesley, K. R. and Karttunen, L. (2002). Finite-state morphology: Xerox tools and techniques. Studies in Natural Language Processing. Cambridge University Press.
  • Hulden, M. (2009). Foma: a finite-state compiler and library. In Proc. of the 12th Conference of the EACL, pages 29-32, Athens, Greece. ACL.
  • Hulden, M., Alegria, I., Etxeberria, I., and Maritxalar, M. (2011). Learning word-level dialectal variation as phonological replacement rules using a limited parallel corpus. In Proc. of Dialects'2011, EMNLP, pages 39-48.
  • Kestemont, M., Daelemans, W., and De Pauw, G. (2010). Weigh your words: memory-based lemmatization for Middle Dutch. Literary and Linguistic Computing, 25(3):287-301.
  • Koskenniemi, K. (1991). A discovery procedure for two-level phonology. Computational Lexicology and Lexicography: A Special Issue Dedicated to Bernard Quemada, pages 451-465.
  • Mann, G. S. and Yarowsky, D. (2001). Multipath translation lexicon induction via bridge languages. In Proc. of the second meeting of the NAACL, NAACL '01, pages 1-8. Association for Computational Linguistics.
  • Muggleton, S. and De Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19:629-679.
  • Novak, J. R., Minematsu, N., and Hirose, K. (2012). WFST-based grapheme-to-phoneme conversion: Open source tools for alignment, model-building and decoding. In Proc. of the 10th FSMNLP.
  • Scherrer, Y. (2007). Adaptive string distance measures for bilingual dialect lexicon induction. In Proceedings of the 45th Annual Meeting of the ACL, ACL '07, pages 55-60. ACL.