Learning to map variation-standard forms in Basque using a limited parallel corpus and the standard morphology
- Izaskun Etxeberria
- Iñaki Alegria
- Mans Hulden
- Larraitz Uria
ISSN: 1135-5948
Publication date: 2014
Issue: 52
Pages: 13-20
Type: Article
Published in: Procesamiento del lenguaje natural
Abstract
This article explores three different methods for learning the variants of a language (dialectal or diachronic forms) from a small parallel corpus, assuming that the standard morphology is available.
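The abstract does not describe the three methods themselves. The sketch below illustrates only the general task, and it is not any of the authors' methods. It takes (variant, standard) word pairs, aligns each pair by minimum edit distance, and collects the mismatched segments as rewrite rules. To map a new variant word, it applies up to two rule applications and keeps only the candidates that a standard-language filter accepts. Every function name here is hypothetical. The plain `lexicon` set is an assumed stand-in for the standard morphology, and the word pairs are invented toy strings, not Basque data.

```python
from collections import Counter


def align(variant: str, standard: str) -> list[tuple[str, str]]:
    """Minimum-edit-distance alignment; '' marks an inserted or deleted character."""
    n, m = len(variant), len(standard)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if variant[i - 1] == standard[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    # Trace back from the bottom-right cell, preferring match/substitution on ties.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = 1 if i > 0 and j > 0 and variant[i - 1] != standard[j - 1] else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub:
            pairs.append((variant[i - 1], standard[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((variant[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", standard[j - 1]))
            j -= 1
    return pairs[::-1]


def extract_rules(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse each run of non-identical aligned pairs into one (variant, standard) rule."""
    rules, src, tgt = [], "", ""
    for a, b in pairs + [(None, None)]:  # the sentinel flushes the last run
        if a is not None and a != b:
            src += a
            tgt += b
        else:
            if src:  # pure insertions (empty variant side) are skipped in this sketch
                rules.append((src, tgt))
            src, tgt = "", ""
    return rules


def learn_rules(corpus: list[tuple[str, str]]) -> Counter:
    """Count how often each rewrite rule occurs across the parallel corpus."""
    counts: Counter = Counter()
    for variant, standard in corpus:
        counts.update(extract_rules(align(variant, standard)))
    return counts


def candidates(word: str, rules: Counter, max_steps: int = 2) -> set[str]:
    """All strings reachable from `word` with at most `max_steps` single rule applications."""
    frontier, seen = {word}, {word}
    for _ in range(max_steps):
        nxt = set()
        for w in frontier:
            for src, tgt in rules:
                start = w.find(src)
                while start != -1:
                    nxt.add(w[:start] + tgt + w[start + len(src):])
                    start = w.find(src, start + 1)
        frontier = nxt - seen
        seen |= nxt
    return seen


def standardize(word: str, rules: Counter, lexicon: set[str]) -> list[str]:
    """Keep only the candidates that the (hypothetical) standard lexicon accepts."""
    return sorted(c for c in candidates(word, rules) if c in lexicon)


if __name__ == "__main__":
    toy_corpus = [("xaba", "saba"), ("xomi", "somi"), ("daitu", "datu")]
    rules = learn_rules(toy_corpus)  # learns x -> s (twice) and i -> '' (once)
    print(standardize("xaitu", rules, {"satu"}))  # → ['satu']
```

In this design, the rules overgenerate on purpose, and the standard-language filter chooses among the candidates. Skipping pure-insertion rules keeps `str.find` from matching an empty pattern everywhere. A fuller version would add context to those rules instead of dropping them.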