EusEduSeg: a Dependency-Based EDU Segmentation for Basque

  1. Mikel Iruskieta
  2. Benat Zapirain
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2015

Issue: 55

Pages: 41-48

Type: Article

Abstract

This article presents the first discourse segmenter for Basque (EusEduSeg), implemented with heuristics based on syntactic dependencies and linguistic rules. Preliminary experiments show F1 scores above 85% for EDU labeling on the Basque RST TreeBank.
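
The abstract does not state how the F1 score was computed. As a minimal sketch, assuming segmentation is scored by comparing predicted EDU boundary positions against gold-standard ones, precision, recall and F1 could be calculated as follows. The function name `boundary_f1` and the boundary indices are hypothetical and are not taken from the article.

```python
def boundary_f1(gold, predicted):
    """Return (precision, recall, F1) over sets of EDU boundary positions."""
    gold, predicted = set(gold), set(predicted)
    # A predicted boundary counts as correct only if it matches a gold one exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical token indices after which an EDU boundary is placed.
gold_boundaries = [4, 9, 15, 22]
predicted_boundaries = [4, 9, 17, 22, 30]

precision, recall, f1 = boundary_f1(gold_boundaries, predicted_boundaries)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Here three of the five predicted boundaries match the gold standard, so precision is 0.60, recall is 0.75, and F1 is about 0.67. Exact-match scoring is only one possible convention; the article may use a different one.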
