A Supervised Central Unit Detector for Spanish

  1. Kepa Bengoetxea
  2. Mikel Iruskieta
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2018

Issue: 60

Pages: 29-36

Type: Article


Abstract

In this paper we present the first machine-learning-based detector of the Central Unit (CU) for scientific abstracts in Spanish. The detector is built on the annotation of the Spanish RST Treebank, which is annotated under Rhetorical Structure Theory (RST). To detect the central unit, we use a bag-of-words model with classifiers such as Naive Bayes and SVM. Finally, we evaluate the performance of the classifiers and build the CU detector using the best-performing one.
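
To make the bag-of-words idea concrete, here is a minimal sketch in pure Python of a multinomial Naive Bayes classifier that labels a text segment as central unit ("CU") or not ("no-CU"). It is not the authors' implementation: the abstract does not give the features, tokenization, or toolkit they used, and the SVM variant is not shown. The class name `BagOfWordsNB`, the label strings, and the toy training segments are all invented for illustration and do not come from the Spanish RST Treebank.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (order is discarded)."""
    return Counter(text.lower().split())


class BagOfWordsNB:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.total_docs = len(labels)
        # Per-class token counts accumulated over all training segments.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(bag_of_words(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log posterior of each class for one segment."""
        bow = bag_of_words(text)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / self.total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word, n in bow.items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += n * math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy segments (assumption: not taken from any real corpus).
train = [
    ("el objetivo de este trabajo es presentar un detector", "CU"),
    ("en este artículo presentamos un nuevo método", "CU"),
    ("el objetivo principal es analizar el corpus", "CU"),
    ("los resultados muestran una mejora", "no-CU"),
    ("se utilizaron datos de varios hospitales", "no-CU"),
    ("finalmente se discuten las limitaciones", "no-CU"),
]
texts, labels = zip(*train)
model = BagOfWordsNB().fit(texts, labels)

print(model.predict("el objetivo de este estudio es evaluar el método"))
print(model.predict("los resultados se discuten"))
```

With this toy data the first segment is labelled "CU" and the second "no-CU", because words such as "objetivo" and "este" occur only in the invented CU examples. A real detector would need proper tokenization, a held-out evaluation, and a comparison against other classifiers, as the abstract describes.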
