EusHeidelTime: Time Expression Extraction and Normalisation for Basque

  1. Begoña Altuna Díaz
  2. María Jesús Aranzabe Urruzola
  3. Arantza Díaz de Ilarraza Sánchez
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2017

Issue: 59

Pages: 15-22

Type: Article

Abstract

Temporal information helps organise textual information by locating actions and states in time. It is therefore important to identify the time points and intervals in a text, as well as the times to which they refer. We have developed EusHeidelTime for the extraction and normalisation of temporal expressions in Basque. To that end, we analysed temporal expressions in Basque, created the rules and resources for the tool, and built a corpus for development and evaluation. Finally, we ran an experiment to evaluate EusHeidelTime's performance. We obtained satisfactory results for a morphologically rich language.

Funding information

This work was financed by the Basque Government scholarship PRE 2016 2 294.

References

  • Altuna, B., M. J. Aranzabe, and A. Díaz de Ilarraza. 2014. Euskarazko denboraegiturak. Azterketa eta etiketatze esperimentua. Linguamática, 6(2):13–24, December.
  • Aramaki, E., Y. Miura, M. Tonoike, T. Ohkuma, H. Mashuichi, and K. Ohe. 2009. Text2table: Medical text summarization system based on named entity recognition and modality identification. In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing, BioNLP ’09, pages 185–192, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Bartalesi Lenzi, V., G. Moretti, and R. Sprugnoli. 2012. CAT: the CELCT Annotation Tool. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, J. Odijk, and S. Piperidis, editors, Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), pages 333–338, Istanbul, Turkey. European Language Resources Association (ELRA).
  • Bauer, S., S. Clark, and T. Graepel. 2015. Learning to Identify Historical Figures for Timeline Creation from Wikipedia Articles. In L. Aiello and D. E. McFarland, editors, SocInfo 2014 International Workshops, Barcelona, Spain, November 11, 2014, Revised Selected Papers, volume 8852 of Lecture Notes in Computer Science, pages 234–243, Barcelona, Spain.
  • Bethard, S. and J. H. Martin. 2013. ClearTK-TimeML: A minimalist approach to TempEval 2013. In S. Manandhar and D. Yuret, editors, Second Joint Conference on Lexical and Computational Semantics (*SEM) 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 10–14, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Bittar, A. 2010. Building a TimeBank for French: a Reference Corpus Annotated According to the ISO-TimeML Standard. Ph.D. thesis, Université Paris Diderot, Paris.
  • Ferrucci, D. and A. Lally. 2004. UIMA: an architectural approach to unstructured information processing in the corporate research environment. Natural Language Engineering, 10(3–4):327–348.
  • Fokkens, A., A. Soroa, Z. Beloki, N. Ockeloen, G. Rigau, W. R. van Hage, and P. Vossen. 2014. NAF and GAF: Linking linguistic annotations. In Proceedings 10th Joint ISO-ACL SIGSEM Workshop on Interoperable Semantic Annotation, page 9, Reykjavik, Iceland.
  • Jang, S. B., J. Baldwin, and I. Mani. 2004. Automatic TIMEX2 Tagging of Korean News. ACM Transactions on Asian Language Information Processing (TALIP), 3(1):51–65, March.
  • Kawai, H., A. Jatowt, K. Tanaka, K. Kunieda, and K. Yamada. 2010. Chronoseeker: Search engine for future and past events. In Proceedings of the 4th International Conference on Ubiquitous Information Management and Communication, ICUIMC ’10, pages 25:1–25:10.
  • Llorens, H., E. Saquete, and B. Navarro. 2010. TIPSem (English and Spanish): Evaluating CRFs and Semantic Roles in TempEval-2. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages 284–291, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Mani, I. and G. Wilson. 2000. Robust Temporal Processing of News. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, pages 69–76, Stroudsburg, PA, USA.
  • Mazur, P. and R. Dale. 2010. WikiWars: A New Corpus for Research on Temporal Expressions. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 913–922, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Minard, A.-L., M. Speranza, R. Urizar, B. Altuna, M. van Erp, A. Schoen, and C. van Son. 2016. MEANTIME, the NewsReader Multilingual Event and Time Corpus. In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France, May. European Language Resources Association (ELRA).
  • Moriceau, V. and X. Tannier. 2014. French Resources for Extraction and Normalization of Temporal Expressions with HeidelTime. In N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, and S. Piperidis, editors, Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland, May. European Language Resources Association (ELRA).
  • Otegi, A., N. Ezeiza, I. Goenaga, and G. Labaka. 2016. A Modular Chain of NLP Tools for Basque. In P. Sojka, A. Horák, I. Kopeček, and K. Pala, editors, Proceedings of the 19th International Conference on Text, Speech and Dialogue — TSD 2016, Brno, Czech Republic, volume 9924 of Lecture Notes in Artificial Intelligence, pages 93–100. Springer International Publishing.
  • Pustejovsky, J., M. Verhagen, R. Saurí, J. Littman, R. Gaizauskas, G. Katz, I. Mani, R. Knippen, and A. Setzer. 2006. TimeBank 1.2. Technical report, Linguistic Data Consortium.
  • Radinsky, K. and E. Horvitz. 2013. Mining the web to predict future events. In Proceedings of the sixth ACM international conference on Web search and data mining, pages 255–264. ACM.
  • Skukan, L., G. Glavaš, and J. Šnajder. 2014. HeidelTime.Hr: Extracting and Normalizing Temporal Expressions in Croatian. In Proceedings of the Ninth Language Technologies Conference, pages 99–103. Information Society.
  • Strötgen, J. and M. Gertz. 2010. HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation, SemEval ’10, pages 321–324, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Strötgen, J. and M. Gertz. 2011. WikiWarsDE: a German Corpus of Narratives Annotated with Temporal Expressions. In H. Hedeland, T. Schmidt, and K. Wörner, editors, Multilingual Resources and Multilingual Applications. Proceedings of the Conference of the German Society for Computational Linguistics and Language Technology (GSCL) 2011, pages 129–134, Hamburg University.
  • Strötgen, J. and M. Gertz. 2013. Multilingual and Cross-domain Temporal Tagging. Language Resources and Evaluation, 47(2):269–298.
  • TimeML Working Group. 2010. TimeML Annotation Guidelines version 1.3. Manuscript. Technical report, Brandeis University.
  • UzZaman, N., H. Llorens, J. F. Allen, L. Derczynski, M. Verhagen, and J. Pustejovsky. 2013. TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations. In S. Manandhar and D. Yuret, editors, Second Joint Conference on Lexical and Computational Semantics (*SEM) 2: Seventh International Workshop on Semantic Evaluation (SemEval 2013), volume 2, pages 1–9, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • van de Camp, M. and H. Christiansen. 2013. Resolving relative time expressions in Dutch text with Constraint Handling Rules. In Revised Selected Papers of the 7th International Workshop on Constraint Solving and Language Processing Volume 8114, CSLP 2012, pages 166–177, New York, NY, USA. Springer-Verlag New York, Inc.
  • Verhagen, M. and J. Pustejovsky. 2008. Temporal Processing with the TARSQI Toolkit. In 22nd International Conference on Computational Linguistics: Demonstration Papers, COLING ’08, pages 189–192, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Wu, M., W. Li, Q. Lu, and B. Li. 2005. CTEMP: A Chinese Temporal Parser for Extracting and Normalizing Temporal Information. In R. Dale, K.-F. Wong, J. Su, and O. Y. Kwong, editors, Natural Language Processing – IJCNLP 2005: Second International Joint Conference, Jeju Island, Korea, October 11-13, 2005. Proceedings, pages 694–706, Berlin, Heidelberg. Springer.