Removing Noisy Mentions for Distant Supervision

  1. Intxaurrondo, Ander
  2. Surdeanu, Mihai
  3. López de Lacalle Lecuona, Oier
  4. Agirre Bengoa, Eneko
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 41-48

Type: Article

Abstract

Distant supervision methods for information extraction rely on known correct tuples to acquire mentions of those tuples, and use those mentions to train a traditional supervised information extraction system. In this article we analyze the sources of noise in the mentions and explore simple methods for filtering out noisy mentions. The results show that combining tuple filtering by frequency, mutual information, and the removal of mentions far from the centroids of their respective labels significantly improves the results of two information extraction models.
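As a rough illustration of the third filter named in the abstract (removing mentions far from the centroid of their label), the following sketch may help. It is not the authors' implementation: this record does not specify the feature representation, the distance measure, or the threshold, so the dense feature vectors, the cosine similarity, the `min_similarity` parameter, and all function names here are assumptions made only for the example.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_by_centroid(mentions, min_similarity):
    """Keep mentions close enough to the centroid of their own label.

    mentions: list of (label, feature_vector) pairs (hypothetical format).
    min_similarity: assumed cosine threshold; not a value from the paper.
    """
    by_label = defaultdict(list)
    for label, vector in mentions:
        by_label[label].append(vector)
    centroids = {label: centroid(vecs) for label, vecs in by_label.items()}
    return [
        (label, vector)
        for label, vector in mentions
        if cosine(vector, centroids[label]) >= min_similarity
    ]


# Toy data: the third "born_in" mention points away from the other two.
mentions = [
    ("born_in", [1.0, 0.0, 0.0]),
    ("born_in", [1.0, 0.1, 0.0]),
    ("born_in", [0.0, 0.0, 1.0]),
    ("works_for", [0.0, 1.0, 0.0]),
]
kept = filter_by_centroid(mentions, min_similarity=0.8)
```

In this toy run the outlying `born_in` mention falls below the assumed threshold and is dropped, while the other three mentions are kept. The frequency and mutual information filters mentioned in the abstract are not sketched here.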

References

  • Berg-Kirkpatrick, Taylor, David Burkett, and Dan Klein. 2012. An empirical investigation of statistical significance in NLP. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12, pages 995-1005, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Craven, Mark and Johan Kumlien. 1999. Constructing biological knowledge bases by extracting information from text sources. In Proceedings of the Seventh International Conference on Intelligent Systems for Molecular Biology, pages 77-86. AAAI Press.
  • Hoffmann, Raphael, Congle Zhang, Xiao Ling, Luke Zettlemoyer, and Daniel S. Weld. 2011. Knowledge-based weak supervision for information extraction of overlapping relations. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11, pages 541-550, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Min, Bonan, Xiang Li, Ralph Grishman, and Sun Ang. 2012. New York University 2012 system for KBP slot filling. In Proceedings of the Fifth Text Analysis Conference (TAC 2012). National Institute of Standards and Technology (NIST).
  • Mintz, Mike, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, ACL '09, pages 1003-1011, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Riedel, Sebastian, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD '10).
  • Sandhaus, Evan. 2008. The New York Times annotated corpus. In Linguistic Data Consortium, Philadelphia.
  • Surdeanu, Mihai, Julie Tibshirani, Ramesh Nallapati, and Christopher D. Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL '12, pages 455-465, Stroudsburg, PA, USA. Association for Computational Linguistics.