Removing Noisy Mentions for Distant Supervision

  1. Intxaurrondo, Ander
  2. Surdeanu, Mihai
  3. López de Lacalle Lecuona, Oier
  4. Agirre Bengoa, Eneko
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2013

Issue: 51

Pages: 41-48

Type: Article

Abstract

Relation Extraction methods based on Distant Supervision rely on true tuples to retrieve noisy mentions, which are then used to train traditional supervised relation extraction methods. In this paper we analyze the sources of noise in the mentions, and explore simple methods to filter out noisy mentions. The results show that a combination of a mention frequency cut-off, Pointwise Mutual Information, and removal of mentions which are far from the feature centroids of relation labels is able to significantly improve the results of two relation extraction models.
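The three filters named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the filters precisely, so the following are assumptions. A mention is represented by a hypothetical Mention record (entity pair, relation label copied from the knowledge base, a "pattern" string such as the words between the two entities, and a bag of features). The frequency cut-off is assumed to drop mentions whose pattern is rare. PMI is computed between a mention's pattern and its relation label. The centroid filter drops mentions whose cosine similarity to their label's feature centroid is low. All function names (frequency_cutoff, pmi_filter, centroid_filter, remove_noisy), thresholds and the order of the filters are illustrative choices.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical distantly supervised mention (illustrative only)."""
    e1: str
    e2: str
    label: str       # relation label copied from the knowledge-base tuple
    pattern: str     # assumed: e.g. the words between the two entities
    features: tuple  # assumed: bag of feature strings for the mention


def frequency_cutoff(mentions, min_count=2):
    """Assumed cut-off: keep mentions whose pattern occurs >= min_count times."""
    counts = Counter(m.pattern for m in mentions)
    return [m for m in mentions if counts[m.pattern] >= min_count]


def pmi_filter(mentions, min_pmi=0.0):
    """Keep mentions whose (pattern, label) pair has PMI >= min_pmi, where
    PMI(p, r) = log(P(p, r) / (P(p) * P(r))), estimated from counts."""
    n = len(mentions)
    joint = Counter((m.pattern, m.label) for m in mentions)
    pat = Counter(m.pattern for m in mentions)
    lab = Counter(m.label for m in mentions)

    def pmi(m):
        # (joint/n) / ((pat/n) * (lab/n)) simplifies to joint * n / (pat * lab)
        return math.log(joint[(m.pattern, m.label)] * n /
                        (pat[m.pattern] * lab[m.label]))

    return [m for m in mentions if pmi(m) >= min_pmi]


def _cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid_filter(mentions, min_sim=0.1):
    """Keep mentions whose feature vector has cosine similarity >= min_sim
    to the centroid (mean feature vector) of their relation label."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for m in mentions:
        sums[m.label].update(Counter(m.features))
        sizes[m.label] += 1
    centroids = {r: {k: v / sizes[r] for k, v in s.items()}
                 for r, s in sums.items()}
    return [m for m in mentions
            if _cosine(Counter(m.features), centroids[m.label]) >= min_sim]


def remove_noisy(mentions, min_count=2, min_pmi=0.0, min_sim=0.1):
    """Apply the three filters in sequence (the order is an assumption)."""
    kept = frequency_cutoff(mentions, min_count)
    kept = pmi_filter(kept, min_pmi)
    return centroid_filter(kept, min_sim)
```

In this sketch each filter only needs counts over the mention set itself, so the filters can be chained cheaply before training any relation extractor. The thresholds are free parameters that would presumably be tuned on held-out data; a mention such as ("Obama", "Hawaii") labeled born_in but expressed by the pattern "visited" is the kind of noise the filters are meant to remove.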