Reentrenamiento: aprendizaje semisupervisado de los sentidos de las palabras (Re-training: semi-supervised learning of word senses)

  1. Palomar Sanz, Manuel
  2. Rigau Claramunt, Germán
  3. Suárez Cueto, Armando
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2005

Issue: 34

Pages: 49-66

Type: Article

Abstract

This paper presents re-training, a bootstrapping algorithm that automatically acquires semantically annotated data while maintaining high precision. The algorithm uses a corpus-based word sense disambiguation system that relies on maximum entropy probability models. Re-training iteratively feeds training-classification cycles with new, high-confidence examples. Several filters protect the accuracy of the disambiguation by discarding uncertain classifications. The method is inspired by co-training algorithms, but it makes stronger assumptions about when to assign a label to a linguistic context.
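
The training-classification cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_maxent`, `predict`, `retrain`), the bag-of-words features, the single confidence threshold of 0.9, and the toy data for the ambiguous word "bank" are all assumptions made for illustration. The paper mentions several filters; here one probability threshold stands in for them.

```python
import math
from collections import defaultdict

def predict(weights, senses, words):
    """Return a softmax distribution over senses for one context (list of words)."""
    scores = {s: sum(weights.get((w, s), 0.0) for w in words) for s in senses}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    total = sum(exps.values())
    return {s: e / total for s, e in exps.items()}

def train_maxent(examples, epochs=200, lr=0.5):
    """Fit a tiny maximum entropy (softmax) model by stochastic gradient ascent.

    Hypothetical stand-in for the paper's classifier: one weight per
    (context word, sense) pair, no regularization.
    """
    senses = sorted({sense for _, sense in examples})
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, gold in examples:
            probs = predict(weights, senses, words)
            for s in senses:
                grad = (1.0 if s == gold else 0.0) - probs[s]
                for w in words:
                    weights[(w, s)] += lr * grad
    return weights, senses

def retrain(labeled, unlabeled, threshold=0.9, max_iters=5):
    """Repeat train/classify cycles, keeping only high-confidence new labels."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iters):
        weights, senses = train_maxent(labeled)
        accepted, remaining = [], []
        for words in pool:
            probs = predict(weights, senses, words)
            best = max(probs, key=probs.get)
            # Assumed filter: discard classifications below the threshold.
            if probs[best] >= threshold:
                accepted.append((words, best))
            else:
                remaining.append(words)
        if not accepted:  # nothing confident enough: stop iterating
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool

# Toy seed data for the word "bank" (invented for this example).
seed = [
    (["money", "loan"], "finance"),
    (["deposit", "money"], "finance"),
    (["river", "water"], "river"),
    (["shore", "water"], "river"),
]
unlabeled = [["loan", "deposit"], ["river", "shore"], ["walk", "park"]]

labeled, pool = retrain(seed, unlabeled)
```

In this toy run, contexts whose words overlap with the seed examples are added to the labeled set, while a context made only of unseen words receives a uniform distribution and stays in the unlabeled pool. Raising `threshold` trades coverage for precision, which is the trade-off the abstract emphasizes.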