Reentrenamiento: aprendizaje semisupervisado de los sentidos de las palabras (Re-training: semi-supervised learning of word senses)
- Palomar Sanz, Manuel
- Rigau Claramunt, Germán
- Suárez Cueto, Armando
ISSN: 1135-5948
Year of publication: 2005
Issue: 34
Pages: 49-66
Type: Article
More publications in: Procesamiento del lenguaje natural
Abstract
This paper presents re-training, a bootstrapping algorithm that automatically acquires semantically annotated data while ensuring high precision. The algorithm uses a corpus-based word sense disambiguation system that relies on maximum entropy probability models. Re-training iteratively feeds training-classification cycles with new, high-confidence examples. The process relies on several filters that ensure the accuracy of the disambiguation by discarding uncertain classifications. The method is inspired by co-training algorithms, but it makes stronger assumptions about when to assign a label to a linguistic context.
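
The training-classification cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the filters, so a single probability threshold stands in for them, and the function names (`train_maxent`, `predict_proba`, `retrain`), the parameters (`threshold`, `max_iters`, `epochs`, `lr`), the sense labels, and the toy contexts for the word "bank" are all assumptions made for illustration. The classifier is a small multinomial logistic regression, which is one standard form of a maximum entropy model.

```python
import math
from collections import defaultdict


def predict_proba(model, feats):
    """Return P(sense | context features) under a softmax over summed weights."""
    labels, w = model
    scores = {y: sum(w.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(examples, epochs=200, lr=0.5):
    """Fit a tiny maximum entropy (multinomial logistic regression) model.

    examples: list of (set_of_features, sense_label) pairs.
    Plain stochastic gradient ascent on the log-likelihood, no regularization.
    """
    labels = sorted({y for _, y in examples})
    w = defaultdict(float)  # one weight per (feature, label) pair
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba((labels, w), feats)
            for y in labels:
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    w[(f, y)] += lr * grad
    return labels, w


def retrain(seed, unlabeled, threshold=0.8, max_iters=5):
    """Hypothetical re-training loop: train, classify, keep confident labels.

    The single `threshold` filter is an assumption standing in for the
    several filters mentioned in the abstract.
    """
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_iters):
        model = train_maxent(labeled)
        accepted, remaining = [], []
        for feats in pool:
            probs = predict_proba(model, feats)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                accepted.append((feats, best))  # high confidence: keep label
            else:
                remaining.append(feats)  # uncertain: discard for now
        if not accepted:
            break  # nothing new was learned, so stop iterating
        labeled.extend(accepted)
        pool = remaining
    return train_maxent(labeled), labeled, pool


# Toy contexts for the ambiguous word "bank" (invented for illustration).
seed = [({"money", "loan"}, "FINANCE"), ({"river", "water"}, "SHORE")]
unlabeled = [
    {"money", "deposit"},
    {"water", "fishing"},
    {"deposit", "account"},  # shares no feature with the seed examples
    {"branch", "old"},       # never shares a feature with labeled data
]
model, labeled, pool = retrain(seed, unlabeled)
for feats, sense in labeled[len(seed):]:
    print(sorted(feats), "->", sense)
```

In this toy run the context `{"deposit", "account"}` shares no feature with the seed examples, so it can only be labeled in a later cycle, after `{"money", "deposit"}` has been added to the training data. The context `{"branch", "old"}` never overlaps with any labeled example and therefore stays in the unlabeled pool, which mirrors the idea of discarding uncertain classifications rather than forcing a label.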