Chunk and clause identification for Basque by filtering and ranking with perceptrons

  1. Alegría Loinaz, Iñaki
  2. Arrieta Kortajarena, Bertol
  3. Carreras, Xavier
  4. Díaz de Ilarraza Sánchez, Arantza
  5. Uria Garin, Larraitz
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2008

Issue: 41

Pages: 5-12

Type: Article

Abstract

This paper presents systems for syntactic chunking and clause identification for Basque that combine rule-based grammars with machine-learning techniques. Specifically, we used Filtering-Ranking with Perceptrons (Carreras, Màrquez and Castro, 2005), a learning model that recognizes partial syntactic structures in sentences and obtains state-of-the-art performance on these tasks for English. The model can incorporate a rich set of features to represent syntactic phrases, which makes it possible to use information from different sources. We exploited this property to include more linguistic features in the learning model, and the chunking results improved greatly. In this way, we made up for the relatively small amount of training data available for learning a Basque chunking model. For clause identification, our preliminary results are low, which may be due to the free word order of Basque and to the small corpus available.
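The general idea of filtering candidate phrases and then ranking them with perceptron scores can be illustrated with a small sketch. This is not the authors' system: the feature templates, the toy part-of-speech tags, the hand-set weights, and all function names below are assumptions made for illustration only, and the selection step is a simple dynamic program over flat, non-overlapping spans.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]
Span = Tuple[int, int]  # inclusive (start, end) word indices


def score(weights: Weights, features: List[str]) -> float:
    """Linear perceptron score: sum of the weights of the active features."""
    return sum(weights.get(f, 0.0) for f in features)


def perceptron_update(weights: Weights, features: List[str], direction: float) -> None:
    """Mistake-driven update: promote (+1) or demote (-1) the active features."""
    for f in features:
        weights[f] = weights.get(f, 0.0) + direction


def filter_candidates(tags: List[str], start_w: Weights, end_w: Weights,
                      max_len: int = 4) -> List[Span]:
    """Filtering layer (hypothetical features): keep spans whose first word
    looks like a phrase start and whose last word looks like a phrase end."""
    starts = [i for i, t in enumerate(tags) if score(start_w, [f"tag={t}"]) > 0]
    ends = [i for i, t in enumerate(tags) if score(end_w, [f"tag={t}"]) > 0]
    return [(s, e) for s in starts for e in ends if s <= e < s + max_len]


def span_features(tags: List[str], span: Span) -> List[str]:
    """Phrase-level features; richer linguistic features would be added here."""
    s, e = span
    return [f"first={tags[s]}", f"last={tags[e]}", f"len={e - s + 1}"]


def rank_and_select(tags: List[str], candidates: List[Span],
                    rank_w: Weights) -> List[Span]:
    """Ranking layer: choose the non-overlapping set of candidate spans
    with the highest total score, via dynamic programming over positions."""
    by_end: Dict[int, List[int]] = {}
    for s, e in candidates:
        by_end.setdefault(e, []).append(s)
    # best[i] = (total score, chosen spans) using only words before index i
    best: List[Tuple[float, List[Span]]] = [(0.0, [])]
    for i in range(1, len(tags) + 1):
        current = best[i - 1]
        for s in by_end.get(i - 1, []):
            prev_score, prev_spans = best[s]
            total = prev_score + score(rank_w, span_features(tags, (s, i - 1)))
            if total > current[0]:
                current = (total, prev_spans + [(s, i - 1)])
        best.append(current)
    return best[-1][1]


# Toy input with invented tags and hand-set weights (not learned, not Basque data).
tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]
start_w = {"tag=DET": 1.0}
end_w = {"tag=NOUN": 1.0}
rank_w = {"first=DET": 1.0, "last=NOUN": 1.0}

candidates = filter_candidates(tags, start_w, end_w)
chunks = rank_and_select(tags, candidates, rank_w)
```

In a trained system the three weight vectors would be learned with updates such as `perceptron_update` whenever the predicted structure disagrees with the gold annotation; adding further strings in `span_features` is where extra linguistic information would enter the model.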