Chunk and clause identification for basque by filtering and ranking with perceptrons

  1. Alegría Loinaz, Iñaki
  2. Arrieta Kortajarena, Bertol
  3. Carreras, Xavier
  4. Díaz de Ilarraza Sánchez, Arantza
  5. Uria Garin, Larraitz
Aldizkaria:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Argitalpen urtea: 2008

Zenbakia: 41

Orrialdeak: 5-12

Mota: Artikulua

Beste argitalpen batzuk: Procesamiento del lenguaje natural

Laburpena

This paper presents systems for syntactic chunking and clause identification for Basque, combining rule-based grammars with machine-learning techniques. Precisely, we used Filtering-Ranking with Perceptrons (Carreras, Màrquez and Castro, 2005): a learning model that recognizes partial syntactic structures in sentences, obtaining state-of-the-art performance for these tasks in English. This model allows incorporating a rich set of features to represent syntactic phrases, making possible to use information from different sources. We used this property in order to include more linguistic features in the learning model and the results obtained in chunking have been improved greatly. This way, we have made up for the relatively small training data available for Basque to learn a chunking model. In the case of clause identification, our preliminary results are low, which suggest that this is due to the free order of Basque and to the small corpus available.