From dependencies to constituents in the reference corpus for the processing of Basque (EPEC)

  1. Díaz de Ilarraza Sánchez, Arantza
  2. Fernández Terrones, Enrique
  3. Aldezabal Roteta, Izaskun
  4. Aranzabe Urruzola, María Jesús
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2008

Issue: 41

Pages: 147-154

Type: Article


Abstract

This paper explains the process of converting a dependency-based corpus into a constituent-based one. To this end, the dependency and constituent formalisms are first analyzed, and then the corresponding equivalences between linguistic phenomena are addressed. The process went through several phases in which the linguistic equivalences were improved. Finally, the evaluation process is briefly explained. As a result, we obtain corpora annotated in the two formalisms usually proposed for syntactic tagging. If the linguistic equivalences are the same, the conversion process could be extended to other corpora; otherwise, new equivalences would need to be defined.