Building a wide-coverage multilingual lexical knowledge base: Multilingual Central Repository

  1. Gonzalez-Agirre, Aitor 1
  2. Rigau, German
  1. 1 Departamento de Informática, Universidade do Minho
Journal: Linguamática

ISSN: 1647-0818

Year of publication: 2013

Volume: 5

Issue: 1

Pages: 13-28

Type: Article

Abstract

The use of wide-coverage, general-domain semantic resources has become common practice, and is often necessary, in existing Natural Language Processing (NLP) systems. WordNet is by far the most widely used semantic resource in NLP. Following the success of WordNet, the EuroWordNet project designed a multilingual semantic infrastructure for developing wordnets for a set of European languages. In EuroWordNet, these wordnets are interconnected through links stored in the Inter-Lingual Index (ILI). Following the EuroWordNet architecture, the MEANING project developed the first versions of the Multilingual Central Repository (MCR) using WordNet 1.6 as the ILI, thus maintaining compatibility between wordnets of different languages and versions. That version of the MCR integrates six different versions of the English WordNet (1.6 to 3.0) and wordnets in Spanish, Catalan, Basque and Italian, along with more than a million semantic relations between concepts and semantic properties from different ontologies. We recently developed a new version of the MCR using WordNet 3.0 as the ILI. This new version integrates wordnets of five different languages: English, Spanish, Catalan, Basque and Galician. The current version of the MCR, like the previous one, systematically integrates thousands of semantic relations between concepts. In addition, the MCR is enriched with about 460,000 semantic and ontological properties, including Base Level Concepts, Top Ontology, WordNet Domains and AdimenSUMO, providing ontological consistency to all the integrated wordnets and semantic resources.