Bilingual Dictionary Drafting: Bootstrapping WordNet and BabelNet

  1. David Lindemann
  2. Fritz Kliche
Book:
Electronic lexicography in the 21st century: Proceedings of eLex 2017 conference
  1. Iztok Kosem (ed.)
  2. Carole Tiberius (ed.)
  3. Miloš Jakubíček (ed.)
  4. Jelena Kallas (ed.)
  5. Simon Krek (ed.)
  6. Vít Baisa (ed.)

Publisher: Lexical Computing

Year of publication: 2017

Pages: 23-42

Conference: eLEX: Electronic lexicography in the 21st century (5th edition, 2017, Leiden)

Type: Conference contribution

Abstract

In this paper, we present a simple method for drafting sense-disambiguated bilingual dictionary content using lexical data extracted from merged wordnets, on the one hand, and from BabelNet, a very large resource built automatically from wordnets and other sources, on the other. We use English-Basque as a showcase because Basque still lacks bilingual lexicographical products of significant size and quality for any language pair other than those with the five major European languages. At the same time, we aim to provide a comprehensive guide to drafting bilingual dictionary content with English as a pivot language by bootstrapping wordnet-like resources, an approach that may interest lexicographers working on dictionary projects for language combinations that have not yet been covered in lexicography but for which such resources are available. We present our experiments together with an evaluation in two dimensions: (1) a quantitative evaluation describing the intersections of the obtained vocabularies with a basic lemma list of Standard Basque, the language for which we intend to provide dictionary drafts, and (2) a manual qualitative evaluation measuring the adequacy of the bootstrapped translation equivalents. We thus compare the recall and precision of the applied dictionary drafting methods across different subsets of the draft dictionary data. We also discuss the advantages and shortcomings of the described approach in general, and draw conclusions about the usefulness of the selected sources in the lexicographical production process.
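The abstract does not spell out the drafting procedure, so what follows is only a minimal Python sketch under stated assumptions. It assumes each synset (concept) in a merged wordnet or BabelNet export carries an ID plus lists of English and Basque lemmas. It pairs every English lemma with every Basque lemma of the same synset, with English acting as the pivot. It then computes the two kinds of figures mentioned above: the overlap of the drafted Basque vocabulary with a basic lemma list, and precision over manually judged pairs. The synset IDs, the toy lemmas, and the names `draft_pairs`, `coverage`, and `precision` are hypothetical. They are not taken from the paper, from WordNet, or from the BabelNet API.

```python
from collections import defaultdict

# Hypothetical toy data: synset IDs and lemma lists are invented for
# illustration; real input would come from merged wordnets or BabelNet.
SYNSETS = {
    "syn-bank-finance": {"en": ["bank"], "eu": ["banku"]},
    "syn-bank-river": {"en": ["bank", "riverbank"], "eu": ["ibaiertz"]},
    "syn-dog": {"en": ["dog"], "eu": ["txakur", "zakur"]},
    "syn-cat": {"en": ["cat"], "eu": []},  # no Basque lemma, so no draft pair
}


def draft_pairs(synsets, src="en", tgt="eu"):
    """Pair every source lemma with every target lemma sharing a synset.

    Returns {source_lemma: [(target_lemma, synset_id), ...]}. Keeping the
    synset ID ties each translation equivalent to one sense.
    """
    entries = defaultdict(list)
    for syn_id, lemmas in synsets.items():
        for s in lemmas.get(src, []):
            for t in lemmas.get(tgt, []):
                entries[s].append((t, syn_id))
    return dict(entries)


def coverage(entries, basic_lemmas):
    """Share of a basic target-language lemma list found in the draft."""
    if not basic_lemmas:
        return 0.0
    drafted = {t for pairs in entries.values() for t, _ in pairs}
    return len(drafted & set(basic_lemmas)) / len(basic_lemmas)


def precision(judgements):
    """Share of manually judged (source, target) pairs marked adequate."""
    if not judgements:
        return 0.0
    return sum(1 for ok in judgements.values() if ok) / len(judgements)
```

With the toy data, `draft_pairs(SYNSETS)["bank"]` yields `banku` and `ibaiertz` as two separate sense-tagged equivalents rather than one undifferentiated list, while `cat` gets no entry because its synset has no Basque lemma. Treating overlap with a basic lemma list as a recall proxy is likewise an assumption of this sketch.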