Application of singing synthesis techniques to bertsolaritza

Supervised by:
  1. Eva Navas Cordón Director
  2. Inmaculada Hernáez Rioja Director

Defence university: Universidad del País Vasco - Euskal Herriko Unibertsitatea

Defence date: 18 December 2020

Committee:
  1. Laura Docío Fernández Chair
  2. Ibon Saratxaga Couceiro Secretary
  3. George Kafentzis Committee member

Department:
  1. Ingeniería de Comunicaciones

Type: Thesis

Teseo: 153476


This thesis focuses on the development of a new bertsolaritza singing voice synthesis system based on original recordings of live bertsolaritza sessions. The challenge of this work is not only the implementation of a singing voice synthesis system. The recorded bertsolaritza corpus contains the transcriptions of the improvised verses, but the audio files contain multiple elements that are not singing voice. As most of the recordings are live sessions, the voice of a speaker, the applause of the audience and noise are part of the database. In addition, the musical labeling of the singing voice is not included in the database. Given a database with these properties, the aim of this work is to create methods to clean, segment and label the recordings in the bertsolaritza database, and to analyze the possibility of using them to create models for bertsolaritza singing voice synthesis.

We have developed methods to automatically obtain the singing voice segments in the recordings, creating new speech and singing voice classification algorithms. The segmentation of bertso utterances and phonemes has been performed on a multi-singer database. The proposed segmentation algorithms are able to align material from unseen bertsolaris in the future. After that, we analyzed the musical properties of the bertsolaritza art and compared the theoretical melodies in the database with their actual interpretation. We defined automatic systems to musically label the bertsolaritza singing voice, generating a fully labeled bertsolaritza database. The musical labeling included vibrato, and we analyzed its use by each bertsolari. We evaluated all the automatic labeling systems in the process.

After creating a labeled database of bertso recordings, we generated singing voice synthesis systems using HMMs and DNNs. We included f0 normalization, tempo adaptation and vibrato prediction techniques in these systems. We defined methods to automatically adapt music scores to each bertsolari, considering the pitch range of each singer. We evaluated the synthesis models created for different bertsolaris both subjectively and objectively, obtaining good results.

The contributions of this thesis are related to bertsolaritza and singing voice synthesis. We added new information levels to the bertsolaritza corpus with the segmentation of singing voice, the alignment of utterances and phonemes, and the subsequent musical labeling. These labeling methods need no manual supervision, and therefore we created tools to enlarge the labeled database in the future. We created a multi-singer singing voice database that is considerably bigger than any state of the art singing voice database. Finally, we defined systems to synthesize bertsolaritza singing voice using different singers and technologies, obtaining positive results.
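
The abstract does not describe how music scores are adapted to the pitch range of each bertsolari. As an illustration only, the following minimal Python sketch shows one simple way such an adaptation could work: transposing a melody so that its centre matches the centre of a singer's range. The function names, the MIDI note representation and the centring criterion are all assumptions made for this example and are not taken from the thesis.

```python
def best_transposition(score_notes, singer_low, singer_high):
    """Return a semitone shift that centres the melody in the singer's range.

    score_notes: melody as MIDI note numbers (assumed input format).
    singer_low, singer_high: singer's pitch range as MIDI note numbers
    (assumed to be known in advance, e.g. estimated from recordings).
    """
    if not score_notes:
        raise ValueError("score_notes must not be empty")
    # Midpoint of the melody and of the singer's range, in semitones.
    melody_centre = (min(score_notes) + max(score_notes)) / 2
    singer_centre = (singer_low + singer_high) / 2
    # Round to a whole number of semitones so the score stays in tune.
    return round(singer_centre - melody_centre)


def transpose(score_notes, shift):
    """Shift every note of the melody by the given number of semitones."""
    return [note + shift for note in score_notes]


if __name__ == "__main__":
    melody = [60, 62, 64, 65, 67]          # hypothetical melody
    shift = best_transposition(melody, 50, 63)
    print(shift, transpose(melody, shift))
```

A centring criterion is only one possible choice. A melody whose span is wider than the singer's range would still fall partly outside it after transposition, and this sketch does not handle that case.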