Automatic voice pleasantness classification and intensity estimation for speech synthesis

  1. Martins Pinto-Coelho, Luis Filipe
Supervised by:
  1. Carmen García Mateo Director

Defence university: Universidade de Vigo

Defence date: 23 March 2012

  1. Inmaculada Hernáez Rioja Chair
  2. Eduardo Rodríguez Banga Secretary
  3. José Luis Alba Castro Committee member
  4. Francesc Josep Ferri Rabasa Committee member
  5. António Joaquim da Silva Teixeira Committee member

Type: Thesis

Teseo: 322070 DIALNET


Speech synthesis systems based on hidden Markov models (HMMs) have marked the beginning of a new generation of Text-to-Speech (TTS) technology. These stochastic models can describe time-domain and frequency-domain events simultaneously, while maintaining a powerful and highly flexible synthesis framework. Despite these recognized advantages, some authors report a background buzz or a muffled voice, among other issues, which shows the need for improvements to the speech description/generation model. Several vocoding technologies have already been adapted to the HMM synthesis framework, but none has provided an entirely satisfactory result, so this work proposes a different approach. With the objective of improving synthetic voice quality, we propose the development of a perceptually weighted adaptive filter technique that can enhance parameter generation in the time and frequency domains and on an intra-segmental basis. The adaptation strategy will be based on prosodic correlates of voice preference in contextualized TTS applications, with the aim of maximizing voice intelligibility and overall naturalness. The proposed work will be entirely dedicated to the European Portuguese language, which still lacks several resources and tools.