Speech intelligibility enhancement and glimpse-based intelligibility models for known noise conditions

  1. TANG, YAN
Supervised by:
  1. Martin Cooke Director
  2. Maria Luisa Garcia Lecumberri Director

Defence university: Universidad del País Vasco - Euskal Herriko Unibertsitatea

Defence date: 26 June 2014

  1. Inmaculada Hernáez Rioja Chair
  2. Daniel Erro Eslava Secretary
  3. Phil Green Committee member
  4. Juan Manuel Montero Committee member
  5. Climent Nadeu Camprubí Committee member

Department:
  1. Filología Inglesa y Alemana y Traducción e Interpretación

Type: Thesis

Teseo: 117910 DIALNET


This thesis proposes a series of context-sensitive speech modifications that aim to improve intelligibility in known noise conditions. It also attempts to improve the predictive power of objective intelligibility models for both modified and synthetic speech. The first study designed modifications that reallocate speech energy across time and frequency, with the aim of reducing masking under the constraints of constant input-output energy and preserved duration. Subjective evaluations demonstrated that reallocating energy to just-audible time-frequency regions at mid-high frequencies is the most beneficial strategy for listeners. A second study explored noise-dependent optimised spectral weightings derived using a genetic algorithm-based optimisation procedure, with an objective intelligibility measure (OIM), the glimpse proportion (GP), as the objective function. The resulting energy reallocation depended clearly on noise type and global signal-to-noise ratio, and the optimisation discovered sparse, highly selective spectral weightings, particularly in high noise conditions. In a subjective test using both stationary noise and competing speech maskers, spectral-weighted speech was more intelligible than unmodified speech in noise. In a third study, seven state-of-the-art OIMs were evaluated using three datasets from different subjective experiments, in which both synthetic speech and speech modified by algorithms designed to boost intelligibility were presented in a range of noise masking conditions. The GP metric was further extended, and four glimpse-based OIMs were proposed to improve the predictive power of objective intelligibility models for both modified and synthetic speech. The proposed OIMs outperformed the other evaluated metrics in various assessment conditions. In the final study, noise-dependent optimised spectral weightings were investigated further by maximising one of the proposed OIMs based on high-energy glimpses. In a subjective test, large intelligibility gains were seen for spectral-weighted speech in different noise conditions. Static noise-independent spectral weightings were also designed. A further listening test suggested that noise-independent spectral weightings can be as effective as the noise-dependent adaptive boosting suggested by the optimisation, provided that a sufficient number of frequency channels are boosted. This modification has potential application to boosting speech intelligibility for public address systems and personal mobile devices.
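
The two ideas that recur in the abstract, a glimpse proportion used as an objective and a spectral weighting applied under a constant-energy constraint, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes speech and noise are already available as grids of levels in dB (rows are frequency channels, columns are time frames), whereas glimpse-based models normally work on auditory excitation patterns. The 3 dB local threshold, the toy levels, and the names `glimpse_proportion`, `total_energy` and `reweight_constant_energy` are all assumptions made for illustration.

```python
import math


def glimpse_proportion(speech_db, noise_db, threshold_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise by threshold_db.

    threshold_db=3.0 is an assumed local SNR criterion, not a value from the abstract.
    """
    total = 0
    glimpsed = 0
    for speech_row, noise_row in zip(speech_db, noise_db):
        for s, n in zip(speech_row, noise_row):
            total += 1
            if s > n + threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0


def total_energy(grid_db):
    """Sum of linear energies over all cells of a dB grid."""
    return sum(10 ** (v / 10) for row in grid_db for v in row)


def reweight_constant_energy(speech_db, gains_db):
    """Apply one gain (dB) per frequency channel, then rescale all cells
    by a common offset so the total energy equals that of the input."""
    boosted = [[v + g for v in row] for row, g in zip(speech_db, gains_db)]
    correction_db = 10 * math.log10(total_energy(speech_db) / total_energy(boosted))
    return [[v + correction_db for v in row] for row in boosted]


# Toy example (hypothetical levels): two channels, four frames.
speech_db = [[60.0] * 4, [50.0] * 4]   # low channel, high channel
noise_db = [[70.0] * 4, [49.0] * 4]    # low channel is heavily masked

# Move energy from the masked low channel to the nearly audible high channel.
modified_db = reweight_constant_energy(speech_db, gains_db=[-6.0, 6.0])

gp_before = glimpse_proportion(speech_db, noise_db)
gp_after = glimpse_proportion(modified_db, noise_db)
print(gp_before, gp_after)
```

In this toy case the high channel sits only 1 dB above the noise before modification, so no cell passes the threshold. After energy is shifted into that channel, every frame in it becomes a glimpse while the total energy stays the same. An optimiser such as the genetic algorithm mentioned in the abstract would search over `gains_db` to maximise a score of this kind.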