Verification of the four Spanish official languages on TV show recordings

  1. Varona, A.
  2. Peñagaricano Badiola, Mikel
  3. Rodríguez Fuentes, Luis Javier
  4. Díez, M.
  5. Bordel García, Germán
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 95-104

Type: Article

Abstract

This paper presents language verification results on the four official languages of Spain: Spanish (Castilian), Catalan, Basque and Galician. Results are analyzed on closed-set and open-set tests (the latter including segments in French, Portuguese, German or English), using 30-second speech segments. A detailed per-target-language analysis of system performance is also carried out. Experiments use the KALAKA database, created specifically for the Albayzin 2008 Language Verification Evaluation. The primary verification system is the fusion of one acoustic system and six phonotactic subsystems. The acoustic system draws on the spectral features of the audio signal, whereas the phonotactic subsystems use phone sequences produced by several acoustic decoders. The system achieves an EER of 3.58% and a CLLR of 0.30 on the closed-set test, a relative improvement of 24.5% over the best results obtained in the Albayzin 2008 LV evaluation.
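The abstract reports two standard detection metrics (EER and CLLR) and describes score-level fusion of subsystems. The sketch below shows how these quantities are commonly computed; it is illustrative only and is not the authors' evaluation or fusion code. It assumes pooled binary detection trials (one list of target scores, one of non-target scores), scores already calibrated as natural-log likelihood ratios for CLLR, and a linear fusion with hypothetical weights. In practice such weights are usually trained, for example by linear logistic regression, but that is an assumption here, not something the abstract states. The function names `eer`, `cllr` and `linear_fusion` are hypothetical.

```python
import math


def eer(target_scores, nontarget_scores):
    """Approximate equal error rate by sweeping every observed score as a threshold.

    A trial is accepted when score >= threshold. At the threshold where the
    miss and false-alarm rates are closest, their average is returned.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, best_rate = None, None
    for t in thresholds:
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (p_miss + p_fa) / 2
    return best_rate


def _softplus(x):
    # Numerically stable log(1 + e^x), avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cllr(target_llrs, nontarget_llrs):
    """Binary-detection CLLR in bits; scores must be natural-log likelihood ratios.

    A system that always outputs LLR = 0 (no information) scores exactly 1.
    """
    c_tar = sum(_softplus(-s) for s in target_llrs) / len(target_llrs)
    c_non = sum(_softplus(s) for s in nontarget_llrs) / len(nontarget_llrs)
    return (c_tar + c_non) / (2 * math.log(2))


def linear_fusion(subsystem_scores, weights, bias):
    """Fuse one trial's subsystem scores (e.g. 1 acoustic + 6 phonotactic) linearly.

    The weights and bias are hypothetical placeholders; a real system would
    train them on held-out development data.
    """
    return bias + sum(w * s for w, s in zip(weights, subsystem_scores))
```

For example, `eer([2, 3, 4], [0, 1, 2.5])` is 1/3, reached at threshold 2.5 where one target is missed and one non-target is accepted. A lower CLLR also rewards good calibration, which is why it is typically reported alongside the threshold-free EER.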
