Verification of the four Spanish official languages on TV show recordings

  1. Varona, A.
  2. Peñagaricano Badiola, Mikel
  3. Rodríguez Fuentes, Luis Javier
  4. Díez, M.
  5. Bordel García, Germán
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 95-104

Type: Article


Abstract

This paper presents language recognition results for the four official languages of Spain: Spanish, Catalan, Basque and Galician. Results were obtained in closed and open tests (the latter including segments in French, Portuguese, German or English) on a subset of 30-second segments. A detailed study per target language is also included. Experiments were carried out on the KALAKA database, recorded specifically for the Albayzin 2008 Language Recognition Evaluation (LRE). The main verification system results from the fusion of one acoustic subsystem and six phonotactic subsystems. To model the target language, the acoustic subsystem uses the spectral characteristics of the audio signal, whereas the phonotactic subsystems use sequences of phones produced by several acoustic-phonetic decoders. The best fused system attained an equal error rate (EER) of 3.58% and CLLR = 0.30 in closed tests, a 24.5% improvement over the best result obtained in the Albayzin 2008 LRE.

Bibliographic References

  • Auckenthaler, R., M. Carey, and H. Lloyd-Thomas. 2000. Score normalization for text-independent speaker verification systems. Digital Signal Processing, 10(1-3):42–54, January.
  • Brümmer, N. and J. A. du Preez. 2006. Application-independent evaluation of speaker detection. Computer Speech & Language, 20(2-3):230–275.
  • Brümmer, N. and D.A. van Leeuwen. 2006. On calibration of language recognition scores. In Proceedings of Odyssey - The Speaker and Language Recognition Workshop, pages 1–8.
  • Campbell, W. M., J. P. Campbell, D. A. Reynolds, E. Singer, and P. A. Torres-Carrasquillo. 2006. Support vector machines for speaker and language recognition. Computer Speech and Language, 20(2-3):210–229.
  • Collobert, R. and S. Bengio. 2001. SVM Torch: Support Vector Machines for Large-Scale Regression Problems. The Journal of Machine Learning Research, 1:143–160.
  • Fan, Rong-En, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR – A Library for Large Linear Classification. The Journal of Machine Learning Research, 9:1871–1874, June.
  • FoCal, 2008. Toolkit for Evaluation, Fusion and Calibration of statistical pattern recognizers. http://sites.google.com/site/nikobrummer/focal.
  • JTH, 2008. 5th Biennial Workshop on Speech Technology. Bilbao, Spain, 12-14 November. http://jth2008.ehu.es/en/index.html.
  • Martin, A., G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. 1997. The DET Curve in Assessment of Detection Task Performance. In Proceedings of Eurospeech, pages 1985–1988.
  • Martin, A.F. and A.N. Le. 2008. NIST 2007 Language Recognition Evaluation. In Proceedings of Odyssey 2008 - The Speaker and Language Recognition Workshop, paper 16, Stellenbosch, South Africa.
  • Matejka, P., L. Burget, O. Glembek, P. Schwarz, V. Hubeika, M. Fapso, T. Mikolov, and O. Plchot. 2007. BUT system description for NIST LRE 2007. In Proc. 2007 NIST Language Recognition Evaluation Workshop, pages 1–5, Orlando, US. National Institute of Standards and Technology.
  • Penagarikano, M., A. Varona, M. Zamalloa, L.J. Rodriguez, G. Bordel, and J. P. Uribe. 2009. University of the Basque Country + Ikerlan System for NIST 2009 Language Recognition Evaluation. In 2009 NIST Language Recognition Evaluation (LRE) Workshop, Baltimore, MD, USA.
  • Penagarikano, M. and G. Bordel. 2005. Sautrela: A Highly Modular Open Source Speech Recognition Framework. In Proceedings of the ASRU Workshop, pages 386–391, San Juan, Puerto Rico, December.
  • Penagarikano, M., G. Bordel, L.J. Rodriguez, and J. P. Uribe. 2007. University of the Basque Country + Ikerlan System for NIST 2007 Language Recognition Evaluation. In 2007 NIST Language Recognition Evaluation (LRE) Workshop, Orlando, Florida, USA.
  • Penagarikano, M., A. Varona, L.J. Rodriguez-Fuentes, and G. Bordel. 2010a. Improved Modeling of Cross-Decoder Phone Co-occurrences in SVM-based Phonotactic Language Recognition. In Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic.
  • Penagarikano, M., A. Varona, L.J. Rodriguez-Fuentes, and G. Bordel. 2010b. Using cross-decoder phone co-occurrences in phonotactic language recognition. In Proceedings of the 35th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2010), pages 5034–5037, Dallas, Texas (USA).
  • Richardson, F. and W. Campbell. 2008. Language recognition with discriminative keyword selection. In Proceedings of ICASSP 2008, pages 4145–4148.
  • Rodriguez-Fuentes, L. J., M. Penagarikano, G. Bordel, and A. Varona. 2010a. The Albayzin 2008 Language Recognition Evaluation. In Odyssey 2010: The Speaker and Language Recognition Workshop, Brno, Czech Republic, 28 June - 1 July.
  • Rodriguez-Fuentes, L. J., M. Penagarikano, G. Bordel, A. Varona, and M. Diez. 2010b. KALAKA: A TV broadcast speech database for the evaluation of language recognition systems. In Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC 2010), Valletta, Malta, 17-23 May.
  • Schwarz, Petr. 2008. Phoneme recognition based on long temporal context. Ph.D. thesis, Faculty of Information Technology, BUT, Brno, CZ.
  • Stolcke, A. 2002. SRILM - an extensible language modeling toolkit. In Proc. Intl. Conf. on Spoken Language Processing, pages 257–286, November.
  • Torres-Carrasquillo, P.A., E. Singer, W.M. Campbell, T. Gleason, A. McCree, D.A. Reynolds, F. Richardson, W. Shen, and D.E. Sturim. 2008. The MITLL NIST LRE 2007 language recognition system. In Proceedings of Interspeech 2008, pages 719–722.
  • Torres-Carrasquillo, P.A., E. Singer, T. Gleason, A. McCree, D.A. Reynolds, F. Richardson, and D.E. Sturim. 2010. The MITLL NIST LRE 2009 language recognition system. In Proceedings of ICASSP 2010, pages 4994–4997.
  • Zissman, M.A. 1996. Comparison of four approaches to automatic language identification of telephone speech. IEEE Transactions on Speech and Audio Processing, 4(1):31–44, January.