AMIC: affective multimedia analytics with inclusive and natural communication

  1. Torres Barañano, María Inés
  2. Justo Blanco, Raquel
  3. Ortega Giménez, Alfonso
  4. Lleida Solano, Eduardo
  5. San Segundo Hernández, Rubén
  6. Ferreiros López, Javier
  7. Hurtado Oliver, Lluís Felip
  8. Sanchis Arnal, Emilio
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2018

Issue: 61

Pages: 147-150

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

Traditionally, textual content has been the main source for information extraction and indexing; technologies capable of extracting information from the audio and video of multimedia documents have joined later. Another major axis of analysis is the emotional and affective dimension intrinsic to human communication. Information on emotions, stances, preferences, figurative language, irony, sarcasm, etc. is fundamental and irreplaceable for a complete understanding of the content of conversations, speeches, debates, and discussions. The objective of this project is to advance, develop, and improve speech and language technologies, as well as image and video technologies, for the analysis of multimedia content, adding to this analysis the extraction of affective-emotional information. As additional steps forward, we will advance the methodologies and means of presenting information to the user, working on technologies for language simplification, automatic report and summary generation, emotional speech synthesis, and natural and inclusive interaction.

Funding information

This work is supported by the Ministerio de Economía y Competitividad under grants TIN2017-85854-C4-(1,2,3,4)-R.

Bibliographic references

  • Amodei, D., S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case and J. Chen. 2016. Deep speech 2: End-to-end speech recognition in English and Mandarin. In Int. Conf. on Machine Learning, pp. 173-182.
  • Deng, L. 2016. Deep learning: from speech recognition to language and multimodal processing. APSIPA Transactions on Signal and Information Processing.
  • Ferreiros, J., J.M. Pardo, L.F. Hurtado, E. Segarra, A. Ortega, E. Lleida, M.I. Torres, and R. Justo. 2016. ASLP-MULAN: Audio speech and language processing for multimedia analytics. Procesamiento del Lenguaje Natural, Vol. 57, pp. 147-150.
  • García, P., E. Lleida, D. Castán, J.M. Marcos, and D. Romero. 2015. Context-Aware Communicator for All. In Universal Access in Human-Computer Interaction. Lecture Notes in Computer Science, vol. 9175. Springer.
  • Hinton, G.E., S. Osindero and Y.-W. Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation, 18(7), pp. 1527-1554.
  • Hinton, G., L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen and T.N. Sainath 2012. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, Signal Processing Magazine, IEEE, vol. 29, no. 6, p. 829.
  • Hurtado, L., E. Segarra, F. Pla, P. Carrasco and J.A. González 2017. ELiRF-UPV at SemEval-2017 Task 7: Pun Detection and Interpretation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017).
  • Justo, R., T. Corcoran, S. Lukin, M. Walker and M.I. Torres. 2014. Extracting relevant knowledge for the detection of sarcasm and nastiness in the social web. Knowledge-Based Systems, 69.
  • Krizhevsky, A., I. Sutskever and G. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
  • Lorenzo-Trueba J., R. Barra-Chicote, R. San-Segundo, J. Ferreiros, J. Yamagishi and J.M. Montero 2015. Emotion Transplantation through Adaptation in HMM-based Speech Synthesis. Computer Speech and Language. Volume 34, Issue 1, pp. 292–307.
  • Martinez-González, B., J.M. Pardo, R. San-Segundo, and J.M. Montero 2016. Influence of Transition Cost in the Segmentation Stage of Speaker Diarization. In Proc of Odyssey, Bilbao-Spain.
  • Miguel, A., J. Llombart, A. Ortega, and E. Lleida. 2017. Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition. In Proc. of Interspeech.
  • Mikolov, T., K. Chen, G. Corrado, and J. Dean 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
  • Viñals, I., A. Ortega, J. Villalba, A. Miguel and E. Lleida. 2017. Domain Adaptation of PLDA models in Broadcast Diarization by means of Unsupervised Speaker Clustering. In Proc. of Interspeech 2017.
  • Zhang, K., W.L. Chao, F. Sha and K. Grauman 2016. Video Summarization with Long Short-term Memory, arXiv:1605.08110v2.