Institutional research archive of the Università degli Studi di Palermo

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results

Zhou, Hengshun; Du, Jun; Lee, Chin-Hui; Chen, Jingdong; Watanabe, Shinji; Siniscalchi, Sabato Marco (Supervision); Scharenborg, Odette; Liu, Di-Yuan; Yin, Bao-Cai; Pan, Jia; Gao, Jian-Qing; Liu, Cong

2022-01-01

Abstract

In this paper we discuss the rationale of the Multimodal Information Based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. The MISP Challenge aims at tackling speech processing tasks in different scenarios by introducing information about an additional modality (e.g., video or text), which will hopefully lead to better environmental and speaker robustness in realistic applications. In the first MISP Challenge, two benchmark datasets recorded in a real home TV room, together with two reproducible open-source baseline systems, have been released to promote research in audio-visual wake word spotting (AVWWS) and audio-visual speech recognition (AVSR). To our knowledge, MISP is the first open evaluation challenge to tackle real-world issues of AVWWS and AVSR in the home TV scenario.

Date: 2022
ISBN of the monograph: 978-1-6654-0540-9
DOI of the contribution: https://dx.doi.org/10.1109/ICASSP43922.2022.9746683
Alternative URL to the publisher's: https://ieeexplore.ieee.org/document/9746683
Citation: Chen, H., Zhou, H., Du, J., Lee, C., Chen, J., Watanabe, S., et al. (2022). The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results. In 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 9266-9270) [10.1109/ICASSP43922.2022.9746683].
Appears in types: 2.07 Contribution to conference proceedings published in a volume

Files in this item:

File	Access	Type	Size	Format
The_First_Multimodal_Information_Based_Speech_Processing_Misp_Challenge_Data_Tasks_Baselines_And_Results.pdf	Archive administrators only	Publisher's version	864.94 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636618
