Institutional research archive of the Università degli Studi di Palermo

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge

Chen H.; Wu S.; Dai Y.; Wang Z.; Du J.; Lee C.-H.; Chen J.; Watanabe S.; Siniscalchi S. M. (Supervision); Scharenborg O.; Liu D.-Y.; Yin B.-C.; Pan J.; Gao J.-Q.; Liu C.

2023-01-01

Abstract

The Multimodal Information based Speech Processing (MISP) 2022 challenge aimed to enhance speech processing performance in harsh acoustic environments by leveraging additional modalities such as video or text. The challenge included two tracks: audio-visual speaker diarization (AVSD) and audio-visual diarization and recognition (AVDR). The training material was based on previous MISP 2021 recordings, but we have accurately synchronized audio and visual data. Additionally, a new evaluation set was provided. This paper gives an overview of the challenge setup, presents the results, and summarizes the effective techniques employed by the participants. We also analyze the current technical challenges and suggest directions for future research in AVSD and AVDR.

Date	2023
ISBN of the monograph	9781728163277
DOI of the contribution	https://dx.doi.org/10.1109/ICASSP49357.2023.10433931
Citation	Chen H., Wu S., Dai Y., Wang Z., Du J., Lee C.-H., et al. (2023). Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Institute of Electrical and Electronics Engineers Inc. [10.1109/ICASSP49357.2023.10433931].
Appears in types	2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Type	Access	Size	Format
Summary_on_the_Multimodal_Information_Based_Speech_Processing_MISP_2022_Challenge.pdf	Publisher's version	Archive managers only	767.88 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/637517
