Institutional Research Archive of the Università degli Studi di Palermo

Zhou H., Du J., Zou G., Nian Z., Lee C.-H., Siniscalchi S.M., et al. (2022). Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 1111-1115). C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE : International Speech Communication Association [10.21437/Interspeech.2022-10650].

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis

Zhou H.; Du J.; Zou G.; Nian Z.; Lee C.-H.; Siniscalchi S. M.; Watanabe S.; Scharenborg O.; Chen J.; Xiong S.; Gao J.-Q.

2022-01-01

Abstract

In this paper, we describe and publicly release the audio-visual wake word spotting (WWS) database of the MISP2021 Challenge. It covers a range of scenarios, with audio and video data collected by near-, mid-, and far-field microphone arrays and cameras, and is intended as a shared, publicly available database for WWS. The database and the code are released and should be a valuable addition to the community, promoting WWS research that uses multi-modality information in realistic and complex conditions. Moreover, we investigated different data augmentation methods for single modalities on an end-to-end WWS network. A set of audio-visual fusion experiments and analyses was conducted to observe how visual information assists acoustic information under different audio and video field configurations. The results showed that the fusion system generally improves over the single-modality (audio-only or video-only) system, especially under complex noisy conditions.

Date: 2022
DOI: https://dx.doi.org/10.21437/Interspeech.2022-10650
Publisher URL (open access where possible): https://www.isca-archive.org/interspeech_2022/zhou22g_interspeech.html
Appears in types: 2.07 Contribution in conference proceedings published in a volume

Files in this record:

File	Size	Format
zhou22g_interspeech.pdf (publisher's version; access restricted to archive managers)	1.93 MB	Adobe PDF

Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2022/zhou22g_interspeech.html

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/641414
