Joint training of multi-channel-condition dereverberation and acoustic modeling of microphone array speech for robust distant speech recognition

Siniscalchi S. M.
Member of the Collaboration Group
2017-01-01

Abstract

We propose a novel data utilization strategy, called multi-channel-condition learning, that leverages the complementary information captured in microphone array speech to jointly train dereverberation and acoustic deep neural network (DNN) models for robust distant speech recognition. Experimental results with a single automatic speech recognition (ASR) system on the REVERB2014 simulated evaluation data show that, in 1-channel testing, the baseline joint training scheme attains a word error rate (WER) of 7.47%, reduced from 8.72% for separate training. The proposed multi-channel-condition learning scheme was evaluated on different channel-data combinations and usages, revealing many interesting implications. Finally, by training on all 8-channel data and applying DNN-based language model rescoring, a state-of-the-art WER of 4.05% is achieved. We anticipate an even lower WER when combining more top ASR systems.
2017
Ge F., Li K., Wu B., Siniscalchi S.M., Yan Y., Lee C.-H. (2017). Joint training of multi-channel-condition dereverberation and acoustic modeling of microphone array speech for robust distant speech recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3847-3851). 4 Rue des Fauvettes - Lous Tourils: International Speech Communication Association [10.21437/Interspeech.2017-579].
Files in this record:
File Size Format
ge17_interspeech.pdf

Archive administrators only

Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2017/ge17_interspeech.html
Type: Publisher's version
Size: 1.25 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649496
Citations
  • PMC: ND
  • Scopus: 2
  • Web of Science: 2