
A unified deep modeling approach to simultaneous speech dereverberation and recognition for the REVERB challenge

Siniscalchi, Sabato Marco
2017-01-01

Abstract

We propose a unified deep neural network (DNN) approach to achieve both high-quality enhanced speech and high-accuracy automatic speech recognition (ASR) simultaneously on the recent REverberant Voice Enhancement and Recognition Benchmark (REVERB) Challenge. These two goals are accomplished by two proposed techniques: DNN-based regression to enhance reverberant and noisy speech, followed by DNN-based multi-condition training that takes clean-condition, multi-condition, and enhanced speech all into consideration. We first report objective measures of enhanced speech superior to those listed in the 2014 REVERB Challenge Workshop. We then show that, with clean-condition training, we obtain the best word error rate (WER) of 13.28% on the 1-channel REVERB simulated evaluation data using the proposed DNN-based pre-processing scheme. Similarly, we attain a competitive single-system WER of 8.75% with the proposed multi-condition training strategy and the same less-discriminative log power spectrum features used in the enhancement stage. Finally, by leveraging joint training with more discriminative ASR features and improved neural-network-based language models, a state-of-the-art WER of 4.46% is attained with a single ASR system and single-channel information. Another state-of-the-art WER of 4.10% is achieved through system combination.
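As a rough illustration of the enhancement stage described in the abstract, the sketch below shows a feed-forward DNN trained as a regression model that maps a context window of reverberant log power spectrum (LPS) frames to the corresponding clean LPS frame under a mean squared error objective. This is a minimal PyTorch sketch, not the authors' implementation: the feature dimension, context width, layer sizes, activation choice, and optimizer settings are assumptions made only for illustration.

```python
# Illustrative sketch (not the paper's code): DNN regression for dereverberation,
# mapping a context window of reverberant LPS frames to one clean LPS frame.
import torch
import torch.nn as nn

FEAT_DIM = 257   # assumed LPS bins per frame (e.g., 512-point FFT)
CONTEXT = 11     # assumed context window: 5 left + current + 5 right frames

class DereverbDNN(nn.Module):
    def __init__(self, hidden=2048, layers=3):
        super().__init__()
        dims = [FEAT_DIM * CONTEXT] + [hidden] * layers
        blocks = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            blocks += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        blocks.append(nn.Linear(hidden, FEAT_DIM))  # linear output layer for regression
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        # x: (batch, FEAT_DIM * CONTEXT) reverberant features -> predicted clean LPS frame
        return self.net(x)

model = DereverbDNN()
criterion = nn.MSELoss()  # mean squared error regression objective
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One hypothetical training step on random stand-in tensors.
reverberant = torch.randn(32, FEAT_DIM * CONTEXT)
clean_target = torch.randn(32, FEAT_DIM)
loss = criterion(model(reverberant), clean_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the recognition stage, the multi-condition training mentioned above would then pool clean-condition, multi-condition (reverberant and noisy), and DNN-enhanced features to train the acoustic model; that pooling step is not shown in this sketch.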
2017
Settore ING-INF/05 - Information Processing Systems
ISBN: 9781509059256
Wu, B., Li, K., Huang, Z., Siniscalchi, S.M., Yang, M., Lee, C.H. (2017). A unified deep modeling approach to simultaneous speech dereverberation and recognition for the REVERB challenge. In HSCMA (pp. 36-40). IEEE Signal Processing Society [10.1109/HSCMA.2017.7895557].
Files in this record:
07895557.pdf: Publisher's version (Versione Editoriale), 518.7 kB, Adobe PDF; access restricted to archive administrators (a copy can be requested).

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649575
Citations
  • PMC: not available
  • Scopus: 6
  • Web of Science: 5