Institutional Research Archive of the Università degli Studi di Palermo

Bo Wu, Kehuang Li, Fengpei Ge, Huang Zhen, Yang Minglei, Sabato Marco Siniscalchi, et al. (2017). An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 11(8), 1289-1300 [10.1109/JSTSP.2017.2756439].

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition

Bo Wu; Kehuang Li; Fengpei Ge; Huang Zhen; Yang Minglei; Sabato Marco Siniscalchi (Supervision); Chin-Hui Lee

2017-12-01

Abstract

We propose an integrated end-to-end automatic speech recognition (ASR) paradigm by joint learning of the front-end speech signal processing and back-end acoustic modeling. We believe that “only good signal processing can lead to top ASR performance” in challenging acoustic environments. This notion leads to a unified deep neural network (DNN) framework for distant speech processing that can achieve both high-quality enhanced speech and high-accuracy ASR simultaneously. Our goal is accomplished by two techniques, namely: (i) a reverberation-time-aware DNN-based speech dereverberation architecture that can handle a wide range of reverberation times to enhance the quality of reverberant and noisy speech, followed by (ii) DNN-based multi-condition training that takes both clean-condition and multi-condition speech into consideration, exploiting data acquired and processed with multi-channel microphone arrays, to improve ASR performance. The final end-to-end system is established by a joint optimization of the speech enhancement and recognition DNNs.
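
The joint optimization mentioned at the end of the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' system: the abstract does not state the loss, the network sizes, or the training recipe, so everything below is an assumption made for illustration. A one-parameter linear map stands in for the enhancement DNN, a logistic unit stands in for the recognition DNN, and the two are trained together on a hypothetical weighted sum of an enhancement error and a recognition cross-entropy (the weight `alpha`, the toy data, and all names are invented here).

```python
import math

# Toy data (hypothetical): "clean" feature values and a binary class label.
CLEAN = [-1.0, -0.5, 0.5, 1.0]
LABELS = [0, 0, 1, 1]
# Stand-in for reverberant/noisy observations: an attenuated, offset copy.
NOISY = [0.5 * c + 0.3 for c in CLEAN]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def joint_loss(params, alpha):
    """Mean of alpha * enhancement MSE + (1 - alpha) * recognition cross-entropy."""
    w_e, b_e, w_r, b_r = params
    total = 0.0
    for x, c, y in zip(NOISY, CLEAN, LABELS):
        enhanced = w_e * x + b_e            # front-end (enhancement stand-in)
        p = sigmoid(w_r * enhanced + b_r)   # back-end (recognizer stand-in)
        mse = (enhanced - c) ** 2
        ce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += alpha * mse + (1.0 - alpha) * ce
    return total / len(NOISY)


def train_jointly(alpha=0.5, lr=0.1, steps=500):
    """Full-batch gradient descent on all four parameters at once."""
    params = [1.0, 0.0, 0.5, 0.0]  # w_e, b_e, w_r, b_r (arbitrary start)
    initial = joint_loss(params, alpha)
    n = len(NOISY)
    for _ in range(steps):
        w_e, b_e, w_r, b_r = params
        g_we = g_be = g_wr = g_br = 0.0
        for x, c, y in zip(NOISY, CLEAN, LABELS):
            enhanced = w_e * x + b_e
            p = sigmoid(w_r * enhanced + b_r)
            d_logit = (1.0 - alpha) * (p - y)
            # The front-end receives gradient from both loss terms:
            # its own MSE and the recognizer's error flowing back through w_r.
            d_enh = alpha * 2.0 * (enhanced - c) + d_logit * w_r
            g_we += d_enh * x
            g_be += d_enh
            g_wr += d_logit * enhanced
            g_br += d_logit
        grads = [g_we / n, g_be / n, g_wr / n, g_br / n]
        params = [p_i - lr * g_i for p_i, g_i in zip(params, grads)]
    return initial, joint_loss(params, alpha), params


if __name__ == "__main__":
    before, after, _ = train_jointly()
    print(f"joint loss before: {before:.4f}, after: {after:.4f}")
```

The point of the sketch is the `d_enh` line: in a jointly trained stack the front-end is updated not only to match the clean target but also by the recognition error propagated back from the back-end. Setting `alpha` to 1.0 in this toy reduces it to training the front-end on the enhancement error alone.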

Date	Dec 2017
Journal title	IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING
DOI	https://dx.doi.org/10.1109/JSTSP.2017.2756439
Appears in types	1.01 Journal article

Files in this item:

File	Size	Format
08051278.pdf (archive managers only; type: publisher's version)	820.27 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636656
