Institutional research archive of the Università degli Studi di Palermo

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers

Wang, Sicheng;Li, Wei;Siniscalchi, Sabato Marco;Lee, Chin-Hui

2020-01-01

Abstract

We propose an environment adaptation approach that transfers an existing deep neural network (DNN) speech enhancer to specific noisy environments without the paired noisy/clean target waveforms required by conventional DNN-based spectral regression. Adaptation minimizes the Kullback-Leibler divergence between the posterior probabilities produced by a multi-condition senone classifier (teacher) fed with noisy speech features and those produced by a clean-condition senone classifier (student) fed with enhanced speech features. Our solution not only improves the listening quality of the enhanced speech but also boosts the noise robustness of existing automatic speech recognition (ASR) systems trained on clean data when it is employed as a pre-processing step before speech feature extraction. Experimental results show steady gains in objective quality measurements when the teacher network produces adaptation targets that the student enhancement model uses to adjust its parameters in unseen noise conditions. The proposed technique is particularly advantageous in environments that the unadapted DNN-based enhancer does not handle effectively, as we find that very little data from a specific operating condition is required to yield good improvements. Finally, higher gains in speech quality translate directly into larger improvements in ASR.
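The adaptation objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the divergence is taken in the usual teacher-student direction, KL(teacher || student), averaged over frames, and that both classifiers expose per-frame senone logits. The function names (`softmax`, `kl_divergence`, `adaptation_loss`) and the toy logits are hypothetical.

```python
import math

def softmax(logits):
    """Convert one frame of senone logits into posterior probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same senone set."""
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid log(0).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def adaptation_loss(teacher_logits, student_logits):
    """Frame-averaged KL(teacher || student) over one utterance.

    teacher_logits: per-frame logits of the multi-condition classifier
                    computed on noisy features (assumed input layout).
    student_logits: per-frame logits of the clean-condition classifier
                    computed on enhanced features (assumed input layout).
    """
    total = 0.0
    for t_frame, s_frame in zip(teacher_logits, student_logits):
        total += kl_divergence(softmax(t_frame), softmax(s_frame))
    return total / len(teacher_logits)

# Toy example: two frames, three senones (values are illustrative only).
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = adaptation_loss(teacher, student)
```

In a full system this loss would be backpropagated through the student classifier into the enhancer so that only the enhancer's parameters adapt. Whether the classifier weights stay fixed during adaptation is an assumption here and is not stated in the abstract.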

Date: 2020
ISBN of the monograph: 978-1-5090-6631-5
DOI of the contribution: https://dx.doi.org/10.1109/ICASSP40776.2020.9054543
Alternative URL to the publisher's: https://ieeexplore.ieee.org/document/9054543
Citation: Wang, S., Li, W., Siniscalchi, S.M., Lee, C. (2020). A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 6219-6223). IEEE [10.1109/ICASSP40776.2020.9054543].
Appears in types: 2.07 Contribution in conference proceedings published in a volume

Files in this item:

File	Access	Type	Size	Format
wang2020-2.pdf	Archive administrators only	Publisher's version	392.62 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636673
