We propose an environment adaptation approach that transfers an existing deep neural network (DNN) speech enhancer to specific noisy environments without the noisy/clean paired target waveforms needed in conventional DNN-based spectral regression. The enhancer is adapted by minimizing the Kullback-Leibler divergence between the posterior probabilities produced by a multi-condition senone classifier (teacher) fed with noisy speech features and those produced by a clean-condition senone classifier (student) fed with enhanced speech features. Our solution not only improves the listening quality of the enhanced speech but also boosts the noise robustness of existing automatic speech recognition (ASR) systems trained on clean data when it is employed as a pre-processing step before speech feature extraction. Experimental results show steady gains in objective quality measurements when the teacher network produces adaptation targets that let the student enhancement model adjust its parameters in unseen noise conditions. The proposed technique is particularly advantageous in environments that the unadapted DNN-based enhancer does not handle effectively, as we find that only very little data from a specific operating condition is required to yield good improvements. Finally, higher gains in speech quality translate directly into larger improvements in ASR.
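The adaptation objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of raw logits as classifier outputs, the direction of the divergence (teacher relative to student), and the averaging over frames are all assumptions made for illustration. The sketch only computes the loss value. In an actual system this loss would presumably be backpropagated through the student classifier to update the enhancer's parameters.

```python
import math

def softmax(logits):
    """Convert one frame of senone logits into posterior probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same senones."""
    return sum(
        pi * (math.log(pi + eps) - math.log(qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

def adaptation_loss(teacher_logits, student_logits):
    """Frame-averaged KL divergence between teacher and student posteriors.

    teacher_logits: per-frame outputs of the (hypothetical) multi-condition
        classifier fed with noisy features.
    student_logits: per-frame outputs of the (hypothetical) clean-condition
        classifier fed with enhanced features.
    """
    per_frame = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_frame) / len(per_frame)

# Toy example: two frames, three senones (made-up numbers).
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.0, 0.1, 2.0]]
print(adaptation_loss(teacher, student))
```

The loss is zero when the two classifiers agree on every frame and grows as their posteriors diverge, which is what allows the teacher's outputs to act as adaptation targets without clean reference waveforms.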

Wang, S., Li, W., Siniscalchi, S.M., Lee, C. (2020). A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers. In IEEE International Conference on Acoustics, Speech, and Signal Processing (pp. 6219-6223). IEEE [10.1109/ICASSP40776.2020.9054543].

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers

Siniscalchi, Sabato Marco
2020-01-01

2020
978-1-5090-6631-5
Files in this item:
File: wang2020-2.pdf
Access: archive managers only
Type: Publisher's version
Size: 392.62 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636673