Siniscalchi, S.M. (2021). Vector-to-Vector Regression via Distributional Loss for Speech Enhancement. IEEE Signal Processing Letters, 28, 254-258. doi: 10.1109/LSP.2021.3050386.
Vector-to-Vector Regression via Distributional Loss for Speech Enhancement
Siniscalchi, Sabato Marco
Author position: First
Contributor role: Investigation
Publication date: 2021-01-01
Abstract
In this work, we leverage a novel distributional loss to improve vector-to-vector regression for feature-based speech enhancement (SE). The distributional loss function is devised from the Kullback-Leibler divergence between a selected target distribution and a conditional distribution to be learned from the data for each coefficient of the clean speech vector given the noisy input features. A deep model with a softmax layer per coefficient is employed to parametrize the conditional distributions, and the model parameters are found by minimizing a weighted sum of the cross-entropies between its outputs and the respective target distributions. Experiments with convolutional neural networks (CNNs) on a publicly available noisy speech dataset derived from the Voice Bank corpus show consistent improvements over conventional solutions based on the mean squared error (MSE) and the least absolute deviation (LAD). Moreover, our approach compares favourably, in terms of both speech quality and intelligibility, against Mixture Density Networks (MDNs), an approach that also relies on parametric conditional distributions, computed with Gaussian mixture models (GMMs) and a neural architecture. Comparisons against GAN-based solutions are presented as well.
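To make the loss concrete, below is a minimal PyTorch sketch of one plausible instantiation, assuming each clean-speech coefficient's range is discretized into a fixed grid of bins and the target distribution is a Gaussian centered at the clean value. The bin grid, `sigma`, the per-coefficient `weights`, and all function names here are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn.functional as F

def gaussian_targets(y, bin_centers, sigma=0.1):
    """Map each clean-speech coefficient in y to a probability vector over
    a fixed grid of bins, using a Gaussian centered at the clean value.
    Shapes: y is (batch, n_coeff), bin_centers is (n_bins,)."""
    d2 = (y.unsqueeze(-1) - bin_centers) ** 2            # (batch, n_coeff, n_bins)
    return F.softmax(-d2 / (2.0 * sigma ** 2), dim=-1)   # target distribution p

def distributional_loss(logits, y, bin_centers, sigma=0.1, weights=None):
    """Weighted sum over coefficients of the cross-entropy between the
    model's per-coefficient softmax outputs q and the Gaussian targets p.
    Since p is fixed, minimizing this cross-entropy minimizes KL(p || q)
    up to an additive constant. logits is (batch, n_coeff, n_bins)."""
    p = gaussian_targets(y, bin_centers, sigma)
    log_q = F.log_softmax(logits, dim=-1)                # learned conditional q
    ce = -(p * log_q).sum(dim=-1)                        # (batch, n_coeff)
    if weights is not None:                              # optional per-coefficient weights
        ce = ce * weights
    return ce.sum(dim=-1).mean()

# At inference, a point estimate of each enhanced coefficient can be
# recovered as the mean of the predicted distribution:
#   y_hat = (F.softmax(logits, dim=-1) * bin_centers).sum(dim=-1)
```

Normalizing the Gaussian over the bins with a softmax is one simple discretization choice; integrating a truncated Gaussian over the bin edges would serve the same purpose.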
File | Type | Size | Format | Access
---|---|---|---|---
Vector-to-Vector_Regression_via_Distributional_Loss_for_Speech_Enhancement.pdf | Publisher's version | 499.41 kB | Adobe PDF | Archive managers only (request a copy)