Institutional Research Archive of the University of Palermo

Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network

Qi, Jun; Hu, Hu; Wang, Yannan; Yang, Chao-Han Huck; Siniscalchi, Sabato Marco; Lee, Chin-Hui

2020-01-01

Abstract

We propose a tensor-to-vector regression approach to multi-channel speech enhancement that addresses two problems: the explosion of input size and the expansion of hidden-layer size. The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation in a tensor-train network (TTN) framework. TTN is a recently introduced approach to compactly representing deep models with fully connected hidden layers, so it retains a DNN's expressive power while requiring far fewer trainable parameters. Furthermore, TTN handles a multi-dimensional tensor input by design, which matches the setting of multi-channel speech enhancement. We first provide a theoretical extension from DNN-based to TTN-based regression. Next, we show that TTN attains speech enhancement quality comparable to that of a DNN with far fewer parameters: in a single-channel scenario, the parameter count falls from 27 million to 5 million. With a slight increase in the number of trainable parameters, TTN also improves PESQ over the DNN from 2.86 to 2.96. Finally, in 8-channel conditions, TTN achieves a PESQ of 3.12 with 20 million parameters, whereas a DNN with 68 million parameters attains only 3.06.
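The parameter saving described in the abstract can be illustrated with a minimal NumPy sketch of a single fully connected layer stored in tensor-train form. This is not the authors' implementation: the mode sizes, TT ranks, and function names below are hypothetical choices made for illustration, and biases and nonlinearities are omitted. The sketch counts the parameters held in the TT cores and applies the layer directly to a tensor-shaped input without ever building the dense weight matrix.

```python
import numpy as np

def tt_param_count(in_modes, out_modes, ranks):
    """Parameters stored in TT cores of shape (r[k], m[k], n[k], r[k+1])."""
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
               for k in range(len(in_modes)))

def make_tt_cores(in_modes, out_modes, ranks, rng):
    """Random TT cores; ranks must start and end with 1."""
    return [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
            for k in range(len(in_modes))]

def tt_to_dense(cores):
    """Contract all cores into the equivalent dense (prod(m), prod(n)) matrix."""
    full = cores[0]
    for core in cores[1:]:
        full = np.einsum('aMNr,rmns->aMmNns', full, core)
        a, M, m, N, n, s = full.shape
        full = full.reshape(a, M * m, N * n, s)
    return full[0, :, :, 0]

def tt_forward(x, cores):
    """Map a tensor input of shape in_modes to an output vector, core by core."""
    state = x.reshape(1, -1, 1)  # (rank, remaining input modes, accumulated output)
    for core in cores:
        r_prev, m, n, r_next = core.shape
        state = state.reshape(r_prev, m, -1, state.shape[-1])
        # Sum over the incoming rank and the current input mode.
        state = np.einsum('rmRN,rmns->sRNn', state, core)
        state = state.reshape(r_next, state.shape[1], -1)
    return state.reshape(-1)

# Hypothetical sizes: a 4 x 8 x 8 input tensor mapped to a 256-dimensional vector.
in_modes, out_modes, ranks = (4, 8, 8), (4, 8, 8), (1, 3, 3, 1)
rng = np.random.default_rng(0)
cores = make_tt_cores(in_modes, out_modes, ranks, rng)
x = rng.standard_normal(in_modes)

dense_params = int(np.prod(in_modes)) * int(np.prod(out_modes))
tt_params = tt_param_count(in_modes, out_modes, ranks)
y_tt = tt_forward(x, cores)
y_dense = x.reshape(-1) @ tt_to_dense(cores)
print(dense_params, tt_params, np.allclose(y_tt, y_dense))
```

With these assumed sizes the dense layer would need 65,536 weights while the TT cores hold 816, and both paths produce the same output. Larger TT ranks trade some of that compression for more capacity.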


Date	2020
ISBN of the monograph	978-1-5090-6631-5
DOI of the contribution	https://dx.doi.org/10.1109/ICASSP40776.2020.9052938
Alternative URL to the publisher's	https://ieeexplore.ieee.org/document/9052938
Citation	Qi, J., Hu, H., Wang, Y., Yang, C.H., Siniscalchi, S.M., Lee, C. (2020). Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network. In ICASSP (pp. 7504-7508). IEEE [10.1109/ICASSP40776.2020.9052938].
Appears in types	2.07 Conference paper published in a volume

Files in this record:

File	Type	Access	Size	Format
qi2020-3.pdf	Publisher's version	Archive managers only	550.09 kB	Adobe PDF
2002.00544v1.pdf	Pre-print	Open access	569.18 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636674
