
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement

Siniscalchi, Sabato Marco (co-last author; supervision)

2020-01-01

Abstract

This paper investigates the trade-off between the number of model parameters and enhanced speech quality by employing several deep tensor-to-vector regression models for speech enhancement. We find that a hybrid architecture, namely CNN-TT, is capable of maintaining good enhancement quality with a reduced number of model parameters. CNN-TT is composed of several convolutional layers at the bottom for feature extraction, to improve speech quality, and a tensor-train (TT) output layer on top, to reduce model parameters. We first derive a new upper bound on the generalization power of convolutional neural network (CNN) based vector-to-vector regression models. Then, we provide experimental evidence on the Edinburgh noisy speech corpus to demonstrate that, in single-channel speech enhancement, CNN outperforms DNN at the expense of a small increase in model size. Moreover, CNN-TT slightly outperforms its CNN counterpart while using only 32% of the CNN model parameters. Further performance improvements can be attained if the number of CNN-TT parameters is increased to 44% of the CNN model size. Finally, our multi-channel speech enhancement experiments on a simulated noisy WSJ0 corpus demonstrate that the proposed hybrid CNN-TT architecture outperforms both DNN and CNN models, delivering better enhanced speech quality with fewer parameters.
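
To make the parameter-saving role of the TT output layer concrete, the following is a minimal, hypothetical PyTorch sketch of a tensor-train fully-connected layer; it is not the authors' implementation, and the mode sizes and TT ranks are invented for illustration only. The idea is that the dense weight matrix is parameterized by small four-way TT cores, so the parameter count drops from prod(in_modes) * prod(out_modes) to a sum of small core sizes.

```python
# Hypothetical, minimal sketch (NOT the authors' released code) of a
# tensor-train (TT) fully-connected layer in PyTorch, illustrating how a TT
# output layer can shrink the parameter count of a dense layer. Mode sizes
# and TT ranks below are invented for illustration only.
import math

import torch
import torch.nn as nn


class TTLinear(nn.Module):
    """Linear layer whose weight matrix is parameterized by TT cores."""

    def __init__(self, in_modes, out_modes, ranks):
        super().__init__()
        assert len(in_modes) == len(out_modes) == len(ranks) - 1
        assert ranks[0] == 1 and ranks[-1] == 1
        self.in_modes, self.out_modes, self.ranks = in_modes, out_modes, ranks
        # One 4-way core per mode pair, with shape (r_{k-1}, m_k, n_k, r_k).
        self.cores = nn.ParameterList(
            [nn.Parameter(0.1 * torch.randn(ranks[k], in_modes[k],
                                            out_modes[k], ranks[k + 1]))
             for k in range(len(in_modes))]
        )
        self.bias = nn.Parameter(torch.zeros(math.prod(out_modes)))

    def full_weight(self):
        # Contract the TT cores back into a dense (in_dim, out_dim) matrix.
        w = self.cores[0].squeeze(0)               # (m_1, n_1, r_1)
        for core in self.cores[1:]:                # (r, m_k, n_k, r_k)
            # (M, N, r) x (r, m, n, s) -> (M, m, N, n, s), then merge modes.
            w = torch.einsum('abc,cdef->adbef', w, core)
            M, m, N, n, s = w.shape
            w = w.reshape(M * m, N * n, s)
        return w.squeeze(-1)                       # last TT rank is 1

    def forward(self, x):                          # x: (batch, prod(in_modes))
        return x @ self.full_weight() + self.bias


if __name__ == "__main__":
    # Toy comparison: a 1024 -> 256 dense output layer versus a TT version.
    dense = nn.Linear(1024, 256)
    tt = TTLinear(in_modes=[4, 8, 8, 4], out_modes=[4, 4, 4, 4],
                  ranks=[1, 4, 4, 4, 1])
    n_dense = sum(p.numel() for p in dense.parameters())
    n_tt = sum(p.numel() for p in tt.parameters())
    print(f"dense params: {n_dense}, TT params: {n_tt}")
    print(tt(torch.randn(2, 1024)).shape)          # torch.Size([2, 256])
```

In this toy configuration the script reports roughly 262k parameters for the dense layer versus roughly 1.4k for the TT layer. The figure is only illustrative and does not correspond to the 32% and 44% compression ratios quoted in the abstract, which refer to whole CNN-TT models rather than a single layer.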
2020
Academic discipline: ING-INF/05 - Information Processing Systems
Qi, J., Hu, H., Wang, Y., Yang, C.-H., Siniscalchi, S.M., Lee, C.-H. (2020). Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement. In INTERSPEECH 2020 (pp. 76-80) [10.21437/Interspeech.2020-1900].
Files for this item:
File: 1900.pdf
Access: archive administrators only
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2020/qi20_interspeech.html
Type: Published version (Versione Editoriale)
Size: 969.64 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636628
Citations
  • PMC: not available
  • Scopus: 8
  • Web of Science (ISI): 7