Institutional Research Archive of the Università degli Studi di Palermo

An Investigation of Incorporating Mamba For Speech Enhancement

Chao, Rong; Cheng, Wen-Huang; Quatra, Moreno La; Siniscalchi, Sabato Marco; Yang, Chao-Han Huck; Fu, Szu-Wei; Tsao, Yu

2024-01-01

Abstract

This work investigates the use of Mamba, a recently proposed, attention-free, scalable state-space model (SSM), for the speech enhancement (SE) task. In particular, we employ Mamba to build different regression-based SE models (SEMamba) in different configurations, namely basic, advanced, causal, and non-causal. Furthermore, loss functions based either on signal-level distances or on metric-oriented objectives are considered. Experimental evidence shows that SEMamba attains a competitive PESQ of 3.55 on the VoiceBank-DEMAND dataset with the advanced, non-causal configuration. A new state-of-the-art PESQ of 3.69 is also reported when SEMamba is combined with Perceptual Contrast Stretching (PCS). Compared with equivalent Transformer-based SE solutions, a noticeable FLOPs reduction of up to ∼12% is observed with the advanced, non-causal configuration. Finally, SEMamba can be used as a pre-processing step before automatic speech recognition (ASR), showing competitive performance against recent SE solutions.

	Date
				2024
	Scientific disciplinary sector of the contribution
				Sector IINF-05/A - Information Processing Systems
	DOI of the contribution
				https://dx.doi.org/10.1109/slt61566.2024.10832332
	Publisher URL (open access where possible)
				https://ieeexplore.ieee.org/document/10832332
	Citation
				Chao, R., Cheng, W., Quatra, M.L., Siniscalchi, S.M., Yang, C.H., Fu, S., et al. (2024). An Investigation of Incorporating Mamba For Speech Enhancement. In Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024 (pp. 302-308). Institute of Electrical and Electronics Engineers Inc. [10.1109/slt61566.2024.10832332].
	Appears in types:
				2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Type	Access	Size	Format
An_Investigation_of_Incorporating_Mamba_For_Speech_Enhancement.pdf	Publisher's version	Archive managers only	2.24 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/673127
