Institutional Research Archive of the University of Palermo

We propose a novel approach to speech enhancement, termed Controllable ConforMer for Speech Enhancement (CCMSE), which leverages a Conformer-based architecture integrated with a control factor embedding module. Our method is designed to optimize speech quality for both human auditory perception and automatic speech recognition (ASR). It is observed that while mild denoising typically preserves speech naturalness, stronger denoising can improve human auditory tasks but often at the cost of ASR accuracy due to increased distortion. To address this, we introduce an algorithm that balances these trade-offs. By utilizing differential equations to interpolate between outputs at varying levels of denoising intensity, our method effectively combines the robustness of mild denoising with the clarity of stronger denoising, resulting in enhanced speech that is well-suited for both human and machine listeners. Experimental results on the CHiME-4 dataset validate the effectiveness of our approach. Additionally, to directly evaluate our method, a listening test demo is provided: https://zelokuo.github.io/CCMSE_demo .
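The abstract does not state which differential equation is used to interpolate between denoising intensities. As a rough illustration of the general idea only, the following sketch assumes the simplest possible case: two already-enhanced signals (one mild, one strong) and a constant-derivative ODE, dy/da = strong - mild, integrated with Euler steps from a = 0 up to a chosen control factor. The names `euler_interpolate`, `alpha`, and `steps` are hypothetical, and this is not the authors' CCMSE algorithm.

```python
from typing import List

Signal = List[float]


def euler_interpolate(
    mild: Signal,
    strong: Signal,
    alpha: float,
    steps: int = 100,
) -> Signal:
    """Integrate dy/da = strong - mild from a = 0 (y = mild) up to a = alpha.

    Assumption for illustration: the derivative is constant, so the exact
    solution is the straight line y(a) = mild + a * (strong - mild).
    """
    if len(mild) != len(strong):
        raise ValueError("signals must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if steps < 1:
        raise ValueError("steps must be at least 1")

    y = list(mild)          # initial condition: the mildly denoised output
    h = alpha / steps       # Euler step size along the control axis
    for _ in range(steps):
        # One explicit Euler update per sample of the signal.
        y = [yi + h * (s - m) for yi, m, s in zip(y, mild, strong)]
    return y
```

Under these assumptions, `alpha = 0` returns the mild output unchanged, `alpha = 1` approximately reproduces the strong output, and intermediate values trade one off against the other. A learned, signal-dependent derivative would replace the constant term in a realistic system.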

Guo, Z., Du, J., Siniscalchi, S.M., Pan, J., Liu, Q. (2025). Controllable Conformer for Speech Enhancement and Recognition. IEEE SIGNAL PROCESSING LETTERS, 32, 156-160 [10.1109/LSP.2024.3505794].

Controllable Conformer for Speech Enhancement and Recognition

Guo, Zilu (Investigation); Siniscalchi, Sabato Marco (Supervision); Pan, Jia (Member of the Collaboration Group)

2025-01-01

Date: 2025
Scientific disciplinary sector of the contribution: Sector IINF-05/A - Information Processing Systems
Journal title: IEEE SIGNAL PROCESSING LETTERS
DOI of the contribution: https://dx.doi.org/10.1109/LSP.2024.3505794
Publisher URL (open access where possible): https://ieeexplore.ieee.org/abstract/document/10766627/keywords#keywords
Citation: Guo, Z., Du, J., Siniscalchi, S.M., Pan, J., Liu, Q. (2025). Controllable Conformer for Speech Enhancement and Recognition. IEEE SIGNAL PROCESSING LETTERS, 32, 156-160 [10.1109/LSP.2024.3505794].
Appears in types: 1.01 Journal article

Files in this record:

File	Access	Type	Size	Format
Controllable_Conformer_for_Speech_Enhancement_and_Recognition.pdf	Archive managers only	Post-print	577.23 kB	Adobe PDF
Controllable_Conformer_for_Speech_Enhancement_and_Recognition.pdf	Archive managers only	Publisher's version	839.83 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664606
