Controllable Conformer for Speech Enhancement and Recognition

Siniscalchi, Sabato Marco (Supervision)
2025-01-01

Abstract

We propose a novel approach to speech enhancement, termed Controllable ConforMer for Speech Enhancement (CCMSE), which leverages a Conformer-based architecture integrated with a control factor embedding module. Our method is designed to optimize speech quality for both human auditory perception and automatic speech recognition (ASR). It is observed that while mild denoising typically preserves speech naturalness, stronger denoising can improve human auditory tasks but often at the cost of ASR accuracy due to increased distortion. To address this, we introduce an algorithm that balances these trade-offs. By utilizing differential equations to interpolate between outputs at varying levels of denoising intensity, our method effectively combines the robustness of mild denoising with the clarity of stronger denoising, resulting in enhanced speech that is well-suited for both human and machine listeners. Experimental results on the CHiME-4 dataset validate the effectiveness of our approach. Additionally, to directly evaluate our method, a listening test demo is provided: https://zelokuo.github.io/CCMSE_demo .
2025
Sector IINF-05/A - Information Processing Systems
Guo, Z., Du, J., Siniscalchi, S.M., Pan, J., Liu, Q. (2025). Controllable Conformer for Speech Enhancement and Recognition. IEEE Signal Processing Letters, 32, 156-160 [10.1109/LSP.2024.3505794].
Files in this item:

Controllable_Conformer_for_Speech_Enhancement_and_Recognition.pdf
Access: archive managers only
Type: Post-print
Size: 577.23 kB
Format: Adobe PDF

Controllable_Conformer_for_Speech_Enhancement_and_Recognition.pdf
Access: archive managers only
Type: Publisher's version
Size: 839.83 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664606