Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement

Siniscalchi S. M.
2025-10-01

Abstract

We develop a speech enhancement (SE) backbone that uses cross-attention among spectrum, waveform and self-supervised learned representations (CA-SW-SSL) to integrate knowledge from diverse feature domains. The CA-SW-SSL model combines a cross spectrum and waveform attention (CSWA) model, which connects the spectrum and waveform branches, with a dual-path cross-attention module that selects outputs from different layers of the self-supervised learning (SSL) model. To handle the increased complexity of SSL integration, we introduce a bidirectional knowledge distillation (BiKD) framework for model compression. The proposed adaptive layered distance measure (ALDM) maximizes the Gaussian likelihood between clean and enhanced multi-level SSL features during backward knowledge distillation (BKD). In the forward process, the CA-SW-SSL model acts as a teacher, using the novel teacher–student Barlow Twins (TSBT) loss to guide the training of the CSWA student models, in both lite and tiny versions. Experiments on the DNS-Challenge and VoiceBank+DEMAND datasets show that the CSWA-Lite+BiKD model outperforms existing joint spectrum–waveform methods and surpasses the state of the art on the DNS-Challenge non-blind test set at half the computational load. Furthermore, the CA-SW-SSL+BiKD model outperforms all CSWA variants and current SSL-based methods.
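Note: the paper's exact formulations are not reproduced in this record. The two sketches below are one plausible reading of the losses named in the abstract, written in Python/PyTorch; all function names, shapes and hyperparameters are hypothetical, not the authors' definitions.

The ALDM is described as maximizing the Gaussian likelihood between clean and enhanced multi-level SSL features. A minimal sketch, assuming per-layer features of shape (batch, time, dim) and a learnable per-layer log-scale that supplies the adaptive weighting:

    import torch

    def aldm_loss(clean_feats, enhanced_feats, log_sigma):
        """Negative Gaussian log-likelihood between clean and enhanced SSL
        features, summed over layers; log_sigma is a learnable parameter of
        shape (num_layers,) acting as a per-layer scale (hypothetical)."""
        loss = 0.0
        for l, (c, e) in enumerate(zip(clean_feats, enhanced_feats)):
            var = torch.exp(2.0 * log_sigma[l])  # per-layer variance
            # maximizing N(c; e, var) is equivalent to minimizing this
            # variance-weighted squared error plus the log-scale penalty
            loss = loss + ((c - e).pow(2) / (2.0 * var) + log_sigma[l]).mean()
        return loss

The TSBT loss is described as a teacher–student variant of Barlow Twins. A minimal sketch of a Barlow Twins-style objective applied to a teacher–student pair, assuming (batch, dim) embeddings; the trade-off weight lam is illustrative:

    def tsbt_loss(teacher, student, lam=5e-3, eps=1e-6):
        """Barlow Twins-style loss between teacher and student embeddings:
        align the pair on the diagonal of the cross-correlation matrix
        while decorrelating the off-diagonal dimensions."""
        t = (teacher - teacher.mean(0)) / (teacher.std(0) + eps)
        s = (student - student.mean(0)) / (student.std(0) + eps)
        n = t.shape[0]
        c = t.T @ s / n                                   # (dim, dim) cross-correlation
        on_diag = (torch.diagonal(c) - 1.0).pow(2).sum()  # invariance term
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # redundancy reduction
        return on_diag + lam * off_diag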
Oct 2025
Sector IINF-05/A - Information processing systems
Chen H., Wang C., Wang Q., Du J., Siniscalchi S.M., Wan G., et al. (2025). Cross-attention among spectrum, waveform and SSL representations with bidirectional knowledge distillation for speech enhancement. Information Fusion, 122. https://doi.org/10.1016/j.inffus.2025.103218
Files in this record:
File: 1-s2.0-S156625352500291X-main.pdf
Description: main document
Type: publisher's version (Versione Editoriale)
Size: 3.33 MB
Format: Adobe PDF
Access: restricted (archive managers only; a copy may be requested)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/679550
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 0