Deep neural network (DNN)-based speech enhancement (SE) typically relies on conventional activation functions, which lack the expressiveness to capture the complex multiscale structures needed for high-fidelity SE. Group-Rational KAN (GR-KAN), a variant of Kolmogorov-Arnold Networks (KAN), retains KAN's expressiveness while improving scalability on complex tasks. We adapt GR-KAN to existing DNN-based SE by replacing dense layers with GR-KAN layers in the time-frequency (T-F) domain MP-SENet and by integrating GR-KAN's activations into the 1D CNN layers of the time-domain Demucs. Results on Voicebank-DEMAND show that GR-KAN requires up to 4× fewer parameters while improving PESQ by up to 0.1. In contrast, KAN, which faces scalability issues, outperforms MLP on a small-scale signal modeling task but fails to improve MP-SENet. We demonstrate the first successful use of KAN-based methods for consistent improvement in both time-domain and state-of-the-art (SoTA) T-F domain SE, establishing GR-KAN as a promising alternative for SE.
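As a rough illustration of what "replacing dense layers with GR-KAN layers" can mean, the sketch below applies a learnable rational function shared by each group of channels and then an ordinary linear map. It is a minimal sketch under stated assumptions, not the authors' implementation: the safe-denominator form P(x) / (1 + |Q(x)|), the function names (`rational`, `group_rational`, `gr_kan_layer`, `count_params`), and all coefficient values are hypothetical and chosen only for readability.

```python
from typing import List, Sequence


def rational(x: float, num: Sequence[float], den: Sequence[float]) -> float:
    """Evaluate P(x) / (1 + |Q(x)|), an assumed 'safe' rational form.

    num holds a_0..a_m with P(x) = sum_i a_i * x**i.
    den holds b_1..b_n with Q(x) = sum_j b_j * x**j (no constant term),
    so the denominator is always at least 1 and never hits zero.
    """
    p = sum(a * x ** i for i, a in enumerate(num))
    q = sum(b * x ** (j + 1) for j, b in enumerate(den))
    return p / (1.0 + abs(q))


def group_rational(
    x: Sequence[float],
    num_per_group: Sequence[Sequence[float]],
    den_per_group: Sequence[Sequence[float]],
) -> List[float]:
    """Apply one shared rational function to each contiguous group of channels."""
    groups = len(num_per_group)
    if groups == 0 or len(den_per_group) != groups:
        raise ValueError("need one numerator and one denominator per group")
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by the group count")
    size = len(x) // groups
    return [
        rational(v, num_per_group[i // size], den_per_group[i // size])
        for i, v in enumerate(x)
    ]


def gr_kan_layer(
    x: Sequence[float],
    weight: Sequence[Sequence[float]],
    num_per_group: Sequence[Sequence[float]],
    den_per_group: Sequence[Sequence[float]],
) -> List[float]:
    """Group-wise rational activation followed by a linear map (bias omitted)."""
    activated = group_rational(x, num_per_group, den_per_group)
    return [sum(w * a for w, a in zip(row, activated)) for row in weight]


def count_params(
    weight: Sequence[Sequence[float]],
    num_per_group: Sequence[Sequence[float]],
    den_per_group: Sequence[Sequence[float]],
) -> int:
    """Parameters of this sketch: linear weights plus per-group coefficients."""
    linear = sum(len(row) for row in weight)
    coeffs = sum(len(n) for n in num_per_group) + sum(len(d) for d in den_per_group)
    return linear + coeffs
```

In this sketch the activation coefficients are shared within a group rather than learned per input-output edge, so the activation adds only a handful of parameters on top of the linear weights. For example, calling `gr_kan_layer` on four channels split into two groups uses one coefficient set for channels 0-1 and another for channels 2-3.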

Li, H., Hu, Y., Chen, C., Siniscalchi, S.M., Liu, S., Chng, E.S. (2025). From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 5153-5157). International Speech Communication Association [10.21437/Interspeech.2025-896].

From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology

Siniscalchi, S. M.
2025-01-01

Sector IINF-05/A - Information Processing Systems
Files in this item:
File: li25m_interspeech.pdf
Access: Archive managers only
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2025/li25m_interspeech.html
Type: Publisher's version
Size: 1.14 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/694126