Guo, Z., Siniscalchi, S. M., Du, J., Shen, K., Pan, J., Gao, J. (2026). Closing the ELBO Gap in Diffusion Models for Speech Enhancement and Dereverberation. IEEE Transactions on Audio, Speech, and Language Processing, 34, 1966-1979. doi: 10.1109/TASLPRO.2026.3675774.
Closing the ELBO Gap in Diffusion Models for Speech Enhancement and Dereverberation
Siniscalchi, S. M. (Conceptualization)
2026-01-01
Abstract
We examine the Evidence Lower Bound (ELBO) within diffusion models (DMs) applied to speech enhancement (SE) and dereverberation (SD). We focus in particular on the interplay between the ELBO, the Gaussian noise schedule (GNS), and the choice of practical loss functions. We hypothesize that the suboptimal performance of DM-based SE and SD can arise from the absence of a well-calibrated GNS. We therefore refine the noise schedule design by controlling the minimum and maximum noise variances. Additionally, we introduce the Importance of Condition as a novel metric that quantitatively assesses the influence of noise variance on the model behavior during reverse diffusion processes. Our analysis reveals that changing the GNS configuration substantially affects the reliance of the model on the input condition, thereby impacting the overall performance. Furthermore, we demonstrate that conventional loss functions used in DMs inherently impose a performance ceiling that prevents convergence to the theoretical optimum of the ELBO, resulting in suboptimal SE and SD outcomes. We propose a two-stage training framework to alleviate this limit. First, a score-based DM uses an optimized GNS to perform initial enhancement. Second, a dedicated refinement model is trained to further improve the ELBO and enhance speech quality. Our comprehensive experimental validation demonstrates the effectiveness of the proposed framework on both SE and SD tasks.

| File | Type | Access | Size | Format | Actions |
|---|---|---|---|---|---|
| Closing_the_ELBO_Gap_in_Diffusion_Models_for_Speech_Enhancement_and_Dereverberation.pdf | Published version | Archive managers only | 3.96 MB | Adobe PDF | View/Open; Request a copy |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.