Institutional research archive of the Università degli Studi di Palermo

Ku P.-J., Ho C.-W., Yen H., Siniscalchi S.M., Lee C.-H. (2025). An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 1-5). Institute of Electrical and Electronics Engineers Inc. [10.1109/ICASSP49660.2025.10887812].

An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement

Yen H. (Member of the Collaboration Group); Siniscalchi S. M. (Supervision)

2025-01-01

Abstract

In this work, we propose a novel consistency-preserving loss function for recovering phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Unlike conventional techniques that directly estimate the phase using a deep model, our idea is to exploit ad-hoc constraints to directly generate a consistent pair of magnitude and phase. Specifically, the proposed loss forces a set of complex numbers to be a consistent short-time Fourier transform (STFT) representation, i.e., to be the spectrogram of a real signal. Our approach thus avoids the difficulty of estimating the original phase, which is highly unstructured and sensitive to time shifts. The influence of the proposed loss is first assessed on a PR task, experimentally demonstrating that our approach is viable. Next, we show its effectiveness on an SE task, using both the VB-DMD and WSJ0-CHiME3 data sets. On VB-DMD, our approach is competitive with conventional solutions. On the challenging WSJ0-CHiME3 set, the proposed framework compares favourably with techniques that explicitly estimate the phase.
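To make the notion of a "consistent STFT representation" concrete, the sketch below measures how far an array of complex numbers is from being the STFT of a real signal, using the textbook residual between the array and the STFT of its inverse STFT. This is an illustration only and is not the loss proposed in the paper, whose exact form the abstract does not give. The frame length, hop size, square-root Hann window, and all function names are assumptions chosen for the example.

```python
import cmath
import math
import random

N = 8        # frame length (assumed for illustration)
H = N // 2   # hop size, i.e. 50% overlap (assumed)
# Periodic square-root Hann window: WINDOW[n]**2 + WINDOW[n + H]**2 == 1,
# so analysis followed by synthesis reconstructs the signal interior exactly.
WINDOW = [math.sin(math.pi * n / N) for n in range(N)]

def dft(frame):
    """Plain O(N^2) discrete Fourier transform of one length-N frame."""
    return [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(spec):
    """Inverse of dft()."""
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def stft(x):
    """Windowed DFT of overlapping frames; len(x) must equal N + m*H."""
    n_frames = (len(x) - N) // H + 1
    return [dft([WINDOW[n] * x[t * H + n] for n in range(N)])
            for t in range(n_frames)]

def istft(X):
    """Windowed overlap-add synthesis; keeps only the real part of each frame."""
    y = [0.0] * ((len(X) - 1) * H + N)
    for t, spec in enumerate(X):
        frame = idft(spec)
        for n in range(N):
            y[t * H + n] += WINDOW[n] * frame[n].real
    return y

def inconsistency(X):
    """Squared distance between X and STFT(iSTFT(X)); near zero if X is consistent."""
    P = stft(istft(X))
    return sum(abs(a - b) ** 2 for fa, fb in zip(X, P) for a, b in zip(fa, fb))

random.seed(0)
# Real test signal, zero-padded by one hop on each side so the edge regions
# (covered by a single frame only) contain no energy.
signal = [0.0] * H + [random.gauss(0.0, 1.0) for _ in range(24)] + [0.0] * H

X_true = stft(signal)
# Same magnitudes, random phases: in general no real signal has this STFT.
X_rand = [[abs(c) * cmath.exp(1j * random.uniform(-math.pi, math.pi)) for c in frame]
          for frame in X_true]

good = inconsistency(X_true)   # STFT of an actual signal
bad = inconsistency(X_rand)    # correct magnitude paired with arbitrary phase
print(good, bad)
```

The true STFT gives a residual at the level of floating-point error, while the random-phase array gives a clearly positive one. A term of this kind can be minimised without comparing the estimated phase to a reference phase, which is the general motivation the abstract gives for constraining consistency instead of regressing the original phase.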


Date	2025
ISBN of the monograph	979-8-3503-6874-1
DOI of the contribution	https://dx.doi.org/10.1109/ICASSP49660.2025.10887812
Citation	Ku P.-J., Ho C.-W., Yen H., Siniscalchi S.M., Lee C.-H. (2025). An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 1-5). Institute of Electrical and Electronics Engineers Inc. [10.1109/ICASSP49660.2025.10887812].
Appears in types	2.07 Contribution in conference proceedings published in a volume

Files in this product:

File	Description	Type	Access	Size	Format
An_Explicit_Consistency-Preserving_Loss_Function_for_Phase_Reconstruction_and_Speech_Enhancement.pdf	main document	Publisher's version	Archive managers only	391.13 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/679551
