Institutional Research Archive of the University of Palermo

MVP: Multi-source Voice Pathology detection

Koudounas A.; La Quatra M.; Ciravegna G.; Fantini M.; Crosetti E.; Succo G.; Cerquitelli T.; Siniscalchi S. M.; Baralis E.

2025-01-01

Abstract

Voice disorders significantly impact patient quality of life, yet non-invasive automated diagnosis remains under-explored because pathological voice data are scarce and recording sources vary. This work introduces MVP (Multi-source Voice Pathology detection), a novel approach that leverages transformers operating directly on raw voice signals. We explore three fusion strategies to combine sentence reading and sustained vowel recordings: waveform concatenation, intermediate feature fusion, and decision-level combination. Empirical validation across German, Portuguese, and Italian shows that intermediate feature fusion using transformers best captures the complementary characteristics of both recording types. Our approach achieves an AUC improvement of up to 13% over single-source methods.
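The three fusion strategies named in the abstract differ in where the two recordings are combined: before encoding, after encoding, or after classification. The toy sketch below only illustrates those three combination points. It is not the authors' implementation: `toy_encoder` and `toy_classifier` are hypothetical stand-ins for a transformer encoder and a classification head, and the choices to concatenate feature vectors and to average scores are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def toy_encoder(waveform: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a transformer encoder: returns [mean, energy]."""
    n = len(waveform)
    mean = sum(waveform) / n
    energy = sum(x * x for x in waveform) / n
    return [mean, energy]


def toy_classifier(features: Sequence[float]) -> float:
    """Hypothetical classifier head: logistic function of the feature sum."""
    return 1.0 / (1.0 + math.exp(-sum(features)))


def waveform_concatenation(sentence: Sequence[float], vowel: Sequence[float]) -> float:
    """Early fusion: join the raw signals, then encode and classify once."""
    return toy_classifier(toy_encoder(list(sentence) + list(vowel)))


def intermediate_feature_fusion(sentence: Sequence[float], vowel: Sequence[float]) -> float:
    """Intermediate fusion: encode each source separately, then concatenate
    the two feature vectors (an assumed fusion operator) before classifying."""
    fused = toy_encoder(sentence) + toy_encoder(vowel)  # list concatenation
    return toy_classifier(fused)


def decision_level_combination(sentence: Sequence[float], vowel: Sequence[float]) -> float:
    """Late fusion: classify each source separately, then average the scores
    (averaging is an assumed combination rule)."""
    p_sentence = toy_classifier(toy_encoder(sentence))
    p_vowel = toy_classifier(toy_encoder(vowel))
    return (p_sentence + p_vowel) / 2.0
```

In this sketch only the intermediate variant gives the classifier separate features for each recording type, which is one way to read the abstract's claim that it best captures their complementary characteristics.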

Date: 2025
DOI: https://dx.doi.org/10.21437/Interspeech.2025-1868
Publisher URL (open access where possible): https://www.isca-archive.org/interspeech_2025/koudounas25b_interspeech.html
Citation: Koudounas, A., La Quatra, M., Ciravegna, G., Fantini, M., Crosetti, E., Succo, G., et al. (2025). MVP: Multi-source Voice Pathology detection. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3548-3552). International Speech Communication Association [10.21437/Interspeech.2025-1868].
Appears in types: 2.07 Conference proceedings contribution published in volume

Files in this item:

File	Size	Format
koudounas25b_interspeech.pdf	512.25 kB	Adobe PDF

Access: archive managers only. Type: publisher's version. Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2025/koudounas25b_interspeech.html

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/694127
