La Quatra M., Turco M.F., Svendsen T., Salvi G., Orozco-Arroyave J.R., Siniscalchi S.M. (2024). Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions. In INTERSPEECH (pp. 1405-1409) [10.21437/Interspeech.2024-522].

Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions

La Quatra M. (Formal Analysis); Siniscalchi S. M. (Conceptualization)
2024-09-01

Abstract

This work aims to devise a robust detector of Parkinson's disease (PD) from speech in real-world operating conditions using (i) foundation models and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundation models on the clean standard PC-GITA (s-PC-GITA) data; the resulting models outperform previously proposed models. Second, we assess the generalization capability of the PD models on the extended PC-GITA (e-PC-GITA) recordings, collected in real-world operative conditions, and observe a severe drop in performance when moving from ideal to real-world conditions. Third, we align training and testing conditions by applying off-the-shelf SE techniques to e-PC-GITA; a significant boost in performance is observed only for the foundation models. Finally, combining the two best foundation models trained on s-PC-GITA, namely WavLM Base and HuBERT Base, yields the top performance on the enhanced e-PC-GITA.
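
The abstract does not say how the two best models are combined. Purely as an illustration, the sketch below assumes score-level fusion, a common choice in which each model outputs a per-recording probability of PD and the two are averaged before thresholding. The function names, the equal weighting, and the 0.5 threshold are all hypothetical and are not taken from the paper; the scores in the usage lines are made-up numbers, not reported results.

```python
def fuse_scores(p_model_a, p_model_b, weight_a=0.5):
    """Weighted average of two models' PD probabilities, one value per recording.

    Assumption: both lists hold probabilities in [0, 1] for the same
    recordings in the same order (e.g. from two fine-tuned classifiers).
    """
    if len(p_model_a) != len(p_model_b):
        raise ValueError("both models must score the same recordings")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(p_model_a, p_model_b)]


def predict(fused_scores, threshold=0.5):
    """Label a recording 1 (PD) if its fused score reaches the threshold, else 0."""
    return [int(p >= threshold) for p in fused_scores]


def accuracy(predictions, labels):
    """Fraction of recordings whose predicted label matches the reference label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    return sum(int(p == y) for p, y in zip(predictions, labels)) / len(labels)


# Illustrative usage with made-up scores for four recordings.
scores_a = [0.9, 0.2, 0.6, 0.4]   # hypothetical outputs of the first model
scores_b = [0.5, 0.4, 0.8, 0.1]   # hypothetical outputs of the second model
fused = fuse_scores(scores_a, scores_b)
labels_hat = predict(fused)
```

Averaging scores rather than hard labels keeps each model's confidence in play, so a strongly confident model can outweigh a marginal one; other schemes (feature concatenation, learned weights) would be equally plausible readings of "combining".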
Sector IINF-05/A - Information Processing Systems
Files in this item:

File: laquatra24_interspeech.pdf
Access: archive managers only
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2024/laquatra24_interspeech.pdf
Type: Publisher's version
Size: 410.55 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/670043