Institutional Research Archive of the Università degli Studi di Palermo

La Quatra, M., Koudounas, A., Salerno, V.M., Siniscalchi, S.M. (2025). Exploring Generative Error Correction for Dysarthric Speech Recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 3284-3288). International Speech Communication Association [10.21437/Interspeech.2025-1553].

Exploring Generative Error Correction for Dysarthric Speech Recognition

La Quatra M.; Koudounas A.; Salerno V. M.; Siniscalchi S. M.

2025-01-01

Abstract

Despite the remarkable progress in end-to-end Automatic Speech Recognition (ASR) engines, accurately transcribing dysarthric speech remains a major challenge. In this work, we propose a two-stage framework for the Speech Accessibility Project Challenge at INTERSPEECH 2025, which combines cutting-edge speech recognition models with LLM-based generative error correction (GER). We assess different configurations of model scales and training strategies, incorporating specific hypothesis selection to improve transcription accuracy. Experiments on the Speech Accessibility Project dataset demonstrate the strength of our approach on structured and spontaneous speech, while highlighting challenges in single-word recognition. Through comprehensive analysis, we provide insights into the complementary roles of acoustic and linguistic modeling in dysarthric speech recognition.

Date: 2025
DOI of the contribution: https://dx.doi.org/10.21437/Interspeech.2025-1553
Publisher URL (open access where possible): https://www.isca-archive.org/interspeech_2025/laquatra25_interspeech.pdf
Appears in categories: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
laquatra25_interspeech.pdf	328.78 kB	Adobe PDF

Access: archive administrators only
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2025/laquatra25_interspeech.html
Type: Publisher's version

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/694129
