We present FlanEC, an encoder-decoder model built on Flan-T5 for post-Automatic Speech Recognition (ASR) Generative Speech Error Correction (GenSEC). Within the GenSEC framework, FlanEC enhances ASR output by mapping the n-best hypotheses into a single output sentence, with the aim of improving the linguistic correctness, accuracy, and grammaticality of the final transcription. Specifically, we investigate whether scaling the training data and incorporating diverse datasets lead to significant improvements in post-ASR error correction. We evaluate FlanEC on the HyPoradise dataset and assess the approach under different settings to study model scalability and efficiency, offering insights into the potential of instruction-tuned encoder-decoder models for this task.

La Quatra, M., Salerno, V.M., Tsao, Y., Siniscalchi, S.M. (2024). FlanEC: Exploring Flan-T5 for Post-ASR Error Correction. In Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024 (pp. 608-615). Institute of Electrical and Electronics Engineers Inc. [10.1109/slt61566.2024.10832257].

FlanEC: Exploring Flan-T5 for Post-ASR Error Correction

La Quatra, Moreno; Siniscalchi, Sabato Marco
2024-01-01

2024
Sector IINF-05/A - Information Processing Systems
979-8-3503-9225-8

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/673126