In this paper, a lattice rescoring approach to integrating acoustic-phonetic information into automatic speech recognition (ASR) is described. Additional information over what is used in conventional log-likelihood based decoding is provided by a bank of speech event detectors that score manner and place of articulation events with log-likelihood ratios that are treated as confidence levels. An artificial neural network (ANN) is then used to transform raw log-likelihood ratio scores into manageable terms for easy incorporation. We refer to the union of the event detectors and the ANN as knowledge module. A goal of this study is to design a generic framework which makes it easier to incorporate other sources of information into an existing ASR system. Another aim is to start investigating the possibility of building a generic knowledge module that can be plugged into an ASR system without being trained on specific data for the given task. To this end, the proposed approach is evaluated on three diverse ASR tasks: continuous phone recognition, connected digit recognition, and large vocabulary continuous speech recognition, but the data-driven knowledge module is trained with a single corpus and used in all three evaluation tasks without further training. Experimental results indicate that in all three cases the proposed rescoring framework achieves better results than those obtained without incorporating the confidence scores provided by the knowledge module. It is interesting to note that the rescoring process is especially effective in correcting utterances with errors in large vocabulary continuous speech recognition, where constraints imposed by the lexical and language models sometimes produce recognition results not strictly observing the underlying acoustic-phonetic properties.

SINISCALCHI, S.M., Lee C. H. (2009). A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition. SPEECH COMMUNICATION, 51(11), 1139-1153 [10.1016/j.specom.2009.05.004].

A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition

SINISCALCHI, SABATO MARCO
Primo
Investigation
;
2009-01-01

Abstract

In this paper, a lattice rescoring approach to integrating acoustic-phonetic information into automatic speech recognition (ASR) is described. Additional information over what is used in conventional log-likelihood based decoding is provided by a bank of speech event detectors that score manner and place of articulation events with log-likelihood ratios that are treated as confidence levels. An artificial neural network (ANN) is then used to transform raw log-likelihood ratio scores into manageable terms for easy incorporation. We refer to the union of the event detectors and the ANN as knowledge module. A goal of this study is to design a generic framework which makes it easier to incorporate other sources of information into an existing ASR system. Another aim is to start investigating the possibility of building a generic knowledge module that can be plugged into an ASR system without being trained on specific data for the given task. To this end, the proposed approach is evaluated on three diverse ASR tasks: continuous phone recognition, connected digit recognition, and large vocabulary continuous speech recognition, but the data-driven knowledge module is trained with a single corpus and used in all three evaluation tasks without further training. Experimental results indicate that in all three cases the proposed rescoring framework achieves better results than those obtained without incorporating the confidence scores provided by the knowledge module. It is interesting to note that the rescoring process is especially effective in correcting utterances with errors in large vocabulary continuous speech recognition, where constraints imposed by the lexical and language models sometimes produce recognition results not strictly observing the underlying acoustic-phonetic properties.
2009
SINISCALCHI, S.M., Lee C. H. (2009). A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition. SPEECH COMMUNICATION, 51(11), 1139-1153 [10.1016/j.specom.2009.05.004].
File in questo prodotto:
File Dimensione Formato  
SPECOM1810.pdf

Solo gestori archvio

Tipologia: Versione Editoriale
Dimensione 766.12 kB
Formato Adobe PDF
766.12 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/649525
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 61
  • ???jsp.display-item.citation.isi??? 47
social impact