Institutional Research Archive of the University of Palermo

A phonetic feature based lattice rescoring approach to LVCSR

SINISCALCHI, SABATO MARCO (first author; Formal Analysis); Torbjorn Svendsen (second author); Chin Hui Lee (last author)

2009-01-01

Abstract

Large Vocabulary Continuous Speech Recognition (LVCSR) systems decode input speech using diverse knowledge sources, such as acoustic, lexical, and linguistic information. Although most unreliable hypotheses are pruned during the recognition process, current state-of-the-art systems often make errors that human listeners would find “unreasonable.” Several studies have shown that properly integrating acoustic-phonetic information can help reduce such errors. We have previously shown that high-accuracy phone recognition can be achieved when a bank of speech attribute detectors is used to compute a confidence score describing the attribute activation levels exhibited by the current frame. In those experiments, however, the phone recognizer did not use a language model to impose word-sequence constraints, and the vocabulary was small. In this work, we extend our approach to LVCSR by introducing a second recognition step that incorporates additional information not directly used in conventional log-likelihood-based decoding. Experimental results show promising performance.
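The abstract describes a second recognition pass that re-ranks first-pass hypotheses using acoustic-phonetic evidence from a bank of speech attribute detectors. The sketch below illustrates that general idea under simplifying assumptions and is not the authors' implementation: it rescores a small N-best list rather than a full lattice, and the phone-to-attribute table (`PHONE_ATTRIBUTES`), the per-frame detector confidences, the averaging of log-confidences in `attribute_score`, and the weights `lm_weight` and `attr_weight` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

# HYPOTHETICAL phone -> speech-attribute table; a real system would cover the
# whole phone set with manner and place of articulation attributes.
PHONE_ATTRIBUTES = {
    "s": {"fricative", "alveolar", "unvoiced"},
    "z": {"fricative", "alveolar", "voiced"},
    "d": {"stop", "alveolar", "voiced"},
    "aa": {"vowel", "low", "voiced"},
}


@dataclass
class Hypothesis:
    words: str
    acoustic_ll: float  # first-pass acoustic log-likelihood
    lm_logprob: float   # first-pass language-model log-probability
    segments: list      # (phone, start_frame, end_frame), end frame exclusive


def attribute_score(hyp, detector_scores, eps=1e-6):
    """Mean log-confidence that each hypothesised phone's attributes are active.

    detector_scores holds one dict per frame mapping attribute -> confidence in [0, 1].
    (Assumed scoring rule, chosen for illustration only.)
    """
    total, count = 0.0, 0
    for phone, start, end in hyp.segments:
        for frame in detector_scores[start:end]:
            for attr in PHONE_ATTRIBUTES[phone]:
                # Clamp to eps so an inactive detector gives a large penalty, not log(0).
                total += math.log(max(frame.get(attr, 0.0), eps))
                count += 1
    return total / count if count else 0.0


def rescore(hyps, detector_scores, lm_weight=10.0, attr_weight=50.0):
    """Re-rank first-pass hypotheses by a weighted sum of log-domain scores."""
    def combined(h):
        return (h.acoustic_ll + lm_weight * h.lm_logprob
                + attr_weight * attribute_score(h, detector_scores))
    return sorted(hyps, key=combined, reverse=True)


# Toy example: six frames whose first two look clearly unvoiced.
frames = [
    {"fricative": 0.9, "alveolar": 0.8, "unvoiced": 0.85, "voiced": 0.1},
    {"fricative": 0.9, "alveolar": 0.8, "unvoiced": 0.85, "voiced": 0.1},
    {"vowel": 0.95, "low": 0.7, "voiced": 0.9},
    {"vowel": 0.95, "low": 0.7, "voiced": 0.9},
    {"stop": 0.8, "alveolar": 0.7, "voiced": 0.6},
    {"stop": 0.8, "alveolar": 0.7, "voiced": 0.6},
]
hyps = [
    Hypothesis("zod", -119.0, -5.0, [("z", 0, 2), ("aa", 2, 4), ("d", 4, 6)]),
    Hypothesis("sod", -120.0, -5.0, [("s", 0, 2), ("aa", 2, 4), ("d", 4, 6)]),
]
print([h.words for h in rescore(hyps, frames)])  # ['sod', 'zod']
```

With `attr_weight=0.0` the first-pass ranking is kept ("zod" first, thanks to its slightly higher acoustic log-likelihood); with the attribute term, the detectors' evidence that the opening frames are unvoiced flips the decision to "sod". Averaging over all frame-attribute terms normalizes the attribute score for segment length, which is only one of several reasonable ways to combine detector outputs.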

Date: 2009
Scientific-disciplinary sector of the contribution: Sector IINF-05/A - Information Processing Systems
ISBN of the volume: 978-1-4244-2354-5
DOI of the contribution: https://dx.doi.org/10.1109/ICASSP.2009.4960471
URL alternative to the publisher's: http://ieeexplore.ieee.org/document/4960471/
Citation: SINISCALCHI, S.M., Torbjorn Svendsen, Chin Hui Lee (2009). A phonetic feature based lattice rescoring approach to LVCSR. In IEEE ICASSP (pp. 3865-3868) [10.1109/ICASSP.2009.4960471].
Appears in categories: 2.07 Conference contribution published in a proceedings volume

Files in this record:

File	Size	Format
04960471.pdf (access restricted to archive administrators)	253.26 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/670047
