We present a novel approach to designing bottom-up automatic speech recognition (ASR) systems. The key component of the proposed approach is a bank of articulatory attribute detectors implemented as a set of feed-forward artificial neural networks (ANNs). Each detector computes a score describing the activation level of its speech attribute in the current frame. These cues are first combined by an event merger that provides evidence about the presence of a higher-level feature, which is then verified by an evidence verifier to produce hypotheses at the phone or word level. We evaluate several configurations of the proposed system on a continuous phone recognition task. Experimental results on the TIMIT database show that the system achieves a phone error rate of 25%, which is superior to results obtained with either hidden Markov model (HMM) or conditional random field (CRF) based recognizers. We believe the system's inherent flexibility and the ease of adding new detectors may provide further improvements.
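The detector, merger, and verifier stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the attribute and phone inventories, the layer sizes, the softmax merger, and the "best phone per frame, collapse repeats" verifier are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical inventories, chosen only for illustration.
ATTRIBUTES = ["vowel", "nasal", "fricative", "stop", "silence"]
PHONES = ["aa", "m", "s"]


class AttributeDetector:
    """A small feed-forward network (one hidden layer) scoring one attribute per frame."""

    def __init__(self, n_inputs, n_hidden, rng):
        # Random, untrained weights: a stand-in for a trained detector.
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.w2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]

    def score(self, frame):
        """Return an activation level in (0, 1) for the given feature frame."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in self.w1]
        logit = sum(w * h for w, h in zip(self.w2, hidden))
        return 1.0 / (1.0 + math.exp(-logit))


def merge_events(attribute_scores, merger_weights):
    """Assumed event merger: softmax over linear combinations of attribute scores."""
    logits = [sum(w * s for w, s in zip(row, attribute_scores)) for row in merger_weights]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify_evidence(frame_posteriors, phones):
    """Assumed evidence verifier: best phone per frame, consecutive repeats collapsed."""
    hypothesis = []
    for posterior in frame_posteriors:
        best = phones[max(range(len(phones)), key=posterior.__getitem__)]
        if not hypothesis or hypothesis[-1] != best:
            hypothesis.append(best)
    return hypothesis


def recognize(frames, detectors, merger_weights, phones):
    """Run every frame through the detector bank, the merger, then the verifier."""
    posteriors = []
    for frame in frames:
        scores = [detector.score(frame) for detector in detectors]
        posteriors.append(merge_events(scores, merger_weights))
    return verify_evidence(posteriors, phones)


if __name__ == "__main__":
    rng = random.Random(0)
    n_inputs = 13  # assumed size of the per-frame feature vector
    detectors = [AttributeDetector(n_inputs, 8, rng) for _ in ATTRIBUTES]
    merger_weights = [[rng.gauss(0.0, 1.0) for _ in ATTRIBUTES] for _ in PHONES]
    frames = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(20)]
    print(recognize(frames, detectors, merger_weights, PHONES))
```

In this sketch, adding a detector only means appending one more `AttributeDetector` to the bank and one more column to `merger_weights`, which reflects the flexibility the abstract highlights.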

S. M. Siniscalchi, T. Svendsen, and C.-H. Lee (2007). Towards Bottom-up Continuous Phone Recognition. In ASRU (pp. 566-569). IEEE. doi:10.1109/ASRU.2007.4430174.

Towards Bottom-up Continuous Phone Recognition

S. M. SINISCALCHI
First author (role: Investigation)
2007-01-01

Year: 2007
ISBN: 978-1-4244-1746-9

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664128