Over the past few years, there has been a resurgence of interest in designing high-accuracy automatic speech recognition (ASR) systems because of the key role they can play in many real-world applications, such as voice print for biometric identification, language identification, and call scanning. Improving the current state of the art is therefore vital for the success of these applications, yet doing so is not simple with the standard technology based on hidden Markov models (HMMs) trained on short-term spectral features. This paper offers an innovative perspective on how two novel, prominent approaches to ASR, namely speech attribute detection and discriminative training, can be combined into a unified framework with beneficial effects on overall speech recognition performance. This goal is achieved by embedding phonetic feature detection into a penalized logistic regression machine (PLRM). The proposed approach is evaluated on both isolated and continuous phoneme recognition tasks. Experimental evidence indicates that the proposed framework achieves state-of-the-art performance in the isolated speech recognition task and outperforms current technology in the continuous speech recognition task.

SINISCALCHI, S.M. (2012). Combining Speech Attribute Detection and Penalized Logistic Regression for Phoneme Recognition. NEUROCOMPUTING, 93, 10-18 [10.1016/j.neucom.2012.02.037].

Combining Speech Attribute Detection and Penalized Logistic Regression for Phoneme Recognition

SINISCALCHI, SABATO MARCO
First author
Investigation
2012-09-15

Files in this item:
NEUCOM12644.pdf
Access: archive administrators only
Type: Publisher's version
Size: 610.71 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649522