We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. The work reported in this paper describes and evaluates the detection-based ASR paradigm and discusses phonetic attribute classes, methods of detecting framewise phonetic attributes and methods of combining attribute detectors for ASR. We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. We use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices to combine framewise detection scores for continuous phone recognition on the TIMIT database. With CRFs, we achieve a phone accuracy of 70.63%, outperforming the baseline and enhanced HMM systems, by incorporating all of the attribute detectors discussed in the paper.
I. BROMBERG, Q. FU, J. HOU, J. LI, C. MA, B. MATTHEWS, et al. (2007). Detection-Based ASR in the Automatic Speech Attribute Transcription Project. In INTERSPEECH 2007 (pp. 1829-1832). ISCA-INT SPEECH COMMUNICATION ASSOC.
Detection-Based ASR in the Automatic Speech Attribute Transcription Project
S. M. SINISCALCHI: Writing – Original Draft Preparation
2007-01-01
File: INTERSPEECH_2007.pdf
Access: Archive administrators only
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2007/bromberg07_interspeech.html
Type: Publisher's version
Size: 279.23 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.