This paper presents a knowledge integration framework to improve performance in large vocabulary continuous speech recognition. Two types of knowledge sources, manner attribute and prosodic structure, are incorporated. For manner of articulation, six attribute detectors trained with an American English corpus (WSJ0) are utilized to rescore hypothesized phones in word lattices obtained by a baseline ASR system. For the prosodic structure, models trained with an unsupervised joint prosody labeling and modeling (PLM) technique using WSJ0 are used in lattice rescoring. Experimental results on the American English WSJ word recognition task of the Nov92 test set show that the proposed approach significantly outperforms the baseline system that does not use articulatory and prosodic information. The results also demonstrate the effectiveness and usefulness of the PLM technique in constructing prosodic models for American English ASR.
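The lattice rescoring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the knowledge-source scores are combined with the baseline scores, so the log-linear combination below is an assumption, as are the `Arc` fields, the weights `w_manner` and `w_prosody`, and the toy scores. It is not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    """One word arc in a toy lattice (all fields are hypothetical)."""
    word: str
    start: int      # source node; nodes are assumed topologically numbered
    end: int        # target node, with start < end
    base: float     # baseline log score (acoustic + language model)
    manner: float   # log score from manner attribute detectors
    prosody: float  # log score from the prosodic model


def rescore(arc, w_manner, w_prosody):
    # Assumed log-linear combination of baseline and knowledge-source scores.
    return arc.base + w_manner * arc.manner + w_prosody * arc.prosody


def best_path(arcs, start, end, w_manner=0.0, w_prosody=0.0):
    """Return (score, words) of the highest-scoring path from start to end."""
    best = {start: (0.0, [])}
    # Visiting arcs by source node is a valid order because start < end.
    for arc in sorted(arcs, key=lambda a: a.start):
        if arc.start not in best:
            continue
        score, words = best[arc.start]
        cand = score + rescore(arc, w_manner, w_prosody)
        if arc.end not in best or cand > best[arc.end][0]:
            best[arc.end] = (cand, words + [arc.word])
    return best[end]


# Toy lattice: two competing words on the first segment.
arcs = [
    Arc("write", 0, 1, base=-5.0, manner=-4.0, prosody=-2.0),
    Arc("right", 0, 1, base=-5.5, manner=-1.0, prosody=-1.0),
    Arc("now", 1, 2, base=-3.0, manner=-1.0, prosody=-1.0),
]

baseline = best_path(arcs, 0, 2)
rescored = best_path(arcs, 0, 2, w_manner=0.5, w_prosody=0.5)
```

With zero weights the search reproduces the baseline ranking; with non-zero weights, the additional scores can change which hypothesis wins, which is the effect rescoring aims for.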

Chiang, C. Y., Siniscalchi, S. M., Chen, S. H., Lee, C. H. (2013). Knowledge Integration for Improving Performance in LVCSR. In INTERSPEECH 2013 (pp. 1786-1790) [10.21437/Interspeech.2013-442].

Knowledge Integration for Improving Performance in LVCSR

SINISCALCHI, SABATO MARCO (Investigation)
2013-01-01

2013
Sector IINF-05/A - Information Processing Systems
Files in this record:
File: IS130066.pdf
Access: Archive managers only
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2013/chiang13_interspeech.pdf
Type: Publisher's version
Size: 204.58 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664129
Citations
  • PMC: N/A
  • Scopus: 4
  • ISI: 3