We study continuous phone recognition with little or no language-specific speech training data. The phone recognizer integrates three levels of information from: (1) frame based speech attribute detectors, (2) artificial neural network based phone event mergers, and (3) decoding based evidence verifiers. With a set of acoustic phonetic attributes defined over a number of available languages, a collection of attribute-to-phone mapping rules can either be specified in a language-dependent way, one for each language, or even independently for all languages if the attribute specification is complete to cover all phones and the phone definition is universal to cover all spoken languages. We report on experimental results on Japanese phone recognition with the OGI Multilingual Speech Corpus. It is interesting that a good performance can be achieved without using any Japanese speech training data, and the phone accuracy rates vary depending on how the attribute detectors and phone mergers are configured. Further improvement is observed by adding little Japanese data to train the attribute-to-phone mergers.
D.-C. LYU, S. M. SINISCALCHI, AND C. H. LEE (2008). An experimental study on continuous phone recognition with little or no language specific-training data. In ISCA ITRW, Speech Analysis and Processing for Knowledge Discovery.
An experimental study on continuous phone recognition with little or no language specific-training data
S. M. SINISCALCHI;
2008-01-01
Abstract
We study continuous phone recognition with little or no language-specific speech training data. The phone recognizer integrates three levels of information from: (1) frame based speech attribute detectors, (2) artificial neural network based phone event mergers, and (3) decoding based evidence verifiers. With a set of acoustic phonetic attributes defined over a number of available languages, a collection of attribute-to-phone mapping rules can either be specified in a language-dependent way, one for each language, or even independently for all languages if the attribute specification is complete to cover all phones and the phone definition is universal to cover all spoken languages. We report on experimental results on Japanese phone recognition with the OGI Multilingual Speech Corpus. It is interesting that a good performance can be achieved without using any Japanese speech training data, and the phone accuracy rates vary depending on how the attribute detectors and phone mergers are configured. Further improvement is observed by adding little Japanese data to train the attribute-to-phone mergers.File | Dimensione | Formato | |
---|---|---|---|
lyu08_spkd.pdf
Solo gestori archvio
Descrizione: Il testo pieno dell’articolo è disponibile al seguente link: https://www.isca-archive.org/spkd_2008/lyu08_spkd.pdf
Tipologia:
Versione Editoriale
Dimensione
107.21 kB
Formato
Adobe PDF
|
107.21 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.