A state-of-the-art automatic speech recognition (ASR) system can often achieve high accuracy for most spoken languages of interest if a large amount of speech material can be collected and used to train a set of language-specific acoustic phone models. However, designing good ASR systems with little or no language-specific speech data for resource-limited languages is still a challenging research topic. As a consequence, there has been an increasing interest in exploring knowledge sharing among a large number of languages so that a universal set of acoustic phone units can be defined to work for multiple or even for all languages. This work aims at demonstrating that a recently proposed automatic speech attribute transcription framework can play a key role in designing language-universal acoustic models by sharing speech units among all target languages at the acoustic phonetic attribute level. The language-universal acoustic models are evaluated through phone recognition. It will be shown that good cross-language attribute detection and continuous phone recognition performance can be accomplished for unseen languages using minimal training data from the target languages to be recognized. Furthermore, a phone-based background model (PBM) approach will be presented to improve attribute detection accuracies.
Siniscalchi, S. M., Lyu, D. C., Svendsen, T., Lee, C. H. (2012). Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data. IEEE Transactions on Audio, Speech, and Language Processing, 20(3), 875-887 [10.1109/TASL.2011.2167610].
Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data
SINISCALCHI, SABATO MARCO
First author; role: Investigation
2012-01-01
File | Access | Type | Size | Format
---|---|---|---|---
06016213_.pdf | Archive managers only | Publisher's version | 415.63 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.