Huang, Z., Siniscalchi, S.M., Chen, I.F., Lee, C.H. (2017). Towards a direct Bayesian adaptation framework for deep models. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). IEEE [10.1109/APSIPA.2016.7820894].
Towards a direct Bayesian adaptation framework for deep models
Siniscalchi, Sabato Marco
2017-01-19
Abstract
We attempt to formulate Bayesian speaker adaptation for deep models and explore two different solutions. In the first, “indirect” approach, Bayesian adaptation is applied to context-dependent, Gaussian-mixture-model-based hidden Markov models (CD-GMM-HMMs) with bottleneck (BN) features derived from deep neural networks (DNNs). The second approach formulates Bayesian adaptation directly for CD-DNN-HMMs by casting the adaptation step into a generative framework, from which maximum-likelihood (ML) and maximum a posteriori (MAP) adaptation schemes are derived. Experiments on the Wall Street Journal task demonstrate that both MAP and structural MAP (SMAP) adaptation schemes are effective even with discriminative BN features. Furthermore, SMAP can attain a meaningful word error rate reduction (WERR) of 7.3% even when 80 hours of data and 284 different speakers are available at training time. We have also observed a notable performance improvement with the indirect approach, which supports the plausibility of the proposed solution in this novel direction.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.