
Maximum a posteriori adaptation of network parameters in deep models

Siniscalchi, Sabato Marco
2015-01-01

Abstract

We present a Bayesian approach to adapting the parameters of a well-trained context-dependent, deep-neural-network, hidden Markov model (CD-DNN-HMM) to improve automatic speech recognition performance. Given an abundance of DNN parameters but only a limited amount of adaptation data, the effectiveness of the adapted DNN model can often be compromised. We formulate maximum a posteriori (MAP) adaptation of the parameters of a specially designed CD-DNN-HMM with an augmented linear hidden network connected to the output tied states, or senones, and compare it to the previously proposed feature-space MAP linear regression. Experimental evidence on the 20,000-word open-vocabulary Wall Street Journal task demonstrates the feasibility of the proposed framework. In supervised adaptation, the proposed MAP adaptation approach provides more than 10% relative error reduction and consistently outperforms the conventional transformation-based methods. Furthermore, we present an initial attempt to generate hierarchical priors to improve adaptation efficiency and effectiveness with limited adaptation data by exploiting similarities among senones.
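With a Gaussian prior placed on the weights of the augmented linear hidden network (LHN), the MAP objective amounts to the usual frame-level cross-entropy plus a quadratic penalty that pulls the adapted weights toward the prior mean. The NumPy sketch below illustrates that general idea only; it is not the authors' implementation, and the function name map_adapt_lhn, the identity prior mean, and all layer sizes are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's code): MAP adaptation of an augmented
# linear hidden network (LHN) inserted before the senone output layer.
# Assuming a Gaussian prior N(W0, tau^-1 I) on the LHN weights, the MAP
# objective is cross-entropy plus an L2 pull toward the prior mean W0.

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def map_adapt_lhn(H, y, W_out, W0, tau=1.0, lr=0.1, epochs=20):
    """Adapt the LHN weights W (initialised at the prior mean W0) on a
    small adaptation set, keeping the rest of the network frozen.
    H:     hidden activations feeding the LHN, shape (N, d)
    y:     senone labels, shape (N,)
    W_out: frozen senone output weights, shape (d, n_senones)
    """
    W = W0.copy()
    N = H.shape[0]
    for _ in range(epochs):
        P = softmax(H @ W @ W_out)            # senone posteriors
        P[np.arange(N), y] -= 1.0             # dCE/dlogits = P - onehot(y)
        grad_ce = H.T @ (P @ W_out.T) / N     # cross-entropy gradient w.r.t. W
        grad_prior = tau * (W - W0)           # Gaussian-prior (L2-to-mean) term
        W -= lr * (grad_ce + grad_prior)
    return W

# Toy usage with hypothetical sizes: 64-dim hidden layer, 100 senones.
d, n_senones, N = 64, 100, 200
H = rng.standard_normal((N, d))
y = rng.integers(0, n_senones, size=N)
W_out = rng.standard_normal((d, n_senones)) * 0.1
W0 = np.eye(d)                                # prior mean: identity LHN
W_map = map_adapt_lhn(H, y, W_out, W0, tau=1.0)
```

Larger values of tau keep the adapted layer closer to the prior, which is the mechanism that protects against over-fitting when the adaptation set is small.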
Huang, Z., Siniscalchi, S.M., Chen, I.F., Li, J., Wu, J., Lee, C.-H. (2015). Maximum a posteriori adaptation of network parameters in deep models. In INTERSPEECH 2015 (pp. 1076-1080). International Speech Communication Association (ISCA).
Files in this record:
IS150347.PDF — Editorial Version, Adobe PDF, 601.2 kB (restricted access).
Description: the full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2015/huang15b_interspeech.html

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649502
Citations
  • Scopus: 38
  • Web of Science (ISI): 48