We propose a novel approach to addressing the adaptation effectiveness issue in parameter adaptation for deep neural network (DNN) based acoustic models for automatic speech recognition by adding one or more small auxiliary output layers modeling broad acoustic units, such as mono-phones or tied-state (often called senone) clusters. In scenarios with a limited amount of available adaptation data, most senones are usually rarely seen or not observed, and consequently the ability to model them in a new condition is often not fully exploited. With the original senone classification task as the primary task, and adding auxiliary mono-phone/senone-cluster classification as the secondary tasks, multi-task learning (MTL) is employed to adapt the DNN parameters. With the proposed MTL adaptation framework, we improve the learning ability of the original DNN structure, then enlarge the coverage of the acoustic space to deal with the unseen senone problem, and thus enhance the discrimination power of the adapted DNN models. Experimental results on the 20,000-word open vocabulary WSJ task demonstrate that the proposed framework consistently outperforms the conventional linear hidden layer adaptation schemes without MTL by providing 5.4% relative reduction in word error rate (WERR) with only 1 single adaptation utterance, and 10.7% WERR with 40 adaptation utterances against the un-adapted DNN models.

Huang, Z., Li, J., SINISCALCHI, S.M., Chen, I.F., Wu, J.i., Lee, C.H. (2015). Rapid adaptation for deep neural networks through multi-task learning. In INTERSPEECH 2015 (pp. 3625-3629). ISCA-INT SPEECH COMMUNICATION ASSOC [10.21437/Interspeech.2015-719].

Rapid adaptation for deep neural networks through multi-task learning

SINISCALCHI, SABATO MARCO
Membro del Collaboration Group
;
2015-01-01

Abstract

We propose a novel approach to addressing the adaptation effectiveness issue in parameter adaptation for deep neural network (DNN) based acoustic models for automatic speech recognition by adding one or more small auxiliary output layers modeling broad acoustic units, such as mono-phones or tied-state (often called senone) clusters. In scenarios with a limited amount of available adaptation data, most senones are usually rarely seen or not observed, and consequently the ability to model them in a new condition is often not fully exploited. With the original senone classification task as the primary task, and adding auxiliary mono-phone/senone-cluster classification as the secondary tasks, multi-task learning (MTL) is employed to adapt the DNN parameters. With the proposed MTL adaptation framework, we improve the learning ability of the original DNN structure, then enlarge the coverage of the acoustic space to deal with the unseen senone problem, and thus enhance the discrimination power of the adapted DNN models. Experimental results on the 20,000-word open vocabulary WSJ task demonstrate that the proposed framework consistently outperforms the conventional linear hidden layer adaptation schemes without MTL by providing 5.4% relative reduction in word error rate (WERR) with only 1 single adaptation utterance, and 10.7% WERR with 40 adaptation utterances against the un-adapted DNN models.
2015
978-1-5108-1790-6
Huang, Z., Li, J., SINISCALCHI, S.M., Chen, I.F., Wu, J.i., Lee, C.H. (2015). Rapid adaptation for deep neural networks through multi-task learning. In INTERSPEECH 2015 (pp. 3625-3629). ISCA-INT SPEECH COMMUNICATION ASSOC [10.21437/Interspeech.2015-719].
File in questo prodotto:
File Dimensione Formato  
i15_3625.pdf

Solo gestori archvio

Descrizione: Il testo pieno dell’articolo è disponibile al seguente link: https://www.isca-archive.org/interspeech_2015/huang15h_interspeech.html
Tipologia: Versione Editoriale
Dimensione 132.66 kB
Formato Adobe PDF
132.66 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/649513
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 57
  • ???jsp.display-item.citation.isi??? 30
social impact