Hermitian-Based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models

S. M. Siniscalchi (first author; Investigation); J. Li (second author; Collaboration Group member)
2012-01-01

Abstract

This work is concerned with speaker adaptation techniques for artificial neural networks (ANNs) implemented as feed-forward multi-layer perceptrons (MLPs) in the context of large vocabulary continuous speech recognition (LVCSR). The most successful speaker adaptation techniques for MLPs augment the neural architecture with a linear transformation network connected to either the input or the output layer. The weights of this additional linear layer are learned during the adaptation phase while all other weights are kept frozen to avoid over-fitting. As a consequence, the structures of the speaker-dependent (SD) and speaker-independent (SI) architectures differ, and the number of adaptation parameters depends on the dimension of either the input or the output layer. We propose an alternative neural architecture for speaker adaptation that overcomes these limits. This architecture adopts hidden activation functions that can be learned directly from the adaptation data, an adaptive capability achieved through the use of orthonormal Hermite polynomials. Experimental evidence gathered on the Wall Street Journal Nov92 task demonstrates the viability of the proposed technique.
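The core idea of the abstract, parameterising each hidden activation as a learnable expansion over an orthonormal Hermite basis, can be sketched in a few lines. The following Python snippet is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the activation is a linear combination of the first K orthonormal Hermite functions, whose coefficients would be the only parameters re-estimated during adaptation. The function names, the choice K = 5, and the coefficient values are illustrative assumptions.

    import math
    import numpy as np

    def hermite_basis(x, K):
        """Evaluate the first K orthonormal Hermite functions at each point of x.

        Physicists' Hermite polynomials are built with the recurrence
        H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x), then scaled by the Gaussian
        weight exp(-x^2 / 2) and normalised so the basis is orthonormal on
        the real line. Returns an array of shape x.shape + (K,).
        """
        x = np.asarray(x, dtype=float)
        H = [np.ones_like(x), 2.0 * x]  # H_0 and H_1
        for n in range(1, K - 1):
            H.append(2.0 * x * H[n] - 2.0 * n * H[n - 1])
        cols = []
        for n in range(K):
            norm = 1.0 / math.sqrt(2.0 ** n * math.factorial(n) * math.sqrt(math.pi))
            cols.append(norm * H[n] * np.exp(-x ** 2 / 2.0))
        return np.stack(cols, axis=-1)

    def hermitian_activation(a, c):
        """Adaptive hidden activation: a weighted sum of orthonormal Hermite
        functions of the pre-activation a. In the adaptation scheme described
        in the abstract, only the coefficient vector c would be re-estimated
        from the adaptation data, with all other MLP weights kept frozen."""
        return hermite_basis(a, len(c)) @ c

    # Illustrative usage with K = 5 basis functions and arbitrary coefficients.
    c = np.array([0.1, 0.8, 0.0, 0.1, 0.0])
    print(hermitian_activation(np.linspace(-2.0, 2.0, 5), c))

Because the basis is orthonormal, each coefficient contributes independently to the shape of the activation, which is presumably what makes such a parameterisation attractive for re-estimation from the limited data available in a speaker adaptation session.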
Year: 2012
ISBN: 978-1-62276-759-5
S. M. Siniscalchi, J. Li, C.-H. Lee (2012). Hermitian-Based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models. In Interspeech 2012 (pp. 2590-2593). ISCA. [10.21437/Interspeech.2012-13].
Files in this record:
File: siniscalchi12_interspeech-2.pdf
Access: archive administrators only (request a copy)
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2012/siniscalchi12_interspeech.html
Type: Publisher's Version
Size: 663.38 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649517
Citations
  • PMC: ND
  • Scopus: 23
  • Web of Science: 0