Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test condition of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has been proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN system can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks - ANNs with many hidden layers that adopts the pre-training technique - on many speech tasks. Current adaptation techniques for ANNs based on injecting an adaptable linear transformation network connected to either the input, or the output layer are not effective especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make it robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.

S. M. SINISCALCHI, J. Li, C. -H. Lee (2013). Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 21(10), 2152-2161 [10.1109/TASL.2013.2270370].

Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems

S. M. SINISCALCHI^{Primo

Investigation};J. Li;C. -H. Lee

2013-01-01

Abstract

Model adaptation techniques are an efficient way to reduce the mismatch that typically occurs between the training and test condition of any automatic speech recognition (ASR) system. This work addresses the problem of increased degradation in performance when moving from speaker-dependent (SD) to speaker-independent (SI) conditions for connectionist (or hybrid) hidden Markov model/artificial neural network (HMM/ANN) systems in the context of large vocabulary continuous speech recognition (LVCSR). Adapting hybrid HMM/ANN systems on a small amount of adaptation data has been proven to be a difficult task, and has been a limiting factor in the widespread deployment of hybrid techniques in operational ASR systems. Addressing the crucial issue of speaker adaptation (SA) for hybrid HMM/ANN system can thereby have a great impact on the connectionist paradigm, which will play a major role in the design of next-generation LVCSR considering the great success reported by deep neural networks - ANNs with many hidden layers that adopts the pre-training technique - on many speech tasks. Current adaptation techniques for ANNs based on injecting an adaptable linear transformation network connected to either the input, or the output layer are not effective especially with a small amount of adaptation data, e.g., a single adaptation utterance. In this paper, a novel solution is proposed to overcome those limits and make it robust to scarce adaptation resources. The key idea is to adapt the hidden activation functions rather than the network weights. The adoption of Hermitian activation functions makes this possible. Experimental results on an LVCSR task demonstrate the effectiveness of the proposed approach.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				2013
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.1109/TASL.2013.2270370
			
	URL alternativo rispetto a quello dell'editore 
DATO PREVISTO SU LOGINMIUR
	
				http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6544616
			
	Citazione
	
				S. M. SINISCALCHI,  J. Li,  C. -H. Lee (2013). Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems. IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 21(10), 2152-2161 [10.1109/TASL.2013.2270370].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
06544616_.pdf Solo gestori archvio Tipologia: Versione Editoriale Dimensione 952.22 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	952.22 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/649527

Citazioni

ND

69

60

social impact