Institutional research archive of the Università degli Studi di Palermo

Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions

SINISCALCHI, SABATO MARCO (first author; Investigation); Torbjorn Svendsen; SORBELLO, FILIPPO; Chin Hui Lee

2010-01-01

Abstract

The choice of hidden non-linearity in a feed-forward multi-layer perceptron (MLP) architecture is crucial for obtaining good generalization capability and better performance. Nonetheless, little attention has been paid to this aspect in the automatic speech recognition (ASR) field. In this work, we present some initial, yet promising, studies toward improving ASR performance by adopting hidden activation functions that can be automatically learned from the data and change shape during training. This adaptive capability is achieved through the use of orthonormal Hermite polynomials. The “adaptive” MLP is used in two neural architectures that generate phone posterior estimates, namely, a standalone configuration and a hierarchical structure. The posteriors are input to a hybrid phone recognition system, with good results on the TIMIT corpus. A scheme for optimizing the contributions of high-accuracy neural architectures is also investigated, resulting in a relative improvement of about 9.0% over a non-optimized combination. Finally, initial experiments on the WSJ Nov92 task show that the proposed technique scales well to large vocabulary continuous speech recognition (LVCSR) tasks.
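
The abstract does not give the exact form of the adaptive activation. As a rough illustration only, the sketch below assumes the activation is a weighted sum of orthonormal Hermite functions (Hermite polynomials multiplied by a Gaussian factor), with the weights acting as trainable shape parameters. The names `hermite_basis`, `adaptive_activation`, `activation_grad_coeffs`, `inner_product` and `coeffs` are hypothetical and are not taken from the paper.

```python
import math


def hermite_basis(x, order):
    """Return [h_0(x), ..., h_order(x)], the orthonormal Hermite functions.

    Uses the standard three-term recurrence; h_0 is a normalized Gaussian.
    """
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if order >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for n in range(2, order + 1):
        h.append(math.sqrt(2.0 / n) * x * h[n - 1]
                 - math.sqrt((n - 1) / n) * h[n - 2])
    return h


def adaptive_activation(x, coeffs):
    """Assumed activation form: sum over n of coeffs[n] * h_n(x).

    coeffs (hypothetical name) holds the trainable shape parameters.
    """
    basis = hermite_basis(x, len(coeffs) - 1)
    return sum(c * b for c, b in zip(coeffs, basis))


def activation_grad_coeffs(x, coeffs):
    """Gradient of the activation w.r.t. each coefficient.

    The activation is linear in coeffs, so this is just the basis values.
    """
    return hermite_basis(x, len(coeffs) - 1)


def inner_product(m, n, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal estimate of the integral of h_m(x) * h_n(x) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        h = hermite_basis(x, max(m, n))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * h[m] * h[n] * dx
    return total
```

Under this assumption, changing `coeffs` changes the shape of the non-linearity, and because the activation is linear in the coefficients they could in principle be updated by gradient descent together with the network weights. `inner_product` is included only to check numerically that the basis is orthonormal.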

Date	2010
ISBN of the monograph	9781424442959
DOI of the contribution	https://dx.doi.org/10.1109/ICASSP.2010.5495120
Alternative URL to the publisher's	http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5495120&tag=1
Citation	SINISCALCHI, S.M., Torbjorn Svendsen, SORBELLO, F., Chin Hui Lee (2010). Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions. In ICASSP 2010 : IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4882-4885). Piscataway : IEEE [10.1109/ICASSP.2010.5495120].
Appears in types	2.07 Conference paper published in a volume

Files in this item:

File	Size	Format
ICASSP_2010.pdf (archive managers only; type: publisher's version)	189.15 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664127
