The choice of hidden non-linearity in a feed-forward multi-layer perceptron (MLP) architecture is crucial to obtain good generalization capability and better performance. Nonetheless, little attention has been paid to this aspect in the ASR field. In this work, we present some initial, yet promising, studies toward improving ASR performance by adopting hidden activation functions that can be automatically learned from the data and change shape during training. This adaptive capability is achieved through the use of orthonormal Hermite polynomials. The “adaptive” MLP is used in two neural architectures that generate phone posterior estimates, namely, a standalone configuration and a hierarchical structure. The posteriors are input to a hybrid phone recognition system with good results on the TIMIT corpus. A scheme for optimizing the contributions of high-accuracy neural architectures is also investigated, resulting in a relative improvement of ~9.0% over a non-optimized combination. Finally, initial experiments on the WSJ Nov92 task show that the proposed technique scales well up to large vocabulary continuous speech recognition (LVCSR) tasks.
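The abstract does not spell out how the Hermite-based activation is parameterized. A minimal sketch, assuming the common form in which each hidden unit's activation is a weighted sum of orthonormal Hermite functions with trainable coefficients, could look like the following. The names `hermite_basis`, `adaptive_activation`, and `coeffs` are hypothetical and are not taken from the paper; the expansion order and the way the coefficients are trained are likewise left open here.

```python
import numpy as np


def hermite_basis(x, order):
    """Evaluate orthonormal Hermite functions h_0..h_order at x.

    Uses the three-term recurrence
        h_n(x) = sqrt(2/n) * x * h_{n-1}(x) - sqrt((n-1)/n) * h_{n-2}(x),
    with h_0(x) = pi**(-1/4) * exp(-x**2 / 2).
    Returns an array of shape (order + 1, *x.shape).
    """
    x = np.asarray(x, dtype=float)
    basis = np.empty((order + 1,) + x.shape)
    basis[0] = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)
    if order >= 1:
        basis[1] = np.sqrt(2.0) * x * basis[0]
    for n in range(2, order + 1):
        basis[n] = (np.sqrt(2.0 / n) * x * basis[n - 1]
                    - np.sqrt((n - 1) / n) * basis[n - 2])
    return basis


def adaptive_activation(x, coeffs):
    """Assumed activation: phi(x) = sum_n coeffs[n] * h_n(x).

    In training, `coeffs` would be updated by gradient descent together
    with the network weights, which is what lets the shape change.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    basis = hermite_basis(x, len(coeffs) - 1)
    # Contract the coefficient axis with the basis-index axis.
    return np.tensordot(coeffs, basis, axes=1)
```

Because the basis functions are orthonormal, changing one coefficient adjusts one component of the activation shape without rescaling the others, which is one plausible reason for preferring them over raw polynomials.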

Siniscalchi, S.M., Svendsen, T., Sorbello, F., Lee, C.-H. (2010). Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions. In ICASSP 2010: IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4882-4885). Piscataway: IEEE. doi:10.1109/ICASSP.2010.5495120.

Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions

SINISCALCHI, SABATO MARCO (first author; Investigation); SORBELLO, FILIPPO
2010-01-01

Year: 2010
ISBN: 9781424442959
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664127