Institutional research archive of the Università degli Studi di Palermo

Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech recognition. In this paper we employ deep neural networks (DNNs) to improve detection accuracy over conventional shallow MLPs (multi-layer perceptrons) with one hidden layer. A range of DNN architectures with five to seven hidden layers and up to 2048 hidden units per layer has been explored. Trained on the SI84 and tested on the Nov92 WSJ data, the proposed DNNs achieve significant improvements over the shallow MLPs, producing greater than 90% frame-level attribute estimation accuracies for all 21 attributes tested for the full system. On the phone detection task, we also obtain an excellent frame-level accuracy of 86.6%. With this level of high-precision detection of basic speech units we have opened the door to a new family of flexible speech recognition system designs for both top-down and bottom-up, lattice-based search strategies and knowledge integration.
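The frame-level accuracies quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definition, namely the fraction of frames whose highest-scoring class matches the reference label. The function name `frame_accuracy` and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def frame_accuracy(posteriors: Sequence[Sequence[float]],
                   labels: Sequence[int]) -> float:
    """Fraction of frames whose top-scoring class equals the reference label.

    Assumption (not from the paper): each frame carries one score per class
    and the detector's decision is the argmax over those scores.
    """
    if len(posteriors) != len(labels):
        raise ValueError("posteriors and labels must cover the same frames")
    if not labels:
        raise ValueError("at least one frame is required")

    correct = 0
    for frame_scores, reference in zip(posteriors, labels):
        # Index of the class with the highest score for this frame.
        predicted = max(range(len(frame_scores)), key=lambda k: frame_scores[k])
        if predicted == reference:
            correct += 1
    return correct / len(labels)


# Hypothetical toy data: four frames, two classes (attribute absent / present).
toy_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
toy_labels = [0, 1, 1, 1]
toy_accuracy = frame_accuracy(toy_posteriors, toy_labels)
print(f"frame-level accuracy: {toy_accuracy:.2%}")
```

In this toy case three of the four frames are classified correctly. For a binary attribute detector the two classes would correspond to the attribute being absent or present in a frame.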

Dong Yu, SINISCALCHI, S.M., Li Deng, Chin Hui Lee (2012). Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4169-4172). NEW YORK : IEEE [10.1109/ICASSP.2012.6288837].

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition

Dong Yu; Sabato Marco Siniscalchi; Li Deng; Chin Hui Lee

2012-01-01


Date: 2012
Scientific-disciplinary sector of the contribution: Sector IINF-05/A - Information Processing Systems
ISBN of the monograph: 978-1-4673-0046-9
DOI of the contribution: https://dx.doi.org/10.1109/ICASSP.2012.6288837
Alternative URL to the publisher's: https://ieeexplore.ieee.org/document/6288837
Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
ICASSP2012.pdf (publisher's version; archive managers only)	532.13 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636653
