C.-H. Lee and S. M. Siniscalchi (2013). An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition. Proceedings of the IEEE, 101(5), 1089-1115. doi:10.1109/JPROC.2013.2238591
An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition
Author: S. M. Siniscalchi (last author)
Role: Formal Analysis
Publication date: 2013-01-01
Abstract
The field of automatic speech recognition (ASR) has enjoyed more than 30 years of technology advances due to the extensive utilization of the hidden Markov model (HMM) framework and a concentrated effort by the speech community to make available a vast amount of speech and language resources, known today as the Big Data Paradigm. State-of-the-art ASR systems achieve high recognition accuracy for well-formed utterances in a variety of languages by decoding speech into the most likely sequence of words among all possible sentences represented by a finite-state network (FSN) approximation of all the knowledge sources required by the ASR task. However, the ASR problem is still far from being solved, because not all information available in the speech knowledge hierarchy can be directly integrated into the FSN to improve ASR performance and enhance system robustness. It is believed that some of the current issues of integrating various knowledge sources in top-down integrated search can be partially addressed by processing techniques that take advantage of the full set of acoustic and language information in speech. It has long been postulated that human speech recognition (HSR) determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, ranging from acoustic phonetics to syntax and semantics. This calls for a bottom-up attribute detection and knowledge integration framework that links speech processing with information extraction: spotting speech cues with a bank of attribute detectors, weighting and combining acoustic evidence to form cognitive hypotheses, and verifying these hypotheses until a consistent recognition decision can be reached. The recently proposed automatic speech attribute transcription (ASAT) framework is an attempt to mimic some HSR capabilities with asynchronous speech event detection followed by bottom-up knowledge integration and verification.
In the last few years, ASAT has demonstrated good potential and has been applied to a variety of applications in speech processing and information extraction.
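The detection-combination-verification flow described in the abstract can be illustrated with a minimal sketch. Everything here is an illustrative assumption (the toy attribute inventory, the uniform weighting, the threshold, and all names), not the authors' implementation:

```python
# Frame-level scores from a hypothetical bank of speech attribute detectors
# (e.g., manner and place of articulation cues), one score per frame.
detector_scores = {
    "vowel":     [0.9, 0.8, 0.1],
    "fricative": [0.1, 0.2, 0.7],
    "voiced":    [0.8, 0.9, 0.3],
}

# Toy knowledge source mapping phone hypotheses to supporting attributes.
phone_attributes = {
    "aa": {"vowel", "voiced"},
    "s":  {"fricative"},
}

def combine_evidence(frame):
    """Weight and combine detector outputs into per-phone evidence scores."""
    scores = {}
    for phone, attrs in phone_attributes.items():
        support = [detector_scores[a][frame] for a in attrs]
        scores[phone] = sum(support) / len(support)  # uniform weights here
    return scores

def verify(frame, threshold=0.5):
    """Accept the best hypothesis only if its evidence is consistent enough."""
    scores = combine_evidence(frame)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

print([verify(t) for t in range(3)])  # one accepted hypothesis per frame
```

In a real system the detectors would be statistical models, the weights would be learned, and verification would operate over lattices of competing hypotheses rather than single frames; the sketch only shows the bottom-up direction of the information flow.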
File: _06457407.pdf
Type: Versione Editoriale (publisher's version)
Format: Adobe PDF
Size: 962.26 kB
Access: restricted to archive managers; View/Open on request
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.