Institutional Research Archive of the Università degli Studi di Palermo

Building machines to converse with human beings through automatic speech recognition (ASR) and understanding (ASU) has long been a topic of great interest for scientists and engineers, and we have recently witnessed rapid technological advances in this area. Here, we first cast the ASR problem as a pattern-matching and channel-decoding paradigm. We then follow this with a discussion of the Hidden Markov Model (HMM), which is the most successful technique for modelling fundamental speech units, such as phones and words, in order to solve ASR as a search through a top-down decoding network. Recent advances using deep neural networks as parts of an ASR system are also highlighted. We then compare the conventional top-down decoding approach with the recently proposed automatic speech attribute transcription (ASAT) paradigm, which can better leverage knowledge sources in speech production, auditory perception and language theory through bottom-up integration. Finally, we discuss how the processing-based speech engineering and knowledge-based speech science communities can work collaboratively to improve our understanding of speech and enhance ASR capabilities.

Siniscalchi S.M., Lee C.-H. (2021). Automatic Speech Recognition by Machines. In R. Knight, J. Setter (Eds.), The Cambridge Handbook of Phonetics (pp. 480-500). Cambridge University Press [10.1017/9781108644198.020].

Automatic Speech Recognition by Machines

Siniscalchi S. M. (first author; Writing – Original Draft Preparation)

2021-01-01

Date: 2021
DOI: https://dx.doi.org/10.1017/9781108644198.020
Type: 2.01 Book chapter or essay

Files in this item:

File	Size	Format
3268409-capitolo.pdf (publisher's version; access restricted to archive managers)	1.92 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649494
