Institutional research archive of the University of Palermo



A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion

Shahrebabaki, Abdolreza Sabzi; Olfati, Negar; Imran, Ali Shariq; Siniscalchi, Sabato Marco; Svendsen, Torbjørn

2019-01-01

Abstract

The challenge of articulatory inversion is to determine the temporal movement of the articulators from the speech waveform, or from acoustic-phonetic knowledge, e.g. knowledge derived from information about the linguistic content of the utterance. The actual positions of the articulators are typically obtained from measured data, in our case position measurements obtained using EMA (electromagnetic articulography). In this paper, we investigate how features derived from the acoustic waveform compare with linguistic features related to the time-aligned phone sequence of the utterance as inputs for articulatory inversion. Filterbank energies (FBE) are used as acoustic features, while phoneme identities and (binary) phonetic attributes are used as linguistic features. Experiments are performed on a speech corpus with synchronously recorded EMA measurements, using a bidirectional long short-term memory (BLSTM) network that estimates the articulators’ positions. Acoustic FBE features performed better for vowel sounds. Phonetic features attained better results for nasal and fricative sounds, except for /h/. Further improvements were obtained by combining FBE and linguistic features, which led to an average relative RMSE reduction of 9.8% and a 3% relative improvement in the Pearson correlation coefficient.
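The abstract does not spell out how the linguistic inputs are encoded, how they are combined with FBE, or how the reported metrics are computed. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a frame-level one-hot phone identity followed by binary phonetic attributes, and it assumes that "combining" means simple concatenation with the FBE vector of the same frame. The phone inventory, the attribute names, and all function names are hypothetical. RMSE, the Pearson correlation coefficient, and relative reduction follow their standard definitions. The BLSTM itself is not shown.

```python
import math
from typing import Sequence

# HYPOTHETICAL phone inventory and attribute sets, for illustration only;
# the phone set and attribute definitions used in the paper are not given here.
PHONES = ["aa", "iy", "m", "n", "s", "h"]
ATTRIBUTES = {
    "aa": {"vowel"},
    "iy": {"vowel", "high"},
    "m": {"nasal", "labial"},
    "n": {"nasal", "coronal"},
    "s": {"fricative", "coronal"},
    "h": {"fricative"},
}
# Fixed, sorted order so every frame's attribute vector has the same layout.
ATTRIBUTE_NAMES = sorted({a for attrs in ATTRIBUTES.values() for a in attrs})


def linguistic_features(phone: str) -> list[float]:
    """One-hot phone identity followed by binary phonetic attributes."""
    one_hot = [1.0 if p == phone else 0.0 for p in PHONES]
    attrs = [1.0 if a in ATTRIBUTES[phone] else 0.0 for a in ATTRIBUTE_NAMES]
    return one_hot + attrs


def combined_frame(fbe: Sequence[float], phone: str) -> list[float]:
    """ASSUMPTION: combine inputs by concatenating FBE with linguistic features."""
    return list(fbe) + linguistic_features(phone)


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Root-mean-square error between one predicted and one measured trajectory."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length trajectories."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of an error measure, e.g. 0.098 for a 9.8% drop."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    print(linguistic_features("m"))
    print(combined_frame([0.5, 1.2], "s"))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A relative improvement in the Pearson coefficient can be computed analogously as `(new - baseline) / baseline`. In practice, each metric would be computed per articulator channel and then averaged. Concatenation is the simplest early-fusion choice. Because it is an assumption, other fusion schemes would fit the abstract's wording equally well.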


Date: 2019

DOI: https://dx.doi.org/10.21437/Interspeech.2019-2526

Alternative URL (other than the publisher's): https://www.isca-archive.org/interspeech_2019/shahrebabaki19_interspeech.html

Citation: Shahrebabaki, A.S., Olfati, N., Imran, A.S., Siniscalchi, S.M., Svendsen, T. (2019). A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion. In Interspeech 2019 (pp. 3775-3779). International Speech Communication Association [10.21437/Interspeech.2019-2526].

Appears in type: 2.07 Contribution to conference proceedings published in a volume

Files in this item:

File	Size	Format
2526.pdf	628.08 kB	Adobe PDF

Access: archive administrators only. Description: the full text of the article is available at https://www.isca-archive.org/interspeech_2019/shahrebabaki19_interspeech.html. Type: publisher's version.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636677
