The challenge of articulatory inversion is to determine the temporal movement of the articulators from the speech waveform, or from acoustic-phonetic knowledge, e.g. derived from information about the linguistic content of the utterance. The actual position of the articulators is typically obtained from measured data, in our case position measurements obtained using EMA (electromagnetic articulography). In this paper, we investigate the impact on articulatory inversion of using features derived from the acoustic waveform, relative to using linguistic features related to the time-aligned phone sequence of the utterance. Filterbank energies (FBE) are used as acoustic features, while phoneme identities and (binary) phonetic attributes are used as linguistic features. Experiments are performed on a speech corpus with synchronously recorded EMA measurements, employing a bidirectional long short-term memory (BLSTM) network that estimates the articulators' positions. Acoustic FBE features performed better for vowel sounds. Phonetic features attained better results for nasal and fricative sounds, except for /h/. Further improvements were obtained by combining FBE and linguistic features, which led to an average relative reduction in root-mean-square error (RMSE) of 9.8% and a 3% relative improvement of the Pearson correlation coefficient.
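The results above are stated as relative changes in RMSE and in the Pearson correlation coefficient between measured and estimated EMA trajectories. The Python sketch below shows the standard definitions of these quantities for a single articulator channel. It is a hedged illustration, not the authors' evaluation code: the function names (`rmse`, `pearson`, `relative_reduction`, `relative_improvement`) and the toy trajectories are hypothetical, and how the paper averages over articulators or utterances is not specified here.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Root-mean-square error between measured and estimated trajectories."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("trajectories must be non-empty and equally long")
    squared = [(m - e) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(measured: Sequence[float], estimated: Sequence[float]) -> float:
    """Pearson correlation coefficient between two trajectories."""
    n = len(measured)
    if n != len(estimated) or n < 2:
        raise ValueError("trajectories must be equally long with >= 2 frames")
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_m == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_m * var_e)


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which `new` is lower than `baseline` (used for RMSE)."""
    return 100.0 * (baseline - new) / baseline


def relative_improvement(baseline: float, new: float) -> float:
    """Percentage by which `new` is higher than `baseline` (used for correlation)."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    # Toy trajectories for one hypothetical articulator channel
    # (e.g. tongue-tip height); the numbers are invented for illustration.
    measured = [0.0, 1.0, 2.0, 3.0]
    est_acoustic = [0.5, 1.5, 1.5, 2.5]  # hypothetical FBE-only estimate
    est_combined = [0.2, 1.1, 2.1, 2.8]  # hypothetical FBE + linguistic estimate

    r_a, r_c = rmse(measured, est_acoustic), rmse(measured, est_combined)
    p_a, p_c = pearson(measured, est_acoustic), pearson(measured, est_combined)
    print(f"RMSE: {r_a:.3f} -> {r_c:.3f} ({relative_reduction(r_a, r_c):.1f}% reduction)")
    print(f"Pearson r: {p_a:.3f} -> {p_c:.3f} ({relative_improvement(p_a, p_c):.1f}% improvement)")
```

Under this reading, the reported 9.8% and 3% figures would be `relative_reduction` and `relative_improvement` applied to the paper's baseline and combined-feature scores. Whether the relative change is computed per articulator and then averaged, or from already averaged scores, is an assumption the abstract does not settle.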

Shahrebabaki, A.S., Olfati, N., Imran, A.S., Siniscalchi, S.M., Svendsen, T. (2019). A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion. In Interspeech 2019 (pp. 3775-3779). International Speech Communication Association [10.21437/Interspeech.2019-2526].

A Phonetic-Level Analysis of Different Input Features for Articulatory Inversion

Siniscalchi, Sabato Marco;
2019-01-01

Year: 2019
Scientific-disciplinary sector: ING-INF/05 - Information Processing Systems
Files in this record:
2526.pdf (open access; type: publisher's version; size: 628.08 kB; format: Adobe PDF)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636677
Citations
  • Scopus: 7
  • ISI Web of Science: 5