Institutional research archive of the Università degli Studi di Palermo

Shahrebabaki, A.S., Olfati, N., Imran, A.S., Hallstein Johnsen, M., Siniscalchi, S.M., Svendsen, T. (2021). A Two-Stage Deep Modeling Approach to Articulatory Inversion. In ICASSP 2021 (pp. 6453-6457). IEEE [10.1109/ICASSP39728.2021.9413742].

A Two-Stage Deep Modeling Approach to Articulatory Inversion

Shahrebabaki, Abdolreza Sabzi; Olfati, Negar; Imran, Ali Shariq; Hallstein Johnsen, Magne; Siniscalchi, Sabato Marco; Svendsen, Torbjorn

2021-01-01

Abstract

This paper proposes a two-stage deep feed-forward neural network (DNN) to tackle the acoustic-to-articulatory inversion (AAI) problem. DNNs are a viable solution for the AAI task, but the temporal continuity of the estimated articulatory values has not been properly exploited when a DNN is employed. In this work, we address the lack of temporal constraints, while keeping the model parameter-parsimonious, by deploying a two-stage solution based only on DNNs: (i) articulatory trajectories are estimated in a first stage using a DNN, and (ii) a temporal window of the estimated trajectories is used in a follow-up DNN stage for refinement. The first-stage estimate can be thought of as auxiliary information that imposes constraints on the inversion process. Experimental evidence shows an average error reduction of 7.51% in terms of RMSE compared to the baseline, together with an improvement of 2.39% in Pearson correlation. Finally, we point out that AAI is still a highly challenging problem, mainly due to the non-linearity and one-to-many nature of the acoustic-to-articulatory mapping. It is thus promising that a significant improvement was attained with our simple yet elegant solution.
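The data flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names (stack_context, init_mlp, mlp_forward, two_stage_inversion), the layer sizes, ReLU activations, window half-widths k_in and k_out, edge padding, and the choice that stage 2 sees only the windowed stage-1 trajectories (not the acoustic features) are all assumptions. The weights are random and untrained, so the sketch only shows shapes and the order of operations; training is omitted. The rmse and pearson helpers compute per-channel versions of the two metrics the abstract reports.

```python
import numpy as np


def stack_context(x: np.ndarray, k: int) -> np.ndarray:
    """Concatenate each frame with its k left and k right neighbours.

    x has shape (T, D); the result has shape (T, (2k + 1) * D).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    padded = np.pad(x, ((k, k), (0, 0)), mode="edge")
    windows = [padded[i : i + x.shape[0]] for i in range(2 * k + 1)]
    return np.concatenate(windows, axis=1)


def init_mlp(sizes, rng):
    """Random, untrained weights for a feed-forward net (hypothetical helper)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """ReLU hidden layers and a linear output layer for regression."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b


def two_stage_inversion(acoustic, stage1, stage2, k_in, k_out):
    # Stage 1: frame-wise articulatory estimate from a window of acoustic frames.
    y1 = mlp_forward(stage1, stack_context(acoustic, k_in))
    # Stage 2: refine each frame from a temporal window of stage-1 trajectories.
    y2 = mlp_forward(stage2, stack_context(y1, k_out))
    return y1, y2


def rmse(pred, ref):
    """Per-channel root-mean-square error, shape (D,)."""
    return np.sqrt(np.mean((pred - ref) ** 2, axis=0))


def pearson(pred, ref):
    """Per-channel Pearson correlation coefficient, shape (D,)."""
    p = pred - pred.mean(axis=0)
    r = ref - ref.mean(axis=0)
    return (p * r).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (r ** 2).sum(axis=0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, n_acoustic, n_artic = 200, 40, 12   # hypothetical sizes
    k_in, k_out = 5, 5                     # hypothetical window half-widths
    acoustic = rng.standard_normal((T, n_acoustic))
    stage1 = init_mlp([(2 * k_in + 1) * n_acoustic, 256, 256, n_artic], rng)
    stage2 = init_mlp([(2 * k_out + 1) * n_artic, 128, n_artic], rng)
    y1, y2 = two_stage_inversion(acoustic, stage1, stage2, k_in, k_out)
    print(y1.shape, y2.shape)  # (200, 12) (200, 12)
```

Because stage 2 operates on a window of already-estimated trajectories rather than on raw acoustics, its input dimension stays small ((2k_out + 1) times the number of articulatory channels), which is one plausible reading of the "parameter-parsimonious" claim.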

Date: 2021
ISBN of the volume: 978-1-7281-7605-5
DOI of the contribution: https://dx.doi.org/10.1109/ICASSP39728.2021.9413742
Alternative URL (other than the publisher's): https://ieeexplore.ieee.org/abstract/document/9413742
Appears in type: 2.07 Contribution to conference proceedings published in a volume

Files in this item:

File	Access	Type	Size	Format
09413742.pdf	Archive managers only	Publisher's version	2.04 MB	Adobe PDF
article.pdf	Open access	Post-print (accepted manuscript version)	328.15 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636672
