Institutional Research Archive of the Università degli Studi di Palermo

Transfer Learning of Articulatory Information Through Phone Information

Shahrebabaki, Abdolreza Sabzi; Olfati, Negar; Siniscalchi, Sabato Marco (Writing – Original Draft Preparation); Salvi, Giampiero; Svendsen, Torbjørn

2020-01-01

Abstract

Articulatory information has been argued to be useful for several speech tasks. However, in most practical scenarios this information is not readily available. We propose a novel transfer learning framework to obtain reliable articulatory information in such cases. We demonstrate its reliability both in estimating parameters of speech production and in enhancing the accuracy of an end-to-end phone recognizer. Articulatory information is estimated from speaker-independent phonemic features, using a small speech corpus with electromagnetic articulography (EMA) measurements. Next, we employ a teacher-student model to learn to estimate articulatory features from acoustic features for the targeted phone recognition task. Phone recognition experiments demonstrate that the proposed transfer learning approach outperforms the baseline transfer learning system acquired directly from an acoustic-to-articulatory inversion (AAI) model. The articulatory features estimated by the proposed method, in conjunction with acoustic features, improved the phone error rate (PER) by 6.7% and 6% on the TIMIT core test and development sets, respectively, compared to standalone static acoustic features. Interestingly, this improvement is slightly higher than what is obtained by static+dynamic acoustic features, but with a significantly less. Adding articulatory features on top of static+dynamic acoustic features yields a small but positive PER improvement.
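The abstract reports its results as phone error rate (PER). As a minimal sketch of how that metric is conventionally defined, the function below divides the edit distance between a reference and a hypothesized phone sequence by the reference length. The function name and the example phone labels are illustrative assumptions, and the record does not state which scoring tool the authors used.

```python
def phone_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / len(reference).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not reference:
        raise ValueError("reference must contain at least one phone")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(reference)


# Made-up example: one substitution (ae -> eh) and one deleted final "sil".
ref = ["sil", "k", "ae", "t", "sil"]
hyp = ["sil", "k", "eh", "t"]
per = phone_error_rate(ref, hyp)
```

Because the edit count is normalized by the reference length only, the value can exceed 1.0 when the hypothesis contains many insertions.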

Date: 2020

DOI: https://dx.doi.org/10.21437/Interspeech.2020-1139

Alternative URL (other than the publisher's): https://www.isca-archive.org/interspeech_2020/shahrebabaki20_interspeech.html

Citation: Shahrebabaki, A.S., Olfati, N., Siniscalchi, S.M., Salvi, G., Svendsen, T. (2020). Transfer Learning of Articulatory Information Through Phone Information. In Proceedings of the Annual Conference of the International Speech Communication Association 2020 (pp. 2877-2881) [10.21437/Interspeech.2020-1139].

Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
1139.pdf (publisher's version; access restricted to archive managers)	635.8 kB	Adobe PDF

Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2020/shahrebabaki20_interspeech.html

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/636467
