Improving mispronunciation detection for non-native learners with multisource information and LSTM-based deep models

Siniscalchi S. M. (Supervision)
2017-01-01

Abstract

In this paper, we utilize manner and place of articulation features and deep neural network models (DNNs) with long short-term memory (LSTM) to improve the detection performance of phonetic mispronunciations produced by second language learners. First, we show that speech attribute scores are complementary to conventional phone scores, so they can be concatenated as features to improve a baseline system based only on phone information. Next, pronunciation representation, usually calculated by frame-level averaging in a DNN, is now learned by LSTM, which directly uses sequential context information to embed a sequence of pronunciation scores into a pronunciation vector to improve the performance of subsequent mispronunciation detectors. Finally, when both proposed techniques are incorporated into the baseline phone-based GOP (goodness of pronunciation) classifier system trained on the same data, the integrated system reduces the false acceptance rate (FAR) and false rejection rate (FRR) by 37.90% and 38.44% (relative), respectively, from the baseline system.
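The FAR and FRR figures reported in the abstract can be illustrated with a minimal sketch (not code from the paper; function and variable names are illustrative): in mispronunciation detection, a false acceptance is a mispronounced phone the detector accepts as correct, and a false rejection is a correctly pronounced phone the detector flags as mispronounced.

```python
def far_frr(labels, predictions):
    """Compute FAR and FRR for a binary mispronunciation detector.

    labels/predictions: 1 = mispronounced, 0 = correctly pronounced.
    FAR = mispronounced phones accepted as correct / all mispronounced phones.
    FRR = correct phones rejected as mispronounced / all correct phones.
    """
    false_accepts = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    false_rejects = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    n_mispronounced = sum(labels)
    n_correct = len(labels) - n_mispronounced
    far = false_accepts / n_mispronounced if n_mispronounced else 0.0
    frr = false_rejects / n_correct if n_correct else 0.0
    return far, frr
```

A relative reduction, as quoted in the abstract (e.g. 37.90% for FAR), would then be `(far_baseline - far_new) / far_baseline`.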
2017
Li W., Chen N.F., Siniscalchi S.M., Lee C.H. (2017). Improving mispronunciation detection for non-native learners with multisource information and LSTM-based deep models. In Interspeech 2017 (pp. 2759-2763). C/O EMMANUELLE FOXONET, 4 RUE DES FAUVETTES, LIEU DIT LOUS TOURILS, BAIXAS, F-66390, FRANCE : ISCA-INT SPEECH COMMUNICATION ASSOC [10.21437/Interspeech.2017-464].
Files in this item:
li17k_interspeech.pdf (319.03 kB, Adobe PDF)
Access: archive administrators only (request a copy)
Description: The full text of the article is available at the following link: https://www.isca-archive.org/interspeech_2017/li17k_interspeech.html
Type: Published version

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649495
Citations
  • Scopus: 29
  • Web of Science: 19