We propose the use of speech attributes, such as voicing and aspiration, to address two key research issues in computer-assisted pronunciation training (CAPT) for L2 learners: detecting mispronunciations and providing diagnostic feedback. To improve performance, we focus on mispronunciations that occur at the segmental and sub-segmental levels. In this study, speech attribute scores are first used to measure pronunciation quality at the sub-segmental level, such as manner and place of articulation. These speech attribute scores are then integrated by neural network classifiers to generate segmental pronunciation scores. Compared with a conventional phone-based GOP (Goodness of Pronunciation) system that we implemented on our dataset, the proposed framework reduces the equal error rate (EER) by 8.78% relative. Moreover, it achieves results comparable to the phone-based classifier approach to mispronunciation detection while providing comprehensive feedback, including segmental and sub-segmental diagnostic information, to help L2 learners.
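The scoring pipeline summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions: the GOP baseline uses a common frame-averaged log-posterior-ratio formulation; the attribute names, posterior values, and weights are hypothetical toy data; and a single logistic unit with hand-set weights stands in for the trained neural network classifiers. The EER helper approximates the operating point where the false-acceptance and false-rejection rates are closest, and `relative_reduction` shows how a "relative" EER reduction is computed.

```python
import math

# Minimal illustrative sketch; all data below are toy values, not results.

def gop_score(frame_posteriors, canonical):
    """Phone-based GOP baseline (assumed formulation): the frame-averaged log
    ratio between the canonical phone's posterior and the best phone's
    posterior. The maximum value is 0; more negative means worse."""
    total = 0.0
    for frame in frame_posteriors:
        total += math.log(frame[canonical]) - math.log(max(frame.values()))
    return total / len(frame_posteriors)


def attribute_scores(frame_posteriors, canonical_attrs):
    """Sub-segmental scores: for each attribute the canonical phone should
    carry (e.g. 'voiced', 'stop', 'bilabial' for /b/), the mean log posterior
    of that attribute over the segment's frames."""
    n = len(frame_posteriors)
    return {
        attr: sum(math.log(frame[attr]) for frame in frame_posteriors) / n
        for attr in canonical_attrs
    }


def segment_score(attr_scores, weights, bias):
    """Segmental score from attribute scores via one logistic unit.
    Hypothetical stand-in for the trained neural network classifiers;
    returns a value in (0, 1), read as P(segment is correctly pronounced)."""
    z = bias + sum(weights[attr] * s for attr, s in attr_scores.items())
    return 1.0 / (1.0 + math.exp(-z))


def equal_error_rate(scores_correct, scores_wrong):
    """Approximate EER by a threshold sweep. A segment is accepted as correct
    when its score >= threshold. Returns the mean of the false-acceptance
    rate (mispronunciations accepted) and the false-rejection rate (correct
    segments rejected) at the threshold where the two are closest."""
    best_gap, best_eer = None, None
    for t in sorted(set(scores_correct) | set(scores_wrong)):
        far = sum(s >= t for s in scores_wrong) / len(scores_wrong)
        frr = sum(s < t for s in scores_correct) / len(scores_correct)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative error reduction; 0.0878 corresponds to '8.78% relative'."""
    return (baseline_eer - proposed_eer) / baseline_eer


if __name__ == "__main__":
    # Toy attribute posteriors for a segment whose canonical phone is /b/.
    frames = [
        {"voiced": 0.9, "stop": 0.8, "bilabial": 0.7},
        {"voiced": 0.3, "stop": 0.9, "bilabial": 0.6},  # devoicing-like frame
    ]
    scores = attribute_scores(frames, ["voiced", "stop", "bilabial"])
    weights = {"voiced": 1.5, "stop": 1.0, "bilabial": 1.0}  # hypothetical
    print("attribute scores:", scores)
    print("segment score:", segment_score(scores, weights, bias=2.0))
    print("EER:", equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75]))
```

Keeping one score per attribute before combining them is what makes sub-segmental feedback possible: a low `voiced` score, for example, points to a voicing error even when the overall segmental score is borderline.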

Li, W., Siniscalchi, S.M., Chen, N.F., Lee, C.H. (2016). Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling. In ICASSP (pp. 6135-6139). Institute of Electrical and Electronics Engineers Inc. doi:10.1109/ICASSP.2016.7472856.

Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling

Siniscalchi, Sabato Marco
2016-01-01

2016
ISBN: 978-1-4799-9988-0
Files in this item:
Improving_non-native_mispronunciation_detection_and_enriching_diagnostic_feedback_with_DNN-based_speech_attribute_modeling.pdf
Access: archive managers only
Type: Publisher's version
Size: 534.32 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/10447/649576
Citations
  • PubMed Central: N/A
  • Scopus: 74
  • ISI Web of Science: 46