In this paper, we investigate the effectiveness of articulatory information for Mandarin tone modeling and recognition in a deep neural network – hidden Markov model (DNN-HMM) framework. In conventional approaches, prosodic evidence (e.g., F0, duration and energy) is used to build tone classifiers, we here propose performance enhancement techniques in three areas: (i) adding articulatory features (AFs) and acoustic features, such as MFCCs (Mel frequency cepstrum coefficients), for tone modeling; (ii) adopting phone-dependent tone modeling; and (iii) using tone-based extended recognition network (ERN) to reduce the tone search space. The first approach is feature-related, it explicitly employs the AFs as a form of tonal features and is implemented through a multi-stage procedure. The second approach is model-related and directly extends to phone-dependent tone modeling so that each modeling unit (e.g., tonal phone) not only contains tone information, but also integrates the phone/articulatory information. Finally, the third technique is search-related with a phone-dependent tone-based expanding searching network. A series of comprehensive experiments is conducted using different input feature sets. It is demonstrated that (i) tone recognition accuracy is boosted by incorporating articulatory information, and (ii) ERN, attains the lowest tone error rate of 7.17%, with a 56% relative error reduction from the prosody-only baseline system error of 16.36%.

Lin, J.u., Li, W., Gao, Y., Xie, Y., Chen, N.F., Siniscalchi, S.M., et al. (2018). Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL, IMAGE, AND VIDEO TECHNOLOGY, 90(7), 1077-1087 [10.1007/s11265-018-1334-2].

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks

Siniscalchi, Sabato Marco;
2018-02-08

Abstract

In this paper, we investigate the effectiveness of articulatory information for Mandarin tone modeling and recognition in a deep neural network – hidden Markov model (DNN-HMM) framework. In conventional approaches, prosodic evidence (e.g., F0, duration and energy) is used to build tone classifiers, we here propose performance enhancement techniques in three areas: (i) adding articulatory features (AFs) and acoustic features, such as MFCCs (Mel frequency cepstrum coefficients), for tone modeling; (ii) adopting phone-dependent tone modeling; and (iii) using tone-based extended recognition network (ERN) to reduce the tone search space. The first approach is feature-related, it explicitly employs the AFs as a form of tonal features and is implemented through a multi-stage procedure. The second approach is model-related and directly extends to phone-dependent tone modeling so that each modeling unit (e.g., tonal phone) not only contains tone information, but also integrates the phone/articulatory information. Finally, the third technique is search-related with a phone-dependent tone-based expanding searching network. A series of comprehensive experiments is conducted using different input feature sets. It is demonstrated that (i) tone recognition accuracy is boosted by incorporating articulatory information, and (ii) ERN, attains the lowest tone error rate of 7.17%, with a 56% relative error reduction from the prosody-only baseline system error of 16.36%.
8-feb-2018
Lin, J.u., Li, W., Gao, Y., Xie, Y., Chen, N.F., Siniscalchi, S.M., et al. (2018). Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL, IMAGE, AND VIDEO TECHNOLOGY, 90(7), 1077-1087 [10.1007/s11265-018-1334-2].
File in questo prodotto:
File Dimensione Formato  
Lin2018_Article_ImprovingMandarinToneRecogniti.pdf

Solo gestori archvio

Tipologia: Versione Editoriale
Dimensione 1.21 MB
Formato Adobe PDF
1.21 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/649536
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 22
  • ???jsp.display-item.citation.isi??? 11
social impact