In this work, we are interested in boosting speech attribute detection by formulating it as a multi-label classification task, and deep neural networks (DNNs) are used to design speech attribute detectors. A straightforward way to tackle the speech attribute detection task is to estimate DNN parameters using the mean squared error (MSE) loss function and employ a sigmoid function in the DNN output nodes. A more principled way is nonetheless to incorporate the micro-F1 measure, which is a widely used metric in the multi-label classification, into the DNN loss function to directly improve the metric of interest at training time. Micro-F1 is not differentiable, yet we overcome such a problem by casting our task under the maximal figure-of-merit (MFoM) learning framework. The results demonstrate that our MFoM approach consistently outperforms the baseline systems.

Kukanov, I., Hautamäki, V., SINISCALCHI, S.M., Li, K. (2017). DEEP LEARNING WITH MAXIMAL FIGURE-OF-MERIT COST TO ADVANCE MULTI-LABEL SPEECH ATTRIBUTE DETECTION. In 2016 IEEE Spoken Language Technology Workshop (SLT) (pp. 489-495). IEEE [10.1109/SLT.2016.7846308].

DEEP LEARNING WITH MAXIMAL FIGURE-OF-MERIT COST TO ADVANCE MULTI-LABEL SPEECH ATTRIBUTE DETECTION

SINISCALCHI, SABATO MARCO;
2017-02-09

Abstract

In this work, we are interested in boosting speech attribute detection by formulating it as a multi-label classification task, and deep neural networks (DNNs) are used to design speech attribute detectors. A straightforward way to tackle the speech attribute detection task is to estimate DNN parameters using the mean squared error (MSE) loss function and employ a sigmoid function in the DNN output nodes. A more principled way is nonetheless to incorporate the micro-F1 measure, which is a widely used metric in the multi-label classification, into the DNN loss function to directly improve the metric of interest at training time. Micro-F1 is not differentiable, yet we overcome such a problem by casting our task under the maximal figure-of-merit (MFoM) learning framework. The results demonstrate that our MFoM approach consistently outperforms the baseline systems.
9-feb-2017
978-1-5090-4903-5
Kukanov, I., Hautamäki, V., SINISCALCHI, S.M., Li, K. (2017). DEEP LEARNING WITH MAXIMAL FIGURE-OF-MERIT COST TO ADVANCE MULTI-LABEL SPEECH ATTRIBUTE DETECTION. In 2016 IEEE Spoken Language Technology Workshop (SLT) (pp. 489-495). IEEE [10.1109/SLT.2016.7846308].
File in questo prodotto:
File Dimensione Formato  
07846308.pdf

Solo gestori archvio

Tipologia: Versione Editoriale
Dimensione 465.73 kB
Formato Adobe PDF
465.73 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/649501
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 5
social impact