Audio chord detection is the combination of two separate tasks: recognizing what chords are played and determining when chords are played. Most current audio chord detection algorithms use hidden Markov model (HMM) classifiers because of the task similarity with automatic speech recognition. For most speech recognition algorithms, the performance is measured by word error rate; i.e., only the identity of recognized segments is considered because word boundaries in continuous speech are often ambiguous. In contrast, audio chord detection performance is typically measured in terms of frame error rate, which considers both timing and classification. This paper treats these two tasks separately and focuses on the first problem; i.e., classifying the correct chords given boundary information. The best performing chroma/HMM chord detection algorithm, as measured in the 2008 MIREX Audio Chord Detection Contest, is used as the baseline in this paper. Further improvements are made to reduce feature correlation, account for differences in tuning, and incorporate minimum classification error (MCE) training in obtaining chord HMMs. Experiments demonstrate that classification rates can be improved with tuning compensation and MCE discriminative training.
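The difference between the two evaluation views described above can be illustrated with a minimal sketch. It is not the paper's evaluation code: the function names `frame_error_rate` and `isolated_chord_error_rate` and the toy labels are assumptions made for illustration only.

```python
def frame_error_rate(reference_frames, predicted_frames):
    """Fraction of frames whose predicted chord label differs from the reference.

    Penalizes both wrong chord identities and misplaced chord boundaries.
    """
    if len(reference_frames) != len(predicted_frames):
        raise ValueError("frame sequences must have the same length")
    if not reference_frames:
        raise ValueError("frame sequences must not be empty")
    errors = sum(r != p for r, p in zip(reference_frames, predicted_frames))
    return errors / len(reference_frames)


def isolated_chord_error_rate(reference_labels, predicted_labels):
    """Fraction of segments labeled with the wrong chord.

    Assumes chord boundaries are given, so there is one label per segment
    and timing errors cannot occur.
    """
    if len(reference_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not reference_labels:
        raise ValueError("label sequences must not be empty")
    errors = sum(r != p for r, p in zip(reference_labels, predicted_labels))
    return errors / len(reference_labels)


# Hypothetical toy data: two chords of four frames each.
reference = ["C"] * 4 + ["G"] * 4
# Both chords are identified correctly, but the change is detected one frame late.
predicted = ["C"] * 5 + ["G"] * 3

print(frame_error_rate(reference, predicted))               # 0.125
print(isolated_chord_error_rate(["C", "G"], ["C", "G"]))    # 0.0
```

In this toy case the frame-level measure reports an error caused purely by the boundary shift, while the segment-level measure with given boundaries reports none, which is the separation of timing from classification that the abstract refers to.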

J. REED, Y. UEDA, S. M. SINISCALCHI, U. YUCHI, S. SAGAYAMA, AND C-H LEE (2009). Minimum classification error training to improve isolated chord recognition. In ISMIR 10th International Society for Music Information Retrieval Conference (ISMIR 2009) (pp. 609-614).

Minimum classification error training to improve isolated chord recognition

S. M. SINISCALCHI (Investigation)
2009-01-01

Sector IINF-05/A - Information Processing Systems
Files in this item:
File: ISMIR2009.pdf
Access: open access
Type: Publisher's version
Size: 635.86 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/666343