Consumer-level multimedia event detection through unsupervised audio signal modeling

S. M. Siniscalchi
2012-01-01

Abstract

This work proposes a novel acoustic characterization approach to the multimedia event detection (MED) task for unconstrained and unstructured consumer-level videos, based on audio signal modeling. The key idea is to characterize the acoustic space of interest with a set of fundamental acoustic units, around which a set of acoustic segment models (ASMs) is built. MED is then addressed with a vector space modeling technique: an incoming audio signal is first decoded into a sequence of acoustic segments, a feature vector is generated from co-occurrence statistics of the acoustic units, and the final MED decision is made by a vector space language classifier. Experimental evidence on the TRECVID 2011 MED task demonstrates the viability of the proposed approach. Furthermore, the approach better accounts for temporal dependencies than previously proposed MFCC bag-of-words approaches.
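
The feature-generation step described in the abstract can be illustrated with a minimal sketch. It assumes the audio has already been decoded into a sequence of unit labels, and it takes "co-occurrence statistics" to mean relative frequencies of single units and of adjacent unit pairs. That choice, the function name `cooccurrence_vector`, and the three-unit inventory are assumptions made for illustration and are not taken from the paper. The ASM decoder and the classifier are not shown.

```python
from collections import Counter
from itertools import product


def cooccurrence_vector(units, inventory):
    """Map a decoded unit sequence to a fixed-length feature vector.

    Assumed feature design (not from the paper): relative frequencies of
    single units, followed by relative frequencies of adjacent unit pairs.
    """
    unigrams = Counter(units)
    bigrams = Counter(zip(units, units[1:]))
    n_uni = max(len(units), 1)      # avoid division by zero on empty input
    n_bi = max(len(units) - 1, 1)
    vec = [unigrams[u] / n_uni for u in inventory]
    vec += [bigrams[(a, b)] / n_bi for a, b in product(inventory, repeat=2)]
    return vec


# Hypothetical inventory of three acoustic units and two decoded sequences
# that contain the same units in a different temporal order.
inventory = ["u0", "u1", "u2"]
seq_a = ["u0", "u1", "u2", "u0", "u1", "u2"]
seq_b = ["u2", "u1", "u0", "u2", "u1", "u0"]

vec_a = cooccurrence_vector(seq_a, inventory)
vec_b = cooccurrence_vector(seq_b, inventory)

# The unit-count part is identical, so a pure bag-of-words view cannot
# separate the two sequences; the pair statistics differ.
print(vec_a[:3] == vec_b[:3])
print(vec_a[3:] == vec_b[3:])
```

In this toy case the two sequences share the same unit counts and differ only in the pair statistics, which is one simple way a co-occurrence representation can retain temporal information that an unordered bag of units discards.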
Year: 2012
ISBN: 978-1-62276-759-5
B. Byun, I. Kim, S. M. Siniscalchi, C.-H. Lee (2012). Consumer-level multimedia event detection through unsupervised audio signal modeling. In INTERSPEECH 2012 (pp. 2081-2084). International Speech Communication Association (ISCA). doi:10.21437/Interspeech.2012-555.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/649503