
Benchmarking Representations for Speech, Music, and Acoustic Events

Siniscalchi S. M.
2024-01-01

Abstract

Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-trained SSL models of different sizes. ARCH streamlines benchmarking of ARL techniques through its unified access to a wide range of domains and its ability to readily incorporate new datasets and models. To address the current lack of open-source, pre-trained models for non-speech audio, we also release new pre-trained models that demonstrate strong performance on non-speech datasets. We argue that the presented wide-ranging evaluation provides valuable insights into state-of-the-art ARL methods and helps pinpoint promising research directions.
Year: 2024
ISBN: 979-8-3503-7451-3
La Quatra M., Koudounas A., Vaiani L., Baralis E., Cagliero L., Garza P., et al. (2024). Benchmarking Representations for Speech, Music, and Acoustic Events. In 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings (pp. 505-509). Institute of Electrical and Electronics Engineers Inc. [10.1109/ICASSPW62465.2024.10625960].
Files in this item:
  • File: Benchmarking_Representations_for_Speech_Music_and_Acoustic_Events.pdf
  • Type: Publisher's version (Versione Editoriale)
  • Size: 844.26 kB
  • Format: Adobe PDF
  • Access: archive administrators only (copy available on request)

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/663738
Citations
  • PMC: ND
  • Scopus: 18
  • Web of Science: 7