Institutional research archive of the Università degli Studi di Palermo



Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

Yen H.; Ku P.-J.; Yang C.-H.H.; Hu H.; Siniscalchi S.M. (supervision); Chen P.-Y.; Tsao Y.

2023-01-01

Abstract

We propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR) and build an AR-SCR system. The AR procedure modifies the acoustic signals from the target domain so that a pretrained SCR model from the source domain can be repurposed for the target task. To resolve label mismatches between the source and target domains and to further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, transfer learning (TL) is combined with the original AR process to improve model adaptation. We evaluate the proposed AR-SCR system on three low-resource SCR datasets: Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that, using an acoustic model pretrained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on the Lithuanian and Arabic datasets with only a limited amount of training data.
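The abstract names two mechanisms, input-level reprogramming of target-domain audio and a similarity-based mapping from source labels to target labels, without detailing either. The following Python sketch is an illustrative assumption, not the authors' implementation: it models reprogramming as a trainable additive perturbation `delta` on the input, and label mapping as a greedy one-to-one assignment of source classes to target classes by cosine similarity between target-class centroids and source-class prototype vectors. All function names (`similarity_label_map`, `reprogram`, `predict_target`) and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def similarity_label_map(target_feats, source_protos, k=1):
    """Assumption: greedy one-to-one assignment of k source labels per target label.

    target_feats:  {target_label: [feature vectors of that class]}
    source_protos: {source_label: prototype vector in the same space}
    Assumes len(source_protos) >= k * len(target_feats).
    """
    centroids = {t: mean_vector(v) for t, v in target_feats.items()}
    # Score every (target, source) pair, most similar first.
    pairs = sorted(
        ((cosine(c, source_protos[s]), t, s) for t, c in centroids.items() for s in source_protos),
        reverse=True,
    )
    mapping = {t: [] for t in centroids}
    used = set()
    for _, t, s in pairs:
        # Each source label is used at most once; each target gets at most k.
        if s not in used and len(mapping[t]) < k:
            mapping[t].append(s)
            used.add(s)
    return mapping


def reprogram(waveform, delta):
    """Additive input reprogramming x' = x + delta.

    In AR, delta would be learned by gradient descent while the source model
    stays frozen; the training loop is not shown here.
    """
    return [x + d for x, d in zip(waveform, delta)]


def predict_target(source_scores, mapping):
    """Score each target class as the mean of its mapped source-class scores."""
    scores = {t: sum(source_scores[s] for s in srcs) / len(srcs) for t, srcs in mapping.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; a real system would use the frozen model's outputs.
    protos = {"yes": [1.0, 0.0], "no": [0.0, 1.0], "up": [0.7, 0.7]}
    feats = {"tgt_a": [[0.9, 0.1], [1.0, 0.0]], "tgt_b": [[0.1, 0.9]]}
    mapping = similarity_label_map(feats, protos)
    print(mapping)  # {'tgt_a': ['yes'], 'tgt_b': ['no']}
    print(predict_target({"yes": 2.0, "no": 0.5, "up": 1.0}, mapping))  # tgt_a
```

Setting `k > 1` turns this into a many-to-one mapping in which each target class averages the scores of several source classes; this is one option for reprogramming setups, and this record does not state which variant the paper uses.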


Date: 2023
DOI: https://dx.doi.org/10.21437/Interspeech.2023-1086
Publisher URL (open access where possible): https://www.isca-archive.org/interspeech_2023/yen23_interspeech.html
Citation: Yen H., Ku P.-J., Yang C.-H.H., Hu H., Siniscalchi S.M., Chen P.-Y., et al. (2023). Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023 (pp. 3317-3321). International Speech Communication Association [10.21437/Interspeech.2023-1086].
Appears in item types: 2.07 Contribution to conference proceedings published in a volume

Files in this item:

File	Size	Format
yen23_interspeech.pdf (access restricted to archive managers; type: publisher's version; the full text is available at https://www.isca-archive.org/interspeech_2023/yen23_interspeech.html)	1.02 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/637525
