Differential privacy (DP) is one avenue for safeguarding user information used to train deep models, by imposing noisy distortion on private data. Such noise perturbation often results in severe performance degradation in automatic speech recognition (ASR) when meeting a privacy budget ε. Private aggregation of teacher ensembles (PATE) utilizes ensemble probabilities to improve ASR accuracy under the noise levels required by small values of ε. We extend PATE learning to work with dynamic patterns, namely speech utterances, and perform a first experimental demonstration that it prevents acoustic data leakage in ASR training. We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora. PATE learning-enhanced ASR models outperform the benchmark DP-SGD mechanisms, especially under strict DP budgets, giving relative word error rate reductions between 26.2% and 27.5% for an RNN transducer model evaluated with LibriSpeech. We also introduce a DP-preserving ASR solution for pretraining on public speech corpora.
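The PATE aggregation step mentioned in the abstract can be illustrated with a minimal sketch of the standard noisy-max mechanism (Papernot et al.): each teacher votes for a label, Laplace noise calibrated to the budget ε is added to the per-class vote counts, and the noisy argmax is returned. This is a generic illustration, not code from the paper; the function name and the scale choice of 2/ε are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def pate_noisy_vote(teacher_labels, num_classes, epsilon):
    """Aggregate teacher predictions with a noisy-max mechanism:
    add Laplace noise (scale 2/epsilon, one common calibration) to each
    class's vote count, then return the argmax. A smaller epsilon means
    more noise, hence stronger privacy but lower label accuracy."""
    counts = np.bincount(teacher_labels, minlength=num_classes).astype(float)
    counts += rng.laplace(loc=0.0, scale=2.0 / epsilon, size=num_classes)
    return int(np.argmax(counts))

# Ten teachers with a strong consensus on class 3.
votes = np.array([3, 3, 3, 3, 3, 3, 3, 1, 2, 3])
label = pate_noisy_vote(votes, num_classes=5, epsilon=1.0)
```

With a strict budget (small ε) the added noise can flip the winning class away from the consensus, which is exactly the accuracy/privacy trade-off the paper studies for sequence models.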

Yang, C., Chen, I.F., Stolcke, A., Siniscalchi, S.M., Lee, C.H. (2022). AN EXPERIMENTAL STUDY ON PRIVATE AGGREGATION OF TEACHER ENSEMBLE LEARNING FOR END-TO-END SPEECH RECOGNITION. In 2022 IEEE Spoken Language Technology Workshop (pp. 1074-1080). IEEE [10.1109/SLT54892.2023.10023326].

AN EXPERIMENTAL STUDY ON PRIVATE AGGREGATION OF TEACHER ENSEMBLE LEARNING FOR END-TO-END SPEECH RECOGNITION

Siniscalchi, S.M.
Role: co-last author; supervision
Date: 2022-01-01

Year: 2022
Academic discipline: ING-INF/05 - Information Processing Systems
ISBN: 979-8-3503-9690-4
Files in this item:
An_Experimental_Study_on_Private_Aggregation_of_Teacher_Ensemble_Learning_for_End-to-End_Speech_Recognition.pdf
Type: published version (access restricted to repository managers; a copy can be requested)
Size: 442.25 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/10447/636665
Citations:
  • Scopus: 1
  • Web of Science: 1