Institutional research archive of the Università degli Studi di Palermo

Shuai Huang, Damianos Karakos, Glen A. Coppersmith, Kenneth W. Church, SINISCALCHI, S.M. (2011). Bootstrapping a spoken language identification system using unsupervised integrated sensing and processing decision trees. In ASRU (pp. 342-347) [10.1109/ASRU.2011.6163955].

Bootstrapping a spoken language identification system using unsupervised integrated sensing and processing decision trees

Shuai Huang; Damianos Karakos; Glen A. Coppersmith; Kenneth W. Church; Sabato Marco Siniscalchi

2011-01-01

Abstract

In many inference and learning tasks, collecting large amounts of labeled training data is time-consuming, expensive, and often impractical. Consequently, the efficient use of small amounts of labeled data together with an abundance of unlabeled data, the topic of semi-supervised learning (SSL) [1], has garnered much attention. In this paper, we look at the problem of choosing these small amounts of labeled data, the first step in a bootstrapping paradigm. In contrast to traditional active learning, where an initial trained model is employed to select the unlabeled data points that would be most informative if labeled, our selection has to be done in an unsupervised way, as we do not even have labeled data to train an initial model. We propose using unsupervised clustering algorithms, in particular integrated sensing and processing decision trees (ISPDTs) [2], to select small amounts of data to label and subsequently use in SSL (e.g., transductive SVMs). In a language identification task on the CallFriend and 2003 NIST Language Recognition Evaluation corpora [3], we demonstrate that the proposed method results in significantly improved performance over random selection of equivalently sized training data.
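The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: plain k-means stands in for ISPDTs, the 2-D points are toy data rather than speech features, and the names `kmeans` and `select_for_labeling` are hypothetical. The sketch only shows the general idea of clustering unlabeled data without any labels and then picking one representative per cluster to send for labeling; the later SSL step (e.g., a transductive SVM) is not shown.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding (a stand-in for ISPDTs)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def select_for_labeling(points, k):
    """Return one point per non-empty cluster: the member closest to the centroid."""
    centroids, clusters = kmeans(points, k)
    return [
        min(c, key=lambda p: math.dist(p, cen))
        for cen, c in zip(centroids, clusters)
        if c
    ]

# Toy unlabeled data: three well-separated groups (illustrative only).
points = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.3),
    (10.0, 0.0), (10.1, 0.2), (9.8, 0.1),
]
selected = select_for_labeling(points, k=3)
print(selected)  # the three points an annotator would be asked to label
```

With a labeling budget of three, this picks one point from each group, whereas a random draw of three points could easily miss a group entirely.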

Date: 2011
ISBN of the monograph: 9781467303651
DOI of the contribution: https://dx.doi.org/10.1109/ASRU.2011.6163955
Alternative URL to the publisher's: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6163955
Citation: Shuai Huang, Damianos Karakos, Glen A. Coppersmith, Kenneth W. Church, SINISCALCHI, S.M. (2011). Bootstrapping a spoken language identification system using unsupervised integrated sensing and processing decision trees. In ASRU (pp. 342-347) [10.1109/ASRU.2011.6163955].
Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
06163955.pdf (type: publisher's version; access: archive managers only)	205.6 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/664126
