Institutional research archive of the Università degli Studi di Palermo

Giancarlo, R., Lo Bosco, G., Pinello, L. (2010). Distance Functions, Clustering Algorithms and Microarray Data Analysis. In C. Blum, R. Battiti (Eds.), Learning and Intelligent Optimization: 4th International Conference, LION 4, Venice, Italy, January 18-22, 2010. Selected Papers (pp. 125-138). Heidelberg: Springer [10.1007/978-3-642-13800-3_10].

Distance Functions, Clustering Algorithms and Microarray Data Analysis

GIANCARLO, Raffaele; LO BOSCO, Giosue'; PINELLO, Luca

2010-01-01

Abstract

Distance functions are a fundamental ingredient of classification and clustering procedures, and this also holds true in the particular case of microarray data. In the general data mining and classification literature, functions such as Euclidean distance or Pearson correlation have gained the status of de facto standards thanks to a considerable amount of experimental validation. For microarray data, the question of which distance function works best has been investigated, but no final conclusion has been reached. The aim of this extended abstract is to shed further light on that issue. To that end, we present an experimental study, involving several distances, that assesses (a) their intrinsic separation ability and (b) their predictive power when used in conjunction with clustering algorithms. The experiments have been carried out on six benchmark microarray datasets, for each of which the gold solution is known. We have used both hierarchical and K-means clustering algorithms, with external validation criteria as evaluation tools. From the methodological point of view, the main result of this study is a ranking of those measures in terms of their intrinsic and clustering abilities, which also highlights the correlations between the two. Pragmatically, the outcomes of the experiments indicate that the Minkowski, cosine and Pearson correlation distances seem to be the best choices for microarray data analysis.
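
As an illustration only, the three distances singled out in the abstract can be sketched in a few lines of Python. The abstract does not give the exact definitions used in the study, so the formulas below are assumptions: they follow the common textbook conventions (Minkowski of order p, and one minus the similarity for cosine and Pearson correlation). The function names and the two toy expression profiles are hypothetical.

```python
import math


def minkowski_distance(x, y, p=2.0):
    """Minkowski distance of order p; p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def cosine_distance(x, y):
    """One minus the cosine of the angle between x and y (assumed convention)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def pearson_distance(x, y):
    """One minus the Pearson correlation, computed as the cosine
    distance between the mean-centered vectors (assumed convention)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_distance([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical expression profiles of two genes across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same shape as gene_a, twice the magnitude

d_minkowski = minkowski_distance(gene_a, gene_b, p=2.0)
d_cosine = cosine_distance(gene_a, gene_b)
d_pearson = pearson_distance(gene_a, gene_b)
```

For these two proportional profiles, the cosine and Pearson distances are zero up to rounding, whereas the Minkowski distance is not. This is one reason the choice of distance can change which items a clustering algorithm groups together.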

Date	2010
ISBN of the monograph	978-3-642-13799-0; 978-3-642-13800-3
DOI of the contribution	https://dx.doi.org/10.1007/978-3-642-13800-3_10
Type	2.07 Conference paper published in a volume

Files in this record:

File	Type	Access	Size	Format
Giancarlo et al. - 2010 - Distance Functions , Clustering Algorithms and Microarray Data Analysis.pdf	Publisher's version	Archive managers only	212.29 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/58466
