Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Background: Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

DI GESU' V, GIANCARLO R, LO BOSCO G, RAIMONDI A, SCATURRO D (2005). GenClust: A Genetic Algorithm for Clustering Gene Expression Data. BMC BIOINFORMATICS, 6 (289), 1-21 [10.1186/1471-2105-6-289].

GenClust: A Genetic Algorithm for Clustering Gene Expression Data

DI GESU', Vito;GIANCARLO, Raffaele;LO BOSCO, Giosue';RAIMONDI A;SCATURRO D.

2005-01-01

Abstract

Background: Clustering is a key step in the analysis of gene expression data, and in fact, many classical clustering algorithms are used, or more innovative ones have been designed and validated for the task. Despite the widespread use of artificial intelligence techniques in bioinformatics and, more generally, data analysis, there are very few clustering algorithms based on the genetic paradigm, yet that paradigm has great potential in finding good heuristic solutions to a difficult optimization problem such as clustering. Results: GenClust is a new genetic algorithm for clustering gene expression data. It has two key features: (a) a novel coding of the search space that is simple, compact and easy to update; (b) it can be used naturally in conjunction with data driven internal validation methods. We have experimented with the FOM methodology, specifically conceived for validating clusters of gene expression data. The validity of GenClust has been assessed experimentally on real data sets, both with the use of validation measures and in comparison with other algorithms, i.e., Average Link, Cast, Click and K-means. Conclusion: Experiments show that none of the algorithms we have used is markedly superior to the others across data sets and validation measures; i.e., in many cases the observed differences between the worst and best performing algorithm may be statistically insignificant and they could be considered equivalent. However, there are cases in which an algorithm may be better than others and therefore worthwhile. In particular, experiments for GenClust show that, although simple in its data representation, it converges very rapidly to a local optimum and that its ability to identify meaningful clusters is comparable, and sometimes superior, to that of more sophisticated algorithms. In addition, it is well suited for use in conjunction with data driven internal validation measures and, in particular, the FOM methodology.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				2005
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				BMC BIOINFORMATICS
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.1186/1471-2105-6-289
			
	Citazione
	
				DI GESU' V,  GIANCARLO R,  LO BOSCO G,  RAIMONDI A,  SCATURRO D (2005). GenClust: A Genetic Algorithm for Clustering Gene Expression Data. BMC BIOINFORMATICS, 6 (289), 1-21 [10.1186/1471-2105-6-289].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
Di Gesú et al. - 2005 - GenClust a genetic algorithm for clustering gene expression data.pdf accesso aperto Dimensione 1.08 MB Formato Adobe PDF Visualizza/Apri	1.08 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/4046

Citazioni

ND

43

32

social impact