Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we still refer to that instance as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and both are based on internal knowledge; that is, they use information obtained by processing the input data. Although both kinds of technique have been evaluated in the realm of microarray data analysis, their merits relative to each other have not been assessed. Here we fill this gap in the literature by comparing three Bayesian methods with several state-of-the-art data-driven model selection methods. Our results show that, although Bayesian methods guarantee good results in some cases, they cannot compete with the data-driven methods in their ability to predict the correct number of clusters in a dataset.
Giancarlo, R., Lo Bosco, G., Utro, F. (2015). Bayesian versus data driven model selection for microarray data. Natural Computing, 14(3), 393-402 [10.1007/s11047-014-9446-5].
Bayesian versus data driven model selection for microarray data
GIANCARLO, Raffaele; LO BOSCO, Giosuè
2015-01-01
File | Access | Type | Size | Format |
---|---|---|---|---|
Bayesian_versus_data_driven_model_selection_for_microarray_data.pdf | Archive managers only | Publisher's version | 556.92 kB | Adobe PDF |
post_print_Bayesian_versus_data_driven_model_selection_for_microarray_data .pdf | Open access | Post-print | 529.79 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.