Clustering is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is a particular instance of the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In what follows, for ease of reference, we still refer to that instance as model selection. It is an important part of any statistical analysis. The techniques used for solving it are mainly either Bayesian or data-driven, and both are based on internal knowledge, that is, they use information obtained by processing the input data. Although both kinds of technique have been evaluated in the realm of microarray data analysis, their merits relative to each other have not been assessed. Here we fill this gap in the literature by comparing three Bayesian methods against several state-of-the-art data-driven model selection methods. Our results show that, although in some cases Bayesian methods guarantee good results, they cannot compete with the data-driven methods in their ability to predict the correct number of clusters in a dataset.

Giancarlo, R., Lo Bosco, G., Utro, F. (2015). Bayesian versus data driven model selection for microarray data. Natural Computing, 14(3), 393-402 [10.1007/s11047-014-9446-5].

Bayesian versus data driven model selection for microarray data

GIANCARLO, Raffaele; LO BOSCO, Giosuè
2015-01-01

Sector INF/01 - Computer Science
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/96557