Institutional Research Archive of the Università degli Studi di Palermo

Bongiorno, C., Miccichè, S., Mantegna, R.N. (2022). Statistically validated hierarchical clustering: Nested partitions in hierarchical trees. PHYSICA. A, 593, 126933-1-126933-15 [10.1016/j.physa.2022.126933].

Statistically validated hierarchical clustering: Nested partitions in hierarchical trees

Miccichè, Salvatore (second author; member of the Collaboration Group); Mantegna, Rosario N. (last author; member of the Collaboration Group)

2022-05-01

Abstract

We develop a fast and scalable algorithm for detecting a nested partition in a dendrogram obtained from hierarchical clustering of a multivariate series. Our algorithm provides a p-value for each clade observed in the hierarchical tree. The p-value is obtained by computing many bootstrap replicas of the dissimilarity matrix and by performing a statistical test on each difference between the dissimilarity associated with a given clade and the dissimilarity of the clade of its parent node. We demonstrate the efficacy of our algorithm with a set of benchmarks generated by a hierarchically nested factor model. We compare the results obtained by our algorithm with those of Pvclust, a widely used algorithm that pursues a global approach and was originally developed in the context of phylogenetic studies. In our numerical experiments, we focus on the role of multiple hypothesis test correction and on the robustness of the algorithms to inaccuracies and errors in the datasets. We verify that our algorithm is much faster than the Pvclust algorithm and scales better both in the number of elements and in the number of records of the investigated multivariate set. We also apply our algorithm to two empirical datasets, one related to a biological complex system and the other related to financial time series. We show that the clusters detected by our methodology are meaningful with respect to some consensus partitioning of the two datasets.
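
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the dissimilarity is taken as 1 minus the Pearson correlation, the tree is built with average linkage, a bootstrap replica resamples the records with replacement, the dissimilarity of a clade is the mean dissimilarity between the leaves of its two children, and the p-value of a clade is the fraction of replicas in which its dissimilarity is not smaller than that of its parent. The function names (dissimilarity, clade_parent_pairs, clade_p_values, benjamini_hochberg) are hypothetical, and the Benjamini-Hochberg step stands in for whichever multiple hypothesis test correction is preferred.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform


def dissimilarity(x):
    """Assumed dissimilarity: 1 - Pearson correlation between rows of x."""
    d = 1.0 - np.corrcoef(x)
    np.fill_diagonal(d, 0.0)
    return d


def clade_parent_pairs(root):
    """List (clade, parent) pairs for every internal node except the root."""
    pairs, stack = [], [root]
    while stack:
        parent = stack.pop()
        for child in (parent.get_left(), parent.get_right()):
            if not child.is_leaf():
                pairs.append((child, parent))
                stack.append(child)
    return pairs


def split_leaves(node):
    """Leaf indices under the left and right child of an internal node."""
    return node.get_left().pre_order(), node.get_right().pre_order()


def clade_p_values(x, n_boot=200, seed=0):
    """Bootstrap p-value for each clade of the average-linkage tree of x.

    x has shape (n_elements, n_records). The tree topology is fixed from the
    original data; only the dissimilarities are recomputed in each replica.
    """
    rng = np.random.default_rng(seed)
    n_records = x.shape[1]
    z = linkage(squareform(dissimilarity(x), checks=False), method="average")
    pairs = clade_parent_pairs(to_tree(z))
    splits = [(split_leaves(c), split_leaves(p)) for c, p in pairs]
    counts = np.zeros(len(pairs))
    for _ in range(n_boot):
        cols = rng.integers(0, n_records, size=n_records)  # resample records
        d = dissimilarity(x[:, cols])
        for k, ((cl, cr), (pl, pr)) in enumerate(splits):
            h_clade = d[np.ix_(cl, cr)].mean()
            h_parent = d[np.ix_(pl, pr)].mean()
            counts[k] += h_clade >= h_parent
    return {tuple(sorted(c.pre_order())): counts[k] / n_boot
            for k, (c, _) in enumerate(pairs)}


def benjamini_hochberg(p, alpha=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg rule."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        keep[order[: np.nonzero(passed)[0].max() + 1]] = True
    return keep
```

Calling clade_p_values on an array of shape (number of elements, number of records) returns a dictionary that maps each clade, identified by the sorted tuple of its leaf indices, to its bootstrap p-value. The values can then be passed to benjamini_hochberg to decide which clades to retain.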


Date: May 2022
Scientific disciplinary sector of the contribution: PHYS-06/A - Physics for the life sciences, the environment, and cultural heritage
Journal title: PHYSICA. A
DOI of the contribution: https://dx.doi.org/10.1016/j.physa.2022.126933
Publisher URL (open access where possible): https://www.sciencedirect.com/science/article/pii/S0378437122000498
Appears in types: 1.01 Journal article

Files in this item:

File	Access	Description	Type	Size	Format
1-s2.0-S0378437122000498-main.pdf	Archive managers only	Article	Publisher's version	1.45 MB	Adobe PDF
1906.06908 (1).pdf	Open access	Article	Pre-print	1.61 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/536022
