We introduce a method to generate multivariate series of symbols from a finite alphabet with a given hierarchical structure of similarities based on the Hamming distance. The target hierarchical structure of similarities is arbitrary, for instance the one obtained by some hierarchical clustering method applied to an empirical matrix of similarities. The method that we present here is based on a generating mechanism that does not make use of mutation rate, which is widely used in phylogenetic analysis. Here we use the proposed simulation method to investigate the relationship between the bootstrap value associated with a node of a phylogeny and the probability of finding that node in the true phylogeny. The results of this analysis are compared with those obtained in the literature according to an evolutionary model with a per-symbol constant mutation rate. We observe that the relationship between the bootstrap value of a node and the probability of the corresponding clade being correct is sensitive to both the length of data series and the length of the branch connecting the node to its closest ancestor in the phylogenetic tree, whereas such a relationship is only slightly affected by the topology of the true phylogeny and by the absolute value of similarity.
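The two quantities the abstract relies on, a Hamming-based similarity between symbol sequences and the bootstrap value of a node, can be illustrated with a minimal sketch. This is not the authors' generating mechanism, which the abstract does not specify. The sketch assumes that similarity is the fraction of matching positions (one minus the normalized Hamming distance), that a bootstrap replicate resamples sequence positions with replacement, and that the "node" being tested is simply the most similar pair of sequences. All function names and the toy data are hypothetical.

```python
import random


def hamming_similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree.

    Assumption: similarity = 1 - (Hamming distance / length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_pair(seqs):
    """Indices (i, j), i < j, of the most similar pair of sequences.

    This pair is the first merge an agglomerative clustering would make,
    used here as the simplest possible stand-in for a node of a phylogeny.
    Ties are broken in favour of the first pair encountered.
    """
    n = len(seqs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return max(pairs, key=lambda p: hamming_similarity(seqs[p[0]], seqs[p[1]]))


def bootstrap_replicate(seqs, rng):
    """Resample positions with replacement, using the same positions for every sequence."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


def bootstrap_support(seqs, pair, n_replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `pair` is again the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(seqs, rng)) == pair
        for _ in range(n_replicates)
    )
    return hits / n_replicates


# Toy data (hypothetical): sequences 0 and 1 differ at a single position.
toy = ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "GGGGGGGGGG"]
node = closest_pair(toy)
support = bootstrap_support(toy, node)
```

Here `support` plays the role of the bootstrap value of a node. The question studied in the paper is how such a value relates to the probability that the node belongs to the true phylogeny, which requires simulated data whose true hierarchy is known.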

Tumminello, M., Lillo, F., Mantegna, R.N. (2008). Generation of hierarchically correlated multivariate symbolic sequences: With an application to the assessment of bootstrap confidence in phylogenetic analysis. The European Physical Journal. B, Condensed Matter Physics, 2008, 333-340 [10.1140/epjb/e2008-00225-7].

Generation of hierarchically correlated multivariate symbolic sequences: With an application to the assessment of bootstrap confidence in phylogenetic analysis.

TUMMINELLO, Michele; LILLO, Fabrizio; MANTEGNA, Rosario Nunzio
2008-01-01


Use this identifier to cite or link to this document: https://hdl.handle.net/10447/45660