In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences.
Mantaci, S., Restivo, A., Rosone, G., & Sciortino, M. (2008). A New Combinatorial Approach to Sequence Comparison. THEORY OF COMPUTING SYSTEMS, 42(3), 411-429 [10.1007/s00224-007-9078-6].
Data di pubblicazione: | 2008 | |
Titolo: | A New Combinatorial Approach to Sequence Comparison | |
Autori: | ||
Citazione: | Mantaci, S., Restivo, A., Rosone, G., & Sciortino, M. (2008). A New Combinatorial Approach to Sequence Comparison. THEORY OF COMPUTING SYSTEMS, 42(3), 411-429 [10.1007/s00224-007-9078-6]. | |
Rivista: | ||
Digital Object Identifier (DOI): | http://dx.doi.org/10.1007/s00224-007-9078-6 | |
Abstract: | In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we give a general method for comparing sequences that takes into account how much the characters coming from the different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences. | |
Appare nelle tipologie: | 1.01 Articolo in rivista |
File in questo prodotto:
File | Descrizione | Tipologia | Licenza | |
---|---|---|---|---|
MRRS_TofCS_2008.pdf | N/A | Administrator Richiedi una copia |