In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we define a measure to compare sequences that takes into account how the characters coming from different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences

Mantaci, S., Restivo, A., Rosone, G., Sciortino, M. (2005). A New Combinatorial Approach to Sequence Comparison. In M. Coppo, E. Lodi, G.M. Pinna (a cura di), Theoretical Computer Science (pp. 348-359) [10.1007/11560586_28].

A New Combinatorial Approach to Sequence Comparison

MANTACI, Sabrina;RESTIVO, Antonio;ROSONE, Giovanna;SCIORTINO, Marinella
2005-01-01

Abstract

In this paper we introduce a new alignment-free method for comparing sequences which is combinatorial by nature and does not use any compressor nor any information-theoretic notion. Such a method is based on an extension of the Burrows-Wheeler Transform, a transformation widely used in the context of Data Compression. The new extended transformation takes as input a multiset of sequences and produces as output a string obtained by a suitable rearrangement of the characters of all the input sequences. By using such a transformation we define a measure to compare sequences that takes into account how the characters coming from different input sequences are mixed in the output string. Such a method is tested on a real data set for the whole mitochondrial genome phylogeny problem. However, the goal of this paper is to introduce a new and general methodology for automatic categorization of sequences
2005
Settore INF/01 - Informatica
Mantaci, S., Restivo, A., Rosone, G., Sciortino, M. (2005). A New Combinatorial Approach to Sequence Comparison. In M. Coppo, E. Lodi, G.M. Pinna (a cura di), Theoretical Computer Science (pp. 348-359) [10.1007/11560586_28].
File in questo prodotto:
File Dimensione Formato  
MRRS_LNCS_ICTCS_2005.pdf

Solo gestori archvio

Dimensione 420.59 kB
Formato Adobe PDF
420.59 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/47007
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 6
social impact