Institutional Research Archive of the Università degli Studi di Palermo

Alignment free Dissimilarities for sequence classification

Lo Bosco, Giosuè; La Neve, D.

2015-01-01

Abstract

One way to represent a DNA sequence is to break it down into substrings of length L, called L-tuples, and count the occurrences of each L-tuple in the sequence. This representation maps a sequence into a numerical space as a feature vector of fixed length, which allows sequence similarity to be measured in an alignment-free way simply by applying dissimilarity functions between vectors. This work presents a benchmark study of four alignment-free dissimilarity functions between sequences, computed on their L-tuple representation, for the purpose of sequence classification. In our experiments, we tested the classes of geometric-based, correlation-based and information-based dissimilarities, incorporating them into a nearest neighbor classifier. Results computed on three datasets of nucleosome-forming and nucleosome-inhibiting sequences show that the geometric and correlation dissimilarities are more suitable for nucleosome classification. Finally, their use could be a valid alternative to alignment-based similarity measures, which remain the preferred choice when dealing with sequence similarity problems.
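The pipeline described in the abstract (L-tuple counting, a dissimilarity between count vectors, nearest neighbor classification) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not taken from the paper: L-tuples are counted over overlapping windows on the alphabet ACGT (giving vectors of length 4^L), and the Euclidean distance, one minus the Pearson correlation, and the Jensen-Shannon divergence stand in as one example each of the geometric, correlation and information classes. The abstract does not name the four functions that were actually benchmarked, and all function names below are hypothetical.

```python
from itertools import product
from math import log2, sqrt

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, other symbols are skipped


def ltuple_counts(seq, L):
    """Count every L-tuple of seq over overlapping windows (an assumption)."""
    index = {"".join(t): i for i, t in enumerate(product(ALPHABET, repeat=L))}
    counts = [0] * len(index)  # fixed length 4**L
    for i in range(len(seq) - L + 1):
        j = index.get(seq[i:i + L])
        if j is not None:  # ignore windows with symbols outside ACGT
            counts[j] += 1
    return counts


def euclidean(x, y):
    """Geometric example: Euclidean distance between count vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(x, y):
    """Correlation example: one minus the Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 1.0  # correlation is undefined for a constant vector
    return 1.0 - cov / sqrt(vx * vy)


def jensen_shannon(x, y):
    """Information example: Jensen-Shannon divergence of L-tuple frequencies.

    Assumes both count vectors have at least one non-zero entry.
    """
    sx, sy = sum(x), sum(y)
    p = [a / sx for a in x]
    q = [b / sy for b in y]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(u, v):
        return sum(a * log2(a / b) for a, b in zip(u, v) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def nearest_neighbor_label(query, training, L, dissimilarity):
    """Return the label of the training sequence closest to query (1-NN)."""
    qv = ltuple_counts(query, L)
    best = min(training, key=lambda item: dissimilarity(qv, ltuple_counts(item[0], L)))
    return best[1]


# Toy usage with made-up sequences and placeholder labels (not real data).
training = [("ATATATATAT", "class_a"), ("GCGCGCGCGC", "class_b")]
predicted = nearest_neighbor_label("ATATATAAAT", training, 2, euclidean)
```

Because the dissimilarity is passed in as a function, the same classifier can be rerun with each measure, which is the kind of comparison a benchmark of this sort relies on.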

Date: 2015

ISBN of the monograph: 9788890643798

Citation: Lo Bosco, G., La Neve, D. (2015). Alignment free Dissimilarities for sequence classification. In C. Angelini, E. Bongcam-Rudloff, A. Decarli, P. Rancoita, S. Rovetta (Eds.), Computational Intelligence Methods for Bioinformatics and Biostatistics, CIBB 2015 (pp. 1-5). Department of Informatics, University of Salerno and Istituto per le Applicazioni del Calcolo “Mauro Picone” CNR.

Appears in the categories: 2.07 Contribution in conference proceedings published in a volume

Files in this record:

File	Size	Format
lo_bosco_cibb_2015.pdf (access: archive managers only; type: publisher's version)	371.96 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/145443
