Institutional research archive of the Università degli Studi di Palermo

Efficient algorithms for sequence analysis with entropic profiles

Cinzia Pizzi, Mattia Ornamenti, Simone Spangaro, Simona Ester Rombo, Laxmi Parida

2018-01-01

Abstract

Entropy, being closely related to repetitiveness and compressibility, is a widely used information-theoretic measure for assessing the degree of predictability of a sequence. Entropic profiles are based on information theory principles and can be used to study the under- and over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects of entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, specifically the truncated suffix tree (TST) and the enhanced suffix array (ESA). We performed an extensive experimental campaign showing that our algorithms are faster than state-of-the-art algorithms and make it possible to analyze longer sequences, even at high degrees of resolution. © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
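The following Python sketch illustrates what an entropic profile measures by computing one with brute force. It assumes the commonly used formulation f_{L,φ}(x_i) = (1 + (1/N) Σ_{k=1..L} 4^k φ^k c(x[i-k+1..i])) / Σ_{k=0..L} φ^k. Here N is the sequence length, c(w) counts the occurrences of the subword w, L is the resolution and φ is a weighting parameter. The raw values are then normalized to zero mean and unit standard deviation across positions.

This formulation is our assumption and may differ from the paper's in its details. That includes how positions with fewer than L preceding symbols are handled and which standard deviation is used. The function names are hypothetical. The subword counts come from a plain hash map rather than from the TST or ESA, so the sketch runs in roughly O(N·L²) time instead of linear time.

```python
from collections import Counter
from statistics import mean, pstdev


def substring_counts(seq: str, max_len: int) -> Counter:
    """Count every substring of length 1..max_len.

    Naive stand-in for a truncated suffix tree: a hash map keyed by substring.
    """
    counts = Counter()
    n = len(seq)
    for start in range(n):
        for k in range(1, min(max_len, n - start) + 1):
            counts[seq[start:start + k]] += 1
    return counts


def entropic_profile(seq: str, L: int, phi: float, alphabet_size: int = 4) -> list:
    """Raw profile value at each 0-based position i (assumed formulation).

    Uses the substrings of length 1..L that end at position i; positions with
    fewer than L preceding symbols use only the substrings that fit.
    """
    n = len(seq)
    if n == 0:
        return []
    counts = substring_counts(seq, L)
    denominator = sum(phi ** k for k in range(L + 1))
    profile = []
    for i in range(n):
        weighted = 0.0
        for k in range(1, min(L, i + 1) + 1):
            word = seq[i - k + 1:i + 1]
            # 4^k * phi^k * c(word), written as (alphabet_size * phi) ** k
            weighted += (alphabet_size * phi) ** k * counts[word]
        profile.append((1 + weighted / n) / denominator)
    return profile


def normalized_entropic_profile(seq: str, L: int, phi: float, alphabet_size: int = 4) -> list:
    """Shift and scale the raw profile to zero mean and unit population standard deviation (assumed)."""
    f = entropic_profile(seq, L, phi, alphabet_size)
    if not f:
        return []
    m, s = mean(f), pstdev(f)
    return [(v - m) / s if s > 0 else 0.0 for v in f]
```

For example, `entropic_profile("AAAA", 2, 0.5)` yields a lower value at the first position, where only the length-1 subword contributes, than at the three later positions, where the frequent subword "AA" also contributes.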

Date: 2018
Journal: IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
DOI: https://dx.doi.org/10.1109/TCBB.2016.2620143
Citation: Cinzia Pizzi, M.O. (2018). Efficient algorithms for sequence analysis with entropic profiles. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 15(1), 117-128 [10.1109/TCBB.2016.2620143].
Item type: 1.01 Journal article

Files in this item:

File	Size	Format	Access
tcbb2018.pdf	4.58 MB	Adobe PDF	Archive managers only

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/274877
