Parida, L., Pizzi, C., Rombo, S.E. (2014). Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences. In D. Brown, B. Morgenstern (Eds.), Algorithms in Bioinformatics (pp. 148-160) [10.1007/978-3-662-44753-6_12].

Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences

ROMBO, Simona Ester
2014-01-01

Abstract

The degree of predictability of a sequence can be measured by its entropy, and it is closely related to the sequence's repetitiveness and compressibility. Entropic profiles are useful tools for studying the under- and over-representation of subsequences, and they also provide information about the scale of each conserved DNA region. Compact classes of repetitive motifs, such as maximal motifs, have in turn proven useful for identifying significant repetitions and for compressing biological sequences. In this paper we show that there is a relationship between entropic profiles and maximal motifs; in particular, we prove that the former are a subset of the latter. As a further contribution, we propose a novel linear-time, linear-space algorithm to compute the Entropic Profile function introduced by Vinga and Almeida in [18], and we present preliminary results on real data showing the speedup of our approach with respect to existing techniques.
2014
Sector INF/01 - Computer Science
978-3-662-44753-6
978-3-662-44752-9
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/102132