Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used. When possible, a unifying organization of the main ideas and techniques is also provided. Availability: It goes without saying that most of the research results reviewed here offer software prototypes to the bioinformatics community. The Supplementary Material provides pointers to software and benchmark datasets for a range of applications of broad interest. In addition to provide reference to software, the Supplementary Material also gives a brief presentation of some fundamental results and techniques related to this paper. It is at: http://www.math.unipa.it/~raffaele/suppMaterial/compReview/

Giancarlo, R., Scaturro, D., Utro, F. (2009). Textual Data Compression In Computational Biology: A synopsis. BIOINFORMATICS, 2009, 1575-1586 [10.1093/bioinformatics/btp117].

Textual Data Compression In Computational Biology: A synopsis

GIANCARLO, Raffaele;UTRO, Filippo
2009-01-01

Abstract

Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used. When possible, a unifying organization of the main ideas and techniques is also provided. Availability: It goes without saying that most of the research results reviewed here offer software prototypes to the bioinformatics community. The Supplementary Material provides pointers to software and benchmark datasets for a range of applications of broad interest. In addition to provide reference to software, the Supplementary Material also gives a brief presentation of some fundamental results and techniques related to this paper. It is at: http://www.math.unipa.it/~raffaele/suppMaterial/compReview/
2009
Giancarlo, R., Scaturro, D., Utro, F. (2009). Textual Data Compression In Computational Biology: A synopsis. BIOINFORMATICS, 2009, 1575-1586 [10.1093/bioinformatics/btp117].
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/48721
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 76
  • ???jsp.display-item.citation.isi??? 56
social impact