We introduce a combinatorial optimization framework that naturally induces a class of optimal word permutations with respect to a suitably defined cost function taking into account various measures of relatedness between words. The Burrows and Wheeler transform (bwt) (cf. [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994]), and its analog for labelled trees (cf. [P. Ferragina, F. Luccio, G. Manzini, S. Muthukrishnan, Structuring labeled trees for optimal succinctness, and beyond, in: Proc. of the 45th Annual IEEE Symposium on Foundations of Computer Science, 2005, pp. 198–207]), are special cases in the class. We also show that the class of optimal word permutations defined here is identical to the one identified by Ferragina et al. for compression boosting [P. Ferragina, R. Giancarlo, G. Manzini, M. Sciortino, Boosting textual compression in optimal linear time, Journal of the ACM 52 (2005) 688–713]. Therefore, they are all highly compressible. We also provide, by using techniques from Combinatorics on Words, a fast method to compute bwt without using any end-of-string symbol. We also investigate more general classes of optimal word permutations, where relatedness of symbols may be measured by functions more complex than context length. For this general problem we provide an instance that is MAX-SNP hard, and therefore unlikely to be solved or approximated efficiently. The results presented here indicate that a key feature of the Burrows and Wheeler transform seems to be, besides compressibility, the existence of efficient algorithms for its computation and inversion.
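As an illustration of what a bwt without an end-of-string symbol can look like, the following minimal sketch uses the textbook definition from Burrows and Wheeler: sort all cyclic rotations of the word and output the last column together with the row index of the original word. This is not the fast method announced in the abstract, whose details are not given here. It is a naive version for small inputs, and the function names `bwt` and `inverse_bwt` are hypothetical.

```python
def bwt(word):
    """Naive bwt without an end marker (illustrative sketch only).

    Returns (last_column, index), where index is the row of `word`
    among its sorted cyclic rotations.
    """
    if not word:
        return "", 0
    n = len(word)
    # Build and sort all cyclic rotations: roughly O(n^2 log n) time.
    rotations = sorted(word[i:] + word[:i] for i in range(n))
    last = "".join(r[-1] for r in rotations)
    return last, rotations.index(word)


def inverse_bwt(last, index):
    """Recover the word from its last column and row index."""
    # A stable sort of positions by symbol pairs the k-th occurrence of a
    # symbol in the first column with its k-th occurrence in the last column.
    order = sorted(range(len(last)), key=lambda i: last[i])
    out = []
    row = index
    for _ in range(len(last)):
        row = order[row]        # row of the rotation shifted left by one
        out.append(last[row])   # that row ends with the next symbol of the word
    return "".join(out)


if __name__ == "__main__":
    transformed, idx = bwt("banana")
    print(transformed, idx)
    print(inverse_bwt(transformed, idx))
```

The row index takes over the role usually played by a sentinel symbol: without it, the last column alone determines the word only up to cyclic rotation.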

GIANCARLO R, RESTIVO A, SCIORTINO M (2007). From First Principles to the Burrows and Wheeler Transform and Beyond, via Combinatorial Optimization. THEORETICAL COMPUTER SCIENCE, 387, 236-248 [10.1016/j.tcs.2007.07.019].

From First Principles to the Burrows and Wheeler Transform and Beyond, via Combinatorial Optimization

GIANCARLO, Raffaele; RESTIVO, Antonio; SCIORTINO, Marinella
2007-01-01


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/4169