A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted r. We exhibit infinite families of strings in which the insertion, deletion, or substitution of one character increases r from constant to Θ(log n), where n is the length of the string. These strings can be read as examples of an increase by either a multiplicative or an additive Θ(log n) factor. As regards the multiplicative factor, they attain the upper bound of O(log n log r) given by Akagi, Funakoshi, and Inenaga [Inf. & Comput. 2023], since here r = O(1). We then give examples of strings in which the insertion, deletion, or substitution of a character increases r by a Θ(√n) additive factor. These strings significantly improve the best known lower bound for an additive factor of Ω(log n) [Giuliani et al., SOFSEM 2021].
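The quantity r discussed above can be measured directly on small inputs. The following is a minimal sketch, not the authors' code: it assumes the BWT is defined as the last column of the sorted list of all rotations of the string (no end-of-string marker), and the function names `bwt`, `count_runs`, and `fibonacci_word` are illustrative choices. Fibonacci words are used only as a classical family of strings whose BWT has r = 2, which makes them a convenient starting point for experimenting with single-character edits.

```python
def bwt(s: str) -> str:
    """Naive BWT: last column of the sorted list of all rotations of s.

    Assumption: no end-of-string marker is appended. This takes
    O(n^2 log n) time and is meant for small experiments only.
    """
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def fibonacci_word(k: int) -> str:
    """k-th Fibonacci word with s_0 = 'b', s_1 = 'a', s_k = s_{k-1} + s_{k-2}."""
    prev, cur = "b", "a"
    if k == 0:
        return prev
    for _ in range(k - 1):
        prev, cur = cur, cur + prev
    return cur


if __name__ == "__main__":
    w = fibonacci_word(10)
    # Compare r before and after a few single-character edits.
    edits = {
        "original": w,
        "append 'a'": w + "a",
        "delete last char": w[:-1],
        "substitute last char by 'c'": w[:-1] + "c",
    }
    for label, s in edits.items():
        print(f"{label:30s} n = {len(s):4d}  r = {count_runs(bwt(s))}")
```

Running the script prints n and r for each variant, so the effect of a one-character change on r can be observed for increasing values of k.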

Giuliani S., Inenaga S., Liptak Z., Romana G., Sciortino M., Urbina C. (2023). Bit Catastrophes for the Burrows-Wheeler Transform. In F. Drewes, M. Volkov (Eds.), Developments in Language Theory. 27th International Conference, DLT 2023, Umeå, Sweden, June 12–16, 2023, Proceedings (pp. 86-99). Springer [10.1007/978-3-031-33264-7_8].

Bit Catastrophes for the Burrows-Wheeler Transform

Romana G.; Sciortino M.
2023-01-01

Year: 2023
ISBN: 978-3-031-33263-0; 978-3-031-33264-7

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/620079