A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted $r$. We exhibit infinite families of strings in which insertion, deletion, resp.\ substitution of one character increases $r$ from constant to $\Theta(\log n)$, where $n$ is the length of the string. These strings can be interpreted both as examples for an increase by a multiplicative or an additive $\Theta(\log n)$-factor. As regards the multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf \& Comput. 2023] of $\Oh(\log n \log r)$, since here $r=\Oh(1)$. We then give examples of strings in which insertion, deletion, resp.\ substitution of a character increases $r$ by a $\Theta(\sqrt n)$ additive factor. These strings significantly improve the best known lower bound for an additive factor of $\Omega(\log n)$ [Giuliani et al., SOFSEM 2021].

Giuliani, S., Inenaga, S., Lipták, Z., Romana, G., Sciortino, M., Urbina, C. (2025). Bit Catastrophes for the Burrows-Wheeler Transform. THEORY OF COMPUTING SYSTEMS, 69(2) [10.1007/s00224-024-10212-9].

Bit Catastrophes for the Burrows-Wheeler Transform

Romana, Giuseppe;Sciortino, Marinella
;
2025-01-01

Abstract

A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted $r$. We exhibit infinite families of strings in which insertion, deletion, resp.\ substitution of one character increases $r$ from constant to $\Theta(\log n)$, where $n$ is the length of the string. These strings can be interpreted both as examples for an increase by a multiplicative or an additive $\Theta(\log n)$-factor. As regards the multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf \& Comput. 2023] of $\Oh(\log n \log r)$, since here $r=\Oh(1)$. We then give examples of strings in which insertion, deletion, resp.\ substitution of a character increases $r$ by a $\Theta(\sqrt n)$ additive factor. These strings significantly improve the best known lower bound for an additive factor of $\Omega(\log n)$ [Giuliani et al., SOFSEM 2021].
2025
Settore INFO-01/A - Informatica
Giuliani, S., Inenaga, S., Lipták, Z., Romana, G., Sciortino, M., Urbina, C. (2025). Bit Catastrophes for the Burrows-Wheeler Transform. THEORY OF COMPUTING SYSTEMS, 69(2) [10.1007/s00224-024-10212-9].
File in questo prodotto:
File Dimensione Formato  
s00224-024-10212-9.pdf

accesso aperto

Tipologia: Versione Editoriale
Dimensione 867.2 kB
Formato Adobe PDF
867.2 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/680030
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact