We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O (r log(n/r) + r log σ + σ) bits such that later, given character a and the suffix-array (SA) interval for P, we can find the SA interval for aP in O(log ra + log log n) time, where ra is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for aP in only O(log ra) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.
Brown, N.K., Gagie, T., Manzini, G., Navarro, G., Sciortino, M. (2025). Faster Run-Length Compressed Suffix Arrays. In A.M. Alessio Conte (a cura di), From Strings to Graphs, and Back Again: A Festschrift for Roberto Grossi’s 60th Birthday. Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing [10.4230/OASIcs.Grossi.10].
Faster Run-Length Compressed Suffix Arrays
Sciortino M.
2025-01-01
Abstract
We first review how we can store a run-length compressed suffix array (RLCSA) for a text T of length n over an alphabet of size σ whose Burrows-Wheeler Transform (BWT) consists of r runs in O (r log(n/r) + r log σ + σ) bits such that later, given character a and the suffix-array (SA) interval for P, we can find the SA interval for aP in O(log ra + log log n) time, where ra is the number of runs of copies of a in the BWT. We then show how to modify the RLCSA such that we find the SA interval for aP in only O(log ra) time, without increasing its asymptotic space bound. Our key idea is applying a result by Nishimoto and Tabei (ICALP 2021) and then replacing rank queries on sparse bitvectors by a constant number of select queries. We also review two-level indexing and discuss how our faster RLCSA may be useful in improving it. Finally, we briefly discuss how two-level indexing may speed up a recent heuristic for finding maximal exact matches of a pattern with respect to an indexed text.| File | Dimensione | Formato | |
|---|---|---|---|
|
OASIcs.Grossi.10.pdf
accesso aperto
Tipologia:
Versione Editoriale
Dimensione
734.91 kB
Formato
Adobe PDF
|
734.91 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


