The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the extended r-index, an analogous data structure to the r-index [Gagie et al. JACM 2020]. It occupies O(r) words, with r the number of runs of the eBWT, and offers the same functionalities as the r-index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended r-index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original r-index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., r-indexing the eBWT, SPIRE 2021].

Boucher, C., Cenzato, D., Lipták, Z., Rossi, M., Sciortino, M. (2024). r-indexing the eBWT. INFORMATION AND COMPUTATION, 298 [10.1016/j.ic.2024.105155].

r-indexing the eBWT

Sciortino, Marinella
2024-01-01

Abstract

The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the extended r-index, an analogous data structure to the r-index [Gagie et al. JACM 2020]. It occupies O(r) words, with r the number of runs of the eBWT, and offers the same functionalities as the r-index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended r-index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original r-index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., r-indexing the eBWT, SPIRE 2021].
2024
Settore INF/01 - Informatica
Boucher, C., Cenzato, D., Lipták, Z., Rossi, M., Sciortino, M. (2024). r-indexing the eBWT. INFORMATION AND COMPUTATION, 298 [10.1016/j.ic.2024.105155].
File in questo prodotto:
File Dimensione Formato  
r-indexing.pdf

accesso aperto

Tipologia: Versione Editoriale
Dimensione 2.35 MB
Formato Adobe PDF
2.35 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/635773
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 0
social impact