Computing the Original eBWT Faster, Simpler, and with Less Memory

Boucher, C.; Cenzato, D.; Liptak, Z.; Rossi, M.; Sciortino, M.

doi:10.1007/978-3-030-86692-1_11

Mantaci et al. [TCS 2007] defined the eBWT to extend the definition of the BWT to a collection of strings. However, since its introduction, the term has been used more generally to describe any BWT of a collection of strings, and the fundamental property of the original definition (i.e., the independence from the input order) is frequently disregarded. In this paper, we propose a simple linear-time algorithm for the construction of the original eBWT, which does not require the preprocessing of Bannai et al. [CPM 2021]. As a byproduct, we obtain the first linear-time algorithm for computing the BWT of a single string that uses neither an end-of-string symbol nor Lyndon rotations. We combine our new eBWT construction with a variation of prefix-free parsing to allow for scalable construction of the eBWT. We evaluate our algorithm (pfpebwt) on sets of human chromosomes 19, Salmonella, and SARS-CoV2 genomes, and demonstrate that it is the fastest method for all collections, with a maximum speedup of 7.6× over the second-best method. The peak memory is at most 2× larger than that of the second-best method. Compared with methods that, like our algorithm, are able to report suffix array samples, we obtain a 57.1× improvement in peak memory. The source code is publicly available at https://github.com/davidecenzato/PFP-eBWT.
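To make the order-independence property concrete, the following is a minimal sketch of the original eBWT definition, assuming the usual formulation: all rotations of all input strings are sorted by omega-order (comparing their infinite repetitions), and the last character of each rotation is emitted. It is a naive quadratic illustration, not the linear-time algorithm or the pfpebwt implementation described in the abstract; the function names `omega_compare` and `naive_ebwt` are hypothetical.

```python
from functools import cmp_to_key


def omega_compare(u, v):
    """Compare the infinite words u^omega and v^omega.

    By the Fine and Wilf theorem, two infinite periodic words that agree
    on their first len(u) + len(v) characters are equal, so comparing
    prefixes of that length is sufficient.
    """
    n = len(u) + len(v)
    pu = (u * (n // len(u) + 1))[:n]
    pv = (v * (n // len(v) + 1))[:n]
    return (pu > pv) - (pu < pv)


def naive_ebwt(strings):
    """Naive eBWT: sort every rotation of every string by omega-order,
    then concatenate the last character of each rotation."""
    rotations = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    rotations.sort(key=cmp_to_key(omega_compare))
    return "".join(r[-1] for r in rotations)


# The result depends only on the multiset of strings, not on their order.
print(naive_ebwt(["ab", "b"]))  # bab
print(naive_ebwt(["b", "ab"]))  # bab
```

Note that plain lexicographic order would place "b" before "ba" and yield "bba" instead; omega-order places "ba" first because bababa... is smaller than bbbbbb.... For a single string, the same sketch computes a BWT without appending an end-of-string symbol.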

Boucher C., Cenzato D., Liptak Z., Rossi M., Sciortino M. (2021). Computing the Original eBWT Faster, Simpler, and with Less Memory. In T. Lecroq, H. Touzet (Eds.), String Processing and Information Retrieval - 28th International Symposium, SPIRE 2021, Lille, France, October 4–6, 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 129-142). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-030-86692-1_11].

Date: 2021
ISBN of the monograph: 978-3-030-86691-4; 978-3-030-86692-1
DOI of the contribution: https://dx.doi.org/10.1007/978-3-030-86692-1_11
Publisher URL (open access where possible): https://link.springer.com/chapter/10.1007/978-3-030-86692-1_11
Publication type: 2.07 Conference paper published in a proceedings volume

Files in this item:

File	Size	Format
Boucher2021_Chapter_ComputingTheOriginalEBWTFaster.pdf (publisher's version; access restricted to archive managers)	445.07 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/526395

Institutional research archive of the Università degli Studi di Palermo
