Institutional Research Archive of the Università degli Studi di Palermo

Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark †

Galluzzo Y.; Giancarlo R.; Rombo S. E.

2026-03-01

Abstract

With the rapid growth of Next Generation Sequencing (NGS) technologies, large amounts of "omics" data are collected daily and need to be processed. Indexing and compressing large sequence datasets are among the most important tasks in this context. Here, we propose a novel approach to computing the Burrows Wheeler transform that relies on Big Data technologies, namely Apache Spark and Hadoop. We implement three algorithms based on the MapReduce framework that, unlike previous approaches in the literature, distribute the index computation and not only the input dataset. Experiments on real datasets show that the proposed approach is promising.

Date: Mar 2026
Scientific-disciplinary sector of the contribution: Sector INFO-01/A - Informatica
Journal title: DATA
DOI of the contribution: https://dx.doi.org/10.3390/data11030048
Publisher URL (open access where possible): https://www.mdpi.com/2306-5729/11/3/48
Citation: Galluzzo, Y., Giancarlo, R., Randazzo, M., Rombo, S.E. (2026). Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark †. DATA, 11(3) [10.3390/data11030048].
Appears in types: 1.01 Journal article

Files in this item:

File	Size	Format
data-11-00048.pdf (open access; type: publisher's version)	809.77 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/703593
