Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text.

Faro S., Marino F.P., Pavone A. (2023). Improved characters distance sampling for online and offline text searching. THEORETICAL COMPUTER SCIENCE, 946 [10.1016/j.tcs.2022.12.034].

Improved characters distance sampling for online and offline text searching

Faro S.;Marino F. P.;Pavone A.

2023-02-10

Abstract

Sampled string matching is a very effective technique to reduce the search time for a pattern within a text at the cost of a small amount of additional memory, used for storing a partial index of the text. This approach has recently received some interest and has been applied to improve both online and offline string matching solutions, improving standard solutions by more than 50%. However, this improvement is currently only achievable in the case of texts on large-sized alphabets, and remains small (or absent) in the case of small-sized alphabets. In this article we propose an extension of the approach to text-sampling, known as Character Distance Sampling, to the case of small alphabets, obtaining an improvement of up to 98% compared to standard solutions in the case of online string matching. We also extend this approach to the case of offline string matching, introducing a sampled version of the suffix array, obtaining performances up to 5 times higher than the search obtained on the standard suffix array. Differently from what has been done by previous solutions, our idea is not based on the reduction of the number of indexed suffixes, but on the construction of the index directly on the sampled text.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				10-feb-2023
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				THEORETICAL COMPUTER SCIENCE
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.1016/j.tcs.2022.12.034
			
	URL dell'editore (Open access ove possibile)
	
				https://www.sciencedirect.com/science/article/abs/pii/S0304397522007642
			
	Citazione
	
				Faro S.,  Marino F.P.,  Pavone A. (2023). Improved characters distance sampling for online and offline text searching. THEORETICAL COMPUTER SCIENCE, 946 [10.1016/j.tcs.2022.12.034].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
Pubblicazione 2023-TCS.pdf Solo gestori archvio Tipologia: Versione Editoriale Dimensione 402.81 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	402.81 kB	Adobe PDF	Visualizza/Apri Richiedi una copia
post-print-pavone_handle_10447_640839.pdf accesso aperto Tipologia: Post-print Dimensione 555.94 kB Formato Adobe PDF Visualizza/Apri	555.94 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/640839

Citazioni

ND

4

2

social impact