Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Sampled string matching is an efficient approach to the string matching problem introduced in order to overcome the prohibitive space requirements of indexed matching, on the one hand, and drastically re- duce searching time for the online solutions, on the other hand. Known solutions to sampled string matching are able to speed up the searching up to 9 times while using less than 11% of the text size. However, they appear to work efficiently only in the case of natural language texts or, in general, when searching on input sequences over large alphabets. In this paper we extend sampled-string matching to the case of small al- phabets obtaining a new efficient solution which turns out to be feasible also for searching biological data like genome or protein sequences. Our solution extends a recent approach called Character Distance Sampling by using condensed characters in order to enlarge the size of the alpha- bet and speed up the searching process. From our experimental results it turns out that our solution significantly reduces the searching time when compared against previous sampled string matching algorithms. In particular on biological data it leads to reduce the space consumption up to 80% and to speed up the searching up to 96%, thus improving the standard online string matching algorithm up to 99.6% using less than 1% of the original text size.

Faro S., Marino F.P., Pavone A. (2021). Enhancing Characters Distance Text Sampling by Condensed Alphabets. In C. Sacerdoti Coen, I. Salvo (a cura di), Proceedings of the 22nd Italian Conference on Theoretical Computer Science (pp. 1-15).

Enhancing Characters Distance Text Sampling by Condensed Alphabets

Faro S.;Marino F. P.;Pavone A.

2021-01-01

Abstract

Sampled string matching is an efficient approach to the string matching problem introduced in order to overcome the prohibitive space requirements of indexed matching, on the one hand, and drastically re- duce searching time for the online solutions, on the other hand. Known solutions to sampled string matching are able to speed up the searching up to 9 times while using less than 11% of the text size. However, they appear to work efficiently only in the case of natural language texts or, in general, when searching on input sequences over large alphabets. In this paper we extend sampled-string matching to the case of small al- phabets obtaining a new efficient solution which turns out to be feasible also for searching biological data like genome or protein sequences. Our solution extends a recent approach called Character Distance Sampling by using condensed characters in order to enlarge the size of the alpha- bet and speed up the searching process. From our experimental results it turns out that our solution significantly reduces the searching time when compared against previous sampled string matching algorithms. In particular on biological data it leads to reduce the space consumption up to 80% and to speed up the searching up to 96%, thus improving the standard online string matching algorithm up to 99.6% using less than 1% of the original text size.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				2021
			
	URL dell'editore (Open access ove possibile)
	
				https://ceur-ws.org/Vol-3072/
			
	Citazione
	
				Faro S.,  Marino F.P.,  Pavone A. (2021). Enhancing Characters Distance Text Sampling by Condensed Alphabets. In C. Sacerdoti Coen, I. Salvo (a cura di), Proceedings of the 22nd Italian Conference on Theoretical Computer Science (pp. 1-15).
			
	Appare nelle tipologie:
	
				2.07 Contributo in atti di convegno pubblicato in volume

File in questo prodotto:

File	Dimensione	Formato
Pubblicazione 2021-ICTCS.pdf accesso aperto Tipologia: Versione Editoriale Dimensione 150.83 kB Formato Adobe PDF Visualizza/Apri	150.83 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/640632

Citazioni

ND

4

ND

social impact