Institutional research archive of the Università degli Studi di Palermo

Efficient Online String Matching Based on Characters Distance Text Sampling

Faro S.; Marino F. P.; Pavone A.

2020-11-01

Abstract

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science, with applications in many other fields such as natural language processing, information retrieval and computational biology. Sampled string matching is an efficient approach, recently introduced to overcome the prohibitive space requirements of index construction on the one hand and to drastically reduce the searching time of online solutions on the other. In this paper we present a new algorithm for the sampled string matching problem, based on a characters distance sampling approach. The main idea is to sample the distances between consecutive occurrences of a given pivot character and then to search the sampled data online for any occurrence of the sampled pattern, before verifying each candidate occurrence in the original text. From a theoretical point of view, we prove that, under suitable conditions, our solution can achieve both linear worst-case time complexity and optimal average-case time complexity. From a practical point of view, our solution shows sub-linear behaviour and speeds up online searching by a factor of up to 9, using limited additional space that ranges from 11% down to 2.8% of the text size, a gain of up to 50% compared with previous solutions.
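The main idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`pivot_positions`, `distances`, `sampled_search`) are hypothetical, the full list of pivot positions is kept in memory for simplicity, the sampled data is scanned naively, and the fallback to a plain scan when the pattern contains fewer than two pivot occurrences is an assumption made here to keep the example complete.

```python
def pivot_positions(s, pivot):
    """Return the indices at which the pivot character occurs in s."""
    return [i for i, ch in enumerate(s) if ch == pivot]


def distances(positions):
    """Return the gaps between consecutive pivot positions (the sampled data)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def sampled_search(text, pattern, pivot):
    """Return all start indices of pattern in text.

    The sampled text is searched for the sampled pattern, and every
    candidate is then verified against the original text.
    """
    text_pos = pivot_positions(text, pivot)
    pat_pos = pivot_positions(pattern, pivot)
    pat_dist = distances(pat_pos)

    if not pat_dist:
        # Assumption of this sketch: with fewer than two pivot occurrences
        # the sampled pattern is empty, so fall back to a plain scan.
        m = len(pattern)
        return [i for i in range(len(text) - m + 1)
                if text.startswith(pattern, i)]

    text_dist = distances(text_pos)
    k = len(pat_dist)
    hits = []
    for i in range(len(text_dist) - k + 1):
        # Step 1: compare the sampled pattern with a window of sampled text.
        if text_dist[i:i + k] == pat_dist:
            # Step 2: align the first pivot of the pattern with the i-th
            # pivot of the text and verify the candidate in the original.
            start = text_pos[i] - pat_pos[0]
            if start >= 0 and text.startswith(pattern, start):
                hits.append(start)
    return hits
```

For example, `sampled_search("abracadabra abracadabra", "acada", "a")` returns `[3, 15]`: the sampled pattern is `[2, 2]`, it matches the sampled text at two places, and both candidates pass verification. With `sampled_search("axaya", "aya", "a")` the candidate at position 0 matches in the sampled data but is rejected by verification, which is why the final check against the original text is needed.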

Date	Nov 2020
Journal title	ALGORITHMICA
DOI	https://dx.doi.org/10.1007/s00453-020-00732-4
Publisher URL (open access where possible)	https://link.springer.com/article/10.1007/s00453-020-00732-4
Citation	Faro S., Marino F.P., Pavone A. (2020). Efficient Online String Matching Based on Characters Distance Text Sampling. ALGORITHMICA, 82(11), 3390-3412 [10.1007/s00453-020-00732-4].
Appears in types	1.01 Journal article

Files in this item:

File	Access	Type	Size	Format
Pubblicazione 2020-Algorithmica.pdf	Archive managers only	Publisher's version	942.91 kB	Adobe PDF
post_print_pavone_handle_10447_640837.pdf	Open access	Post-print	423.78 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/640837
