Sampled String Matching is a hybrid approach to the string matching problem, blending online and offline solutions. Among various sampling methods, Character Distance Sampling (CDS) is one of the fastest and most versatile techniques. In optimal conditions, CDS can achieve speedup of up to 100 times, while requiring minimal additional space — ranging from 10% to as little as 0.8% of the original text size. Furthermore, CDS is adaptable, effectively handling non-classical string matching problems and computing structural properties of strings such as periods and coverages. As with all sampling-based matching algorithms, a verification phase on the original text is necessary after the initial matching on the sampled strings. Often, the computational effort required for this verification phase can be substantial. In this article, we introduce a novel sampling method that tracks the context around each sampled location rather than the distances to those locations. This approach aims to reduce the number of candidate occurrences and the subsequent verification effort. Our experimental results indicate that the proposed method outperforms CDS, particularly for short patterns, achieving a speedup of between 15% and 40%.

Faro, S., Lecroq, T., Marino, F.P., Pavone, A., Scafiti, S. (2024). Improving Sampled Matching through Character Context Sampling. In U. Ugo de'Liguoro, M. Palazzo, L. Roversi (a cura di), Proceedings of the 25th Italian Conference on Theoretical Computer Science (pp. 300-312).

Improving Sampled Matching through Character Context Sampling

Pavone A.
;
2024-01-01

Abstract

Sampled String Matching is a hybrid approach to the string matching problem, blending online and offline solutions. Among various sampling methods, Character Distance Sampling (CDS) is one of the fastest and most versatile techniques. In optimal conditions, CDS can achieve speedup of up to 100 times, while requiring minimal additional space — ranging from 10% to as little as 0.8% of the original text size. Furthermore, CDS is adaptable, effectively handling non-classical string matching problems and computing structural properties of strings such as periods and coverages. As with all sampling-based matching algorithms, a verification phase on the original text is necessary after the initial matching on the sampled strings. Often, the computational effort required for this verification phase can be substantial. In this article, we introduce a novel sampling method that tracks the context around each sampled location rather than the distances to those locations. This approach aims to reduce the number of candidate occurrences and the subsequent verification effort. Our experimental results indicate that the proposed method outperforms CDS, particularly for short patterns, achieving a speedup of between 15% and 40%.
2024
Settore INFO-01/A - Informatica
Faro, S., Lecroq, T., Marino, F.P., Pavone, A., Scafiti, S. (2024). Improving Sampled Matching through Character Context Sampling. In U. Ugo de'Liguoro, M. Palazzo, L. Roversi (a cura di), Proceedings of the 25th Italian Conference on Theoretical Computer Science (pp. 300-312).
File in questo prodotto:
File Dimensione Formato  
paper190-2.pdf

accesso aperto

Tipologia: Versione Editoriale
Dimensione 1.47 MB
Formato Adobe PDF
1.47 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/692038
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact