Matching statistics were introduced to solve the approximate string matching problem, which is a recurrent subroutine in bioinformatics applications. In 2010, Ohlebusch et al. [SPIRE 2010] proposed a time and space efficient algorithm for computing matching statistics which relies on some components of a compressed suffix tree, notably the longest common prefix (LCP) array. In this paper, we show how their algorithm can be generalized from strings to Wheeler deterministic finite automata. Most importantly, we introduce a notion of LCP array for Wheeler automata, thus establishing a first clear step towards extending (compressed) suffix tree functionalities to labeled graphs.
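The sketch below makes the two string notions in the abstract concrete. It is a minimal, definition-level Python sketch for plain strings only. It assumes the common definition that, for a pattern P and a text T, MS[i] is the length of the longest prefix of P[i..] that occurs as a substring of T. It also assumes that the LCP array stores, for each pair of lexicographically consecutive suffixes of T, the length of their longest common prefix, with LCP[0] set to 0 by convention. The function names are hypothetical. The naive loops are deliberately simple and are neither Ohlebusch et al.'s algorithm nor the Wheeler DFA generalization described in the paper.

```python
from typing import List


def matching_statistics(pattern: str, text: str) -> List[int]:
    """Naive computation: ms[i] = length of the longest prefix of
    pattern[i:] that occurs somewhere in text (hypothetical helper)."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        # Substrings are prefix-closed, so greedy extension is correct.
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def suffix_array(text: str) -> List[int]:
    """Starting positions of the suffixes of text in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    """lcp[k] = longest common prefix of suffixes sa[k-1] and sa[k];
    lcp[0] is set to 0 by convention."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        length = 0
        while length < min(len(a), len(b)) and a[length] == b[length]:
            length += 1
        lcp[k] = length
    return lcp


if __name__ == "__main__":
    print(matching_statistics("ananas", "banana"))  # [5, 4, 3, 2, 1, 0]
    sa = suffix_array("banana")
    print(sa)  # [5, 3, 1, 0, 4, 2]
    print(lcp_array("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

This sketch restarts the search at every position. A more efficient method would instead exploit the property MS[i+1] >= MS[i] - 1 rather than recomputing each value from scratch.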

Conte, A., Cotumaccio, N., Gagie, T., Manzini, G., Prezza, N., Sciortino, M. (2023). Computing matching statistics on Wheeler DFAs. In A. Bilgin, M.W. Marcellin, J. Serra-Sagrista, J.A. Storer (Eds.), Data Compression Conference 2023 (pp. 150-159). Los Alamitos, CA, USA: IEEE Computer Society. doi:10.1109/DCC55655.2023.00023.

Computing matching statistics on Wheeler DFAs

Sciortino, M
2023-01-01

Year: 2023
ISBN: 979-8-3503-4795-1
Files in this item:
  • Computing_matching_statistics_on_Wheeler_DFAs.pdf (publisher's version; access restricted to archive managers; 195.83 kB, Adobe PDF)
  • 2301.05338.pdf (preprint on arXiv; open access; 161.03 kB, Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/620076
Citations:
  • Scopus: 4
  • Web of Science (ISI): 2