Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Suffix trees are highly regarded data structures for text indexing and string algorithms [MCreight 76, Weiner 73]. For any given string w of length n=|w|, a suffix tree for w takes O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be Θ(n2) nodes and links for suffix tries in the worst case because of their unary nodes. It is an interesting question if the suffix trie can be stored using O(n) nodes. We present the linear-size suffix trie, which guarantees O(n) nodes. We use a new technique for reducing the number of unary nodes to O(n), that stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we are able to check whether a pattern p of length m=|p| occurs in w in O(mlog⁡|Σ|) time and we can find the longest common substring of two strings w1 and w2 in O((|w1|+|w2|)log⁡|Σ|) time for an alphabet Σ.

Crochemore, M., Epifanio, C., Grossi, R., Mignosi, F. (2016). Linear-size suffix tries. THEORETICAL COMPUTER SCIENCE, 638, 171-178 [10.1016/j.tcs.2016.04.002].

Linear-size suffix tries

Crochemore, M.;EPIFANIO, Chiara;Grossi, R.;Mignosi, F.

2016-01-01

Abstract

Suffix trees are highly regarded data structures for text indexing and string algorithms [MCreight 76, Weiner 73]. For any given string w of length n=|w|, a suffix tree for w takes O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). On the contrary, the arcs of the suffix trie are labeled by single symbols but there can be Θ(n2) nodes and links for suffix tries in the worst case because of their unary nodes. It is an interesting question if the suffix trie can be stored using O(n) nodes. We present the linear-size suffix trie, which guarantees O(n) nodes. We use a new technique for reducing the number of unary nodes to O(n), that stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we are able to check whether a pattern p of length m=|p| occurs in w in O(mlog⁡|Σ|) time and we can find the longest common substring of two strings w1 and w2 in O((|w1|+|w2|)log⁡|Σ|) time for an alphabet Σ.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				2016
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				THEORETICAL COMPUTER SCIENCE
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.1016/j.tcs.2016.04.002
			
	URL dell'editore (Open access ove possibile)
	
				https://www.sciencedirect.com/science/article/pii/S0304397516300305?via=ihub
			
	Citazione
	
				Crochemore, M., Epifanio, C., Grossi, R., Mignosi, F. (2016). Linear-size suffix tries. THEORETICAL COMPUTER SCIENCE, 638, 171-178 [10.1016/j.tcs.2016.04.002].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
LinearSizeSuffixTries.pdf Solo gestori archvio Tipologia: Versione Editoriale Dimensione 340.63 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	340.63 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/213681

Citazioni

ND

10

6

social impact