Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching having average size O(|S|⋅logk+1|S|) and answering queries in time O(|x|+|occ(x)|) for any word x, where occ is the list of all occurrences of x in S up to errors.

EPIFANIO C, GABRIELE A, MIGNOSI F, RESTIVO A, SCIORTINO M (2007). Languages with mismatches. THEORETICAL COMPUTER SCIENCE, 385(1-3), 152-166 [10.1016/j.tcs.2007.06.006].

Languages with mismatches

EPIFANIO, Chiara;GABRIELE, Alessandra;MIGNOSI, Filippo;RESTIVO, Antonio;SCIORTINO, Marinella

2007-01-01

Abstract

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching having average size O(|S|⋅logk+1|S|) and answering queries in time O(|x|+|occ(x)|) for any word x, where occ is the list of all occurrences of x in S up to errors.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				2007
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				THEORETICAL COMPUTER SCIENCE
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.1016/j.tcs.2007.06.006
			
	Citazione
	
				EPIFANIO C,  GABRIELE A,  MIGNOSI F,  RESTIVO A,  SCIORTINO M (2007). Languages with mismatches. THEORETICAL COMPUTER SCIENCE, 385(1-3), 152-166 [10.1016/j.tcs.2007.06.006].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
1-s2.0-S0304397507004847-main.pdf Solo gestori archvio Dimensione 371.62 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	371.62 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/29642

Citazioni

ND

15

12

social impact