In this paper we describe a factorial language, denoted by L(S, k,r), that contains all words that occur in a string 5 up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition index and denoted by R(S,k,r), defined as the smallest integer h ? 1 such that all strings of this length occur at most in a unique position of the text S up to k mismatches every r symbols. We prove that R(S, k, r) is a non-increasing function of r and a non-decreasing function of k and that the equation r = R(S, k, r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word x ?L(S, k, r) this data structure allows us to find the list occ(x) of all occurrences of the word x in a text S up to k mismatches every r symbols in time proportional to |x| + |occ(x)|.

Epifanio, C., Gabriele, A., Mignosi, F. (2005). Languages with mismatches and an application to approximate indexing. In Developments in Language Theory (pp.224-235) [10.1007/11505877_20].

Languages with mismatches and an application to approximate indexing

EPIFANIO, Chiara;GABRIELE, Alessandra;MIGNOSI, Filippo
2005-01-01

Abstract

In this paper we describe a factorial language, denoted by L(S, k,r), that contains all words that occur in a string 5 up to k mismatches every r symbols. Then we give some combinatorial properties of a parameter, called repetition index and denoted by R(S,k,r), defined as the smallest integer h ? 1 such that all strings of this length occur at most in a unique position of the text S up to k mismatches every r symbols. We prove that R(S, k, r) is a non-increasing function of r and a non-decreasing function of k and that the equation r = R(S, k, r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word x ?L(S, k, r) this data structure allows us to find the list occ(x) of all occurrences of the word x in a text S up to k mismatches every r symbols in time proportional to |x| + |occ(x)|.
International Conference on Developments in Language Theory
Palermo
4-8 luglio
9
2005
12
A stampa
Epifanio, C., Gabriele, A., Mignosi, F. (2005). Languages with mismatches and an application to approximate indexing. In Developments in Language Theory (pp.224-235) [10.1007/11505877_20].
Proceedings (atti dei congressi)
Epifanio, C; Gabriele, A; Mignosi, F
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/28264
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 4
social impact