A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, y_2, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $M^{\ell}_{y_1\#\cdots\#y_k}$ of minimal absent words of length at most $\ell$ of the word $y = y_1\#y_2\#\cdots\#y_k$, where $\# \notin \Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n = |y|$ using any of the many available $O(n)$-time algorithms, because an $\Omega(n)$-sized text index is constructed over $y$, which can be impractical for large $n$. We perform the same computation incrementally using output-sensitive space. This goal is reasonable when $\|M^{\ell}_{y_1\#\cdots\#y_N}\| = o(n)$ for all $N \in [1, k]$. For instance, in the human genome, $n \approx 3 \times 10^9$ but $\|M^{12}_{y_1\#\cdots\#y_k}\| \approx 10^6$. We consider a constant-sized alphabet for stating our results. We show that all of $M^{\ell}_{y_1}, \ldots, M^{\ell}_{y_1\#\cdots\#y_k}$ can be computed in $O\bigl(kn + \sum_{N=1}^{k} \|M^{\ell}_{y_1\#\cdots\#y_N}\|\bigr)$ total time using $O(\mathrm{MaxIn} + \mathrm{MaxOut})$ space, where $\mathrm{MaxIn}$ is the length of the longest word in $\{y_1, \ldots, y_k\}$ and $\mathrm{MaxOut} = \max\{\|M^{\ell}_{y_1\#\cdots\#y_N}\| : N \in [1, k]\}$. Proof-of-concept experimental results are also provided, confirming our theoretical findings and justifying our contribution.
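To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the output-sensitive algorithm described in the abstract: it stores every factor of length at most ℓ and is meant only to illustrate what the sets of minimal absent words contain as the collection grows. The function names (`factors_up_to`, `minimal_absent_words`, `incremental_maws`) are hypothetical and do not come from the paper.

```python
def factors_up_to(words, max_len):
    """Return all non-empty factors of length <= max_len of any word."""
    facts = set()
    for w in words:
        for i in range(len(w)):
            for j in range(i + 1, min(i + max_len, len(w)) + 1):
                facts.add(w[i:j])
    return facts


def minimal_absent_words(words, alphabet, max_len):
    """Brute-force minimal absent words of length <= max_len.

    Factors are collected per word, which matches taking factors of
    y_1#...#y_k when the separator # is not in the alphabet.
    """
    facts = factors_up_to(words, max_len)
    facts.add("")  # the empty word occurs in every word
    maws = set()
    for u in facts:
        if len(u) >= max_len:
            continue
        for a in alphabet:
            x = a + u  # the longest proper suffix u occurs by construction
            # x is minimal absent if x is absent and its longest proper
            # prefix occurs; all other proper factors then occur as well.
            if x not in facts and x[:-1] in facts:
                maws.add(x)
    return maws


def incremental_maws(words, alphabet, max_len):
    """Sets of minimal absent words for y_1, then y_1#y_2, and so on."""
    return [
        minimal_absent_words(words[:n], alphabet, max_len)
        for n in range(1, len(words) + 1)
    ]
```

For example, `minimal_absent_words(["abaab"], "ab", 3)` yields the set containing `bb`, `aaa` and `bab`; raising the length bound to 4 adds `aaba`.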

Ayad L.A.K., Badkobeh G., Fici G., Heliou A., Pissis S.P. (2019). Constructing Antidictionaries in Output-Sensitive Space. In Data Compression Conference Proceedings (pp. 538-547). 345 E 47TH ST, NEW YORK, NY 10017 USA : Institute of Electrical and Electronics Engineers Inc. [10.1109/DCC.2019.00062].

Constructing Antidictionaries in Output-Sensitive Space

Fici G.
2019-01-01

2019
978-1-7281-0657-1
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/372623