Institutional research archive of the Università degli Studi di Palermo

Constructing Antidictionaries in Output-Sensitive Space

Ayad L. A. K.; Badkobeh G.; Fici G.; Heliou A.; Pissis S. P.

2019-01-01

Abstract

A word $x$ that is absent from a word $y$ is called minimal if all its proper factors occur in $y$. Given a collection of $k$ words $y_1, y_2, \ldots, y_k$ over an alphabet $\Sigma$, we are asked to compute the set $M^{\ell}_{y_1\#\cdots\#y_k}$ of minimal absent words of length at most $\ell$ of the word $y = y_1\#y_2\#\cdots\#y_k$, where $\# \notin \Sigma$. In data compression, this corresponds to computing the antidictionary of $k$ documents. In bioinformatics, it corresponds to computing words that are absent from a genome of $k$ chromosomes. This computation generally requires $\Omega(n)$ space for $n = |y|$ using any of the many available $O(n)$-time algorithms, because an $\Omega(n)$-sized text index is constructed over $y$, which can be impractical for large $n$. We perform the same computation incrementally using output-sensitive space. This goal is reasonable when $\|M^{\ell}_{y_1\#\cdots\#y_N}\| = o(n)$ for all $N \in [1, k]$. For instance, in the human genome, $n \approx 3 \times 10^9$ but $\|M^{12}_{y_1\#\cdots\#y_k}\| \approx 10^6$. We state our results for a constant-sized alphabet. We show that all of $M^{\ell}_{y_1}, \ldots, M^{\ell}_{y_1\#\cdots\#y_k}$ can be computed in $O(kn + \sum_{N=1}^{k} \|M^{\ell}_{y_1\#\cdots\#y_N}\|)$ total time using $O(\mathrm{MaxIn} + \mathrm{MaxOut})$ space, where MaxIn is the length of the longest word in $\{y_1, \ldots, y_k\}$ and $\mathrm{MaxOut} = \max\{\|M^{\ell}_{y_1\#\cdots\#y_N}\| : N \in [1, k]\}$. Proof-of-concept experimental results are also provided, confirming our theoretical findings and justifying our contribution.
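To make the definitions above concrete, here is a deliberately naive Python sketch. It is not the authors' output-sensitive algorithm, and the helper names `factors_up_to` and `minimal_absent_words` are hypothetical. It materialises every distinct factor of length at most $\ell$ of every input word, so its space grows with $n$ rather than with the output, which is exactly the cost the paper sets out to avoid. It relies on the standard characterisation that a word $aub$, with letters $a, b \in \Sigma$, is a minimal absent word if and only if $au$ and $ub$ occur but $aub$ does not; a single letter is minimal absent if and only if it never occurs. Because candidates are words over $\Sigma$ and $\# \notin \Sigma$, a candidate occurs in $y_1\#\cdots\#y_N$ exactly when it occurs in some individual $y_i$.

```python
def factors_up_to(words, ell):
    """Every factor (substring) of length <= ell of any word in `words`, plus the empty word."""
    found = {""}
    for w in words:
        for i in range(len(w)):
            for j in range(i + 1, min(i + ell, len(w)) + 1):
                found.add(w[i:j])
    return found


def minimal_absent_words(words, alphabet, ell):
    """Naive minimal absent words of length <= ell for y = words[0] # words[1] # ...

    Candidates are words over `alphabet` only, so none of them can span a '#';
    a candidate occurs in y exactly when it occurs in some individual word.
    """
    present = factors_up_to(words, ell)
    maws = set()
    # Length 1: a letter is minimal absent iff it never occurs
    # (its only proper factor, the empty word, always occurs).
    if ell >= 1:
        maws.update(a for a in alphabet if a not in present)
    # Length >= 2: x = a + u + b is minimal absent iff a + u and u + b occur
    # but a + u + b does not (every other proper factor is a factor of those two).
    for u in present:
        if len(u) + 2 > ell:
            continue
        for a in alphabet:
            if a + u not in present:
                continue
            for b in alphabet:
                if u + b in present and a + u + b not in present:
                    maws.add(a + u + b)
    return maws


# Recompute the output for every prefix y_1 # ... # y_N of a toy collection.
docs = ["abaab", "bbb"]
for n in range(1, len(docs) + 1):
    print(n, sorted(minimal_absent_words(docs[:n], "ab", 3)))
# 1 ['aaa', 'bab', 'bb']
# 2 ['aaa', 'abb', 'bab', 'bba']
```

The loop recomputes each set from scratch for every $N$, purely to illustrate what is being computed. The toy output also shows that the sets are not monotone in $N$: `bb` is minimal absent for $y_1$ but disappears once `bbb` is added, while `abb` and `bba` appear.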

Date: 2019
ISBN of the monograph: 978-1-7281-0657-1
DOI of the contribution: https://dx.doi.org/10.1109/DCC.2019.00062
Alternative URL (other than the publisher's): http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000177
Citation: Ayad L.A.K., Badkobeh G., Fici G., Heliou A., Pissis S.P. (2019). Constructing Antidictionaries in Output-Sensitive Space. In Data Compression Conference Proceedings (pp. 538-547). 345 E 47TH ST, NEW YORK, NY 10017 USA : Institute of Electrical and Electronics Engineers Inc. [10.1109/DCC.2019.00062].
Appears in type: 2.07 Contribution in conference proceedings published in a volume

Files in this item:

File	Size	Format
Constructing Antidictionaries in Output-Sensitive Space.pdf (access: archive managers only; type: publisher's version)	571.77 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/372623
