The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take space O(n). Our algorithm works in time O(nlog a), where n is the total input size and a is the multiplicative constant introduced by the alphabet. a is the size of the alphabet. We also consider a different version of our algorithm that applies to DAWGs. In this case, we prove that the algorithm works in both time and space proportional to data DAWG’s size.

CROCHEMORE M, GABRIELE A, MIGNOSI F, PESARESI M (2008). On the Longest Common Factor Problem. In IFIP International Federation for Information Processing (pp.143-155). Springer Boston [10.1007/978-0-387-09680-3_10].

On the Longest Common Factor Problem

GABRIELE, Alessandra;MIGNOSI, Filippo;
2008-01-01

Abstract

The Longest Common Factor (LCF) of a set of strings is a well studied problem having a wide range of applications in Bioinformatics: from microarrays to DNA sequences analysis. This problem has been solved by Hui (2000) who uses a famous constant-time solution to the Lowest Common Ancestor (LCA) problem in trees coupled with use of suffix trees. A data structure for the LCA problem, although linear in space and construction time, introduces a multiplicative constant in both space and time that reduces the range of applications in many biological applications. In this article we present a new method for solving the LCF problem using the suffix tree structure with an auxiliary array that take space O(n). Our algorithm works in time O(nlog a), where n is the total input size and a is the multiplicative constant introduced by the alphabet. a is the size of the alphabet. We also consider a different version of our algorithm that applies to DAWGs. In this case, we prove that the algorithm works in both time and space proportional to data DAWG’s size.
Settore INF/01 - Informatica
Fifth Ifip International Conference On Theoretical Computer Science – Tcs 2008
2008
13
CROCHEMORE M, GABRIELE A, MIGNOSI F, PESARESI M (2008). On the Longest Common Factor Problem. In IFIP International Federation for Information Processing (pp.143-155). Springer Boston [10.1007/978-0-387-09680-3_10].
Proceedings (atti dei congressi)
CROCHEMORE M; GABRIELE A; MIGNOSI F; PESARESI M
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/39788
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact