Unbalanced translocations are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity, a major genetic event causing inactivation of tumor suppressor genes. Despite of their central role in genomic sequence analysis, little attention has been devoted to the problem of matching sequences allowing for this kind of chromosomal alteration. In this paper we investigate the approximate string matching problem when the edit operations are non-overlapping unbalanced translocations of adjacent factors. In particular, we first present a O(nm3)-time and O(m2)-space algorithm based on the dynamic-programming approach. Then we improve our first result by designing a second solution which makes use of the Directed Acyclic Word Graph of the pattern. In particular, we show that under the assumptions of equiprobability and independence of characters, our algorithm has a O(n log2σ m) average time complexity, for an alphabet of size σ, still maintaining the O(nm3)-time and the O(m2)-space complexity in the worst case. To the best of our knowledge this is the first solution in literature for the approximate string matching problem allowing for unbalanced translocations of factors.

Cantone D., Faro S., Pavone A. (2020). Sequence searching allowing for non-overlapping adjacent unbalanced translocations. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Schloss Dagstuhl – Leibniz-Zentrum für Informatik [10.4230/LIPIcs.WABI.2020.19].

Sequence searching allowing for non-overlapping adjacent unbalanced translocations

Pavone A.
2020-01-01

Abstract

Unbalanced translocations are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity, a major genetic event causing inactivation of tumor suppressor genes. Despite of their central role in genomic sequence analysis, little attention has been devoted to the problem of matching sequences allowing for this kind of chromosomal alteration. In this paper we investigate the approximate string matching problem when the edit operations are non-overlapping unbalanced translocations of adjacent factors. In particular, we first present a O(nm3)-time and O(m2)-space algorithm based on the dynamic-programming approach. Then we improve our first result by designing a second solution which makes use of the Directed Acyclic Word Graph of the pattern. In particular, we show that under the assumptions of equiprobability and independence of characters, our algorithm has a O(n log2σ m) average time complexity, for an alphabet of size σ, still maintaining the O(nm3)-time and the O(m2)-space complexity in the worst case. To the best of our knowledge this is the first solution in literature for the approximate string matching problem allowing for unbalanced translocations of factors.
2020
Settore INF/01 - Informatica
978-3-95977-161-0
Cantone D., Faro S., Pavone A. (2020). Sequence searching allowing for non-overlapping adjacent unbalanced translocations. In 20th International Workshop on Algorithms in Bioinformatics (WABI 2020). Schloss Dagstuhl – Leibniz-Zentrum für Informatik [10.4230/LIPIcs.WABI.2020.19].
File in questo prodotto:
File Dimensione Formato  
Pubblicazione 2020 - WABI.pdf

accesso aperto

Tipologia: Versione Editoriale
Dimensione 226.91 kB
Formato Adobe PDF
226.91 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/640675
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? ND
social impact