Bacteria classification has been investigated extensively with different tools and for many purposes, such as early diagnosis, metagenomics, and phylogenetics. Classification methods based on ribosomal DNA sequences are considered a reference in this area. We present a new classifier for bacterial species based on a dissimilarity measure of a purely combinatorial nature. This measure is based on the notion of Minimal Absent Words, a combinatorial concept that has recently found applications in bioinformatics. We incorporate this measure into a probabilistic neural network in order to classify bacterial species. Our approach is motivated by the vast literature on the combinatorics of Minimal Absent Words in relation to the degree of repetitiveness of a sequence. We ran our experiments on a public dataset of 16S ribosomal RNA sequences. Our approach achieved very high classification accuracy, indicating that our method is comparable with the standard tools available for the automatic classification of bacterial species.
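To make the central notion concrete, here is a minimal brute-force sketch of minimal absent words: a word is a minimal absent word of a sequence if it does not occur in the sequence while all of its proper factors do. The function names, the default DNA alphabet, and in particular the Jaccard-style dissimilarity are assumptions made for illustration only; the abstract does not specify the measure used in the paper, and practical tools rely on far more efficient index structures than the quadratic enumeration shown here.

```python
def factors(s):
    """Return every substring (factor) of s, including the empty word."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def minimal_absent_words(s, alphabet="ACGT"):
    """Brute-force minimal absent words of s over the given alphabet.

    A candidate w = p + b extends an occurring factor p by one letter b,
    so its longest proper prefix occurs by construction. It is a minimal
    absent word if w itself is absent and its longest proper suffix occurs.
    """
    fac = factors(s)
    return {
        p + b
        for p in fac
        for b in alphabet
        if p + b not in fac and (p + b)[1:] in fac
    }


def maw_dissimilarity(x, y, alphabet="ACGT"):
    """Illustrative (assumed) dissimilarity: Jaccard distance of the MAW sets.

    This is NOT claimed to be the measure used in the paper.
    """
    mx = minimal_absent_words(x, alphabet)
    my = minimal_absent_words(y, alphabet)
    union = mx | my
    return len(mx ^ my) / len(union) if union else 0.0


if __name__ == "__main__":
    print(sorted(minimal_absent_words("ACA")))
    print(maw_dissimilarity("ACA", "CAC"))
```

For the toy sequence `ACA`, the word `CAC` qualifies because `CA` and `AC` both occur while `CAC` does not; the letters `G` and `T` qualify because they never occur at all. A dissimilarity of this kind could then serve as the distance fed to a distance-based classifier, which is the role the abstract assigns to its own measure.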

Fici, G., Langiu, A., Lo Bosco, G., Rizzo, R. (2018). Bacteria classification using minimal absent words. AIMS Medical Science, 5(1), 23-32 [10.3934/medsci.2018.1.23].

Bacteria classification using minimal absent words

Fici, G.; Langiu, A.; Lo Bosco, G.; Rizzo, R.
2018-01-01

Scientific sector: INF/01 - Informatica (Computer Science)
Files in this item:
Bacteria classification using minimal absent words.pdf
Access: open access
Type: publisher's version
Size: 753.24 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/250124