DNA sequences are the basic data type processed in studies of biological data analysis. One key component of such analysis is sequence classification, a methodology widely used to analyze sequential data of different kinds. However, its application to DNA sequences requires a proper representation of such sequences, which is still an open research problem. Machine Learning (ML) methodologies have made a fundamental contribution to the solution of this problem. Among them, Deep Neural Network (DNN) models have recently shown strongly encouraging results. In this chapter, we deal with specific classification problems related to two biological scenarios: (A) metagenomics and (B) chromatin organization. The investigations have been carried out by considering DNA sequences as input data for the classification methodologies. In particular, we study and test the efficacy of (1) different DNA sequence representations and (2) several Deep Learning (DL) architectures that process sequences for the solution of the related supervised classification problems. Although these architectures were developed for specific classification tasks, we think that they could serve as a starting point for developing other DNN models that process the same kind of input.

Amato, D., Di Gangi, M.A., Fiannaca, A., La Paglia, L., La Rosa, M., Lo Bosco, G., et al. (2021). Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues. In M. Elloumi (Ed.), Deep Learning for Biomedical Data Analysis (pp. 27-59) [10.1007/978-3-030-71676-9_2].

Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues

Amato, Domenico; Lo Bosco, Giosué; Rizzo, Riccardo
2021-07-01

Jul-2021
Sector INF/01 - Computer Science
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/515568