Amato, D., Di Gangi, M.A., Fiannaca, A., La Paglia, L., La Rosa, M., Lo Bosco, G., et al. (2021). Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues. In M. Elloumi (Ed.), Deep Learning for Biomedical Data Analysis (pp. 27-59) [10.1007/978-3-030-71676-9_2].
Classification of Sequences with Deep Artificial Neural Networks: Representation and Architectural Issues
Amato, Domenico; Lo Bosco, Giosué; Rizzo, Riccardo
2021-07-01
Abstract
DNA sequences are the basic data type processed in most biological data analyses. A key component of such analyses is sequence classification, a methodology widely used to analyze sequential data of different kinds. However, applying it to DNA sequences requires a proper representation of those sequences, which is still an open research problem. Machine Learning (ML) methodologies have made a fundamental contribution to solving this problem, and Deep Neural Network (DNN) models have recently shown strongly encouraging results. In this chapter, we deal with specific classification problems related to two biological scenarios: (A) metagenomics and (B) chromatin organization. The investigations have been carried out using DNA sequences as input data for the classification methodologies. In particular, we study and test the efficacy of (1) different DNA sequence representations and (2) several Deep Learning (DL) architectures that process sequences to solve the related supervised classification problems. Although these architectures were developed for specific classification tasks, we think they could serve as a suggestion for developing other DNN models that process the same kind of input.