Recent advances in next-generation sequencing (NGS) technology provide a cost-effective approach to large-scale resequencing of livestock samples for studying a range of biological phenomena. NGS produces millions of short DNA sequences, which require an unbiased analysis procedure to allow comprehensive searches for variation and to identify putative causative mutations for economically important traits. The aim of this work was to present a bioinformatics pipeline for variant discovery in the ovine genome. A total of 30 Valle del Belice dairy ewes were used for whole-genome sequencing of pooled libraries prepared with the Illumina Nextera kit. Paired-end sequencing was carried out on an 8-lane flow cell of the Illumina HiScanSQ platform, yielding a total of 1,159,664,912 reads of 101 bp. The left and right raw reads were separated into two files and converted to FASTQ format using CASAVA 1.8. The whole procedure was split into separate workflows to give end users more flexibility. One workflow checks the quality of the raw sequencing reads using FastQC and FASTX-Toolkit, keeping bases with a Phred quality score greater than 20 and trimming poor-quality reads. Another workflow aligns the reads to the Ovis aries 3.1 reference genome using BWA-MEM with standard parameters. The resulting SAM file was converted to BAM format using SAMtools, and unmapped and duplicate reads were then removed using the CleanSam and MarkDuplicates commands of Picard. Next, to obtain more accurate base qualities, the Genome Analysis Toolkit (GATK) was used to locally realign reads so that the number of mismatching bases due to indels is minimized across all reads (IndelRealigner) and to detect systematic errors in base quality scores (BaseRecalibrator). In the last workflow, SNPs and indels are identified using the mpileup command of SAMtools. The resulting BCF file is passed to the “bcftools view” tool to be filtered and converted to VCF format. Finally, SnpSift was used for variant annotation. A total of 6,357,170 variants were discovered, of which 5,265,739 were SNPs and 1,091,431 were indels. About 77% of the SNPs were present in the Ovis aries dbSNP v147, while the remainder were novel. The discovered SNPs must be validated and could then be used in several applications, such as phylogenetic analysis, genome-wide association studies or genomic selection.
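
The order of the workflows described above can be sketched as a list of external commands. This is only an illustrative outline, not the authors' scripts: the abstract gives no command lines, so every file name, jar path, the coordinate-sort step, the GATK 3.x helper steps (RealignerTargetCreator, PrintReads) and the use of dbSNP as the known-sites file are assumptions. The code only builds and prints the commands; it does not run them.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One external command; `stdout` names the file that captures its output."""
    name: str
    argv: List[str]
    stdout: Optional[str] = None


def build_pipeline(ref: str, r1: str, r2: str, known_sites: str,
                   min_qual: int = 20) -> List[Step]:
    """Return the workflow steps in order. All paths are hypothetical."""
    picard = ["java", "-jar", "picard.jar"]
    gatk = ["java", "-jar", "GenomeAnalysisTK.jar"]  # GATK 3.x syntax (assumed)
    snpsift = ["java", "-jar", "SnpSift.jar"]
    q = str(min_qual)
    return [
        # Workflow 1: quality control and trimming at the Phred threshold.
        Step("fastqc", ["fastqc", r1, r2]),
        Step("trim_r1", ["fastq_quality_trimmer", "-Q33", "-t", q,
                         "-i", r1, "-o", "trimmed_1.fastq"]),
        Step("trim_r2", ["fastq_quality_trimmer", "-Q33", "-t", q,
                         "-i", r2, "-o", "trimmed_2.fastq"]),
        # Workflow 2: alignment with default BWA-MEM parameters (SAM on stdout).
        Step("bwa_mem", ["bwa", "mem", ref, "trimmed_1.fastq", "trimmed_2.fastq"],
             stdout="aln.sam"),
        Step("sam_to_bam", ["samtools", "view", "-b", "-o", "aln.bam", "aln.sam"]),
        # Assumption: a coordinate sort, not mentioned in the abstract.
        Step("sort", ["samtools", "sort", "-o", "sorted.bam", "aln.bam"]),
        Step("clean_sam", picard + ["CleanSam", "I=sorted.bam", "O=clean.bam"]),
        Step("mark_duplicates", picard + ["MarkDuplicates", "I=clean.bam",
                                          "O=dedup.bam", "M=dup_metrics.txt",
                                          "REMOVE_DUPLICATES=true"]),
        # Workflow 3: local realignment and base quality recalibration.
        Step("realigner_targets", gatk + ["-T", "RealignerTargetCreator", "-R", ref,
                                          "-I", "dedup.bam", "-o", "targets.intervals"]),
        Step("indel_realigner", gatk + ["-T", "IndelRealigner", "-R", ref,
                                        "-I", "dedup.bam",
                                        "-targetIntervals", "targets.intervals",
                                        "-o", "realigned.bam"]),
        Step("base_recalibrator", gatk + ["-T", "BaseRecalibrator", "-R", ref,
                                          "-I", "realigned.bam",
                                          "-knownSites", known_sites,
                                          "-o", "recal.table"]),
        Step("print_reads", gatk + ["-T", "PrintReads", "-R", ref,
                                    "-I", "realigned.bam", "-BQSR", "recal.table",
                                    "-o", "recal.bam"]),
        # Workflow 4: variant calling. Calling and filtering options depend on
        # the SAMtools/bcftools version, which the abstract does not state,
        # so none are filled in here.
        Step("mpileup", ["samtools", "mpileup", "-u", "-f", ref, "recal.bam"],
             stdout="raw.bcf"),
        Step("bcf_to_vcf", ["bcftools", "view", "raw.bcf"], stdout="variants.vcf"),
        # Annotation against a dbSNP VCF with SnpSift.
        Step("annotate", snpsift + ["annotate", known_sites, "variants.vcf"],
             stdout="annotated.vcf"),
    ]


if __name__ == "__main__":
    for step in build_pipeline("oar_v3.1.fa", "reads_1.fastq", "reads_2.fastq",
                               "dbsnp_v147.vcf"):
        target = f" > {step.stdout}" if step.stdout else ""
        print(f"[{step.name}] {shlex.join(step.argv)}{target}")
```

Keeping each step as data rather than running it directly mirrors the abstract's point about separate workflows: a user could execute only the quality-control steps, or only the variant-calling steps, by slicing the list.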

Tolone, M., Sardina, M.T., Di Gerlando, R., Mastrangelo, S., Sutera, A.M., Portolano, B. (2017). A pipeline for variants discovery using next-generation DNA sequencing data. In Proceeding XXII ASPA Congress (pp.151-152). Taylor & Francis Group.

A pipeline for variants discovery using next-generation DNA sequencing data

Tolone, Marco; Sardina, Maria Teresa; Di Gerlando, Rosalia; Mastrangelo, Salvatore; Sutera, Anna Maria; Portolano, Baldassare
2017-01-01

Sector AGR/17 - General Animal Science and Genetic Improvement
Jun-2017
XXII ASPA Congress
Perugia
13-16 June 2017
XXII
2017
2
In print
Proceedings (conference proceedings)
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/241352