Institutional research archive of the Università degli Studi di Palermo

Tolone, M., Sardina, M.T., Di Gerlando, R., Mastrangelo, S., Sutera, A.M., Portolano, B. (2017). A pipeline for variants discovery using next-generation DNA sequencing data. ITALIAN JOURNAL OF ANIMAL SCIENCE, 16(1), 151-152.

A pipeline for variants discovery using next-generation DNA sequencing data

TOLONE, Marco; SARDINA, Maria Teresa; DI GERLANDO, Rosalia; MASTRANGELO, Salvatore; SUTERA, Anna Maria; PORTOLANO, Baldassare

2017-01-01

Abstract

Recent advances in next-generation sequencing (NGS) technology provide a cost-effective approach to large-scale resequencing of livestock samples for studying a range of biological phenomena. NGS produces millions of short DNA sequences, which require an unbiased analysis procedure to enable comprehensive searches for variation and to identify putative causative mutations for economically important traits. The aim of this work was to present a bioinformatics pipeline for variant discovery in the ovine genome. A total of 30 Valle del Belice dairy ewes were used for whole-genome sequencing of pooled libraries prepared with the Illumina Nextera kit. Paired-end sequencing was carried out on an 8-lane flow cell of the Illumina HiScanSQ platform, yielding a total of 1,159,664,912 reads of 101 bp. The left and right raw reads were separated into two files and converted to FASTQ format using CASAVA 1.8. The whole procedure was split into separate workflows to give end users more flexibility. One workflow checks the quality of the raw sequencing reads using FastQC and FASTX-Toolkit, keeping bases with a Phred quality score greater than 20 and trimming poor-quality reads. A second workflow aligns the reads to the Ovis aries 3.1 reference genome using BWA-MEM with standard parameters. The resulting SAM file was converted to a BAM file using SAMtools, and unmapped and duplicate reads were then removed using the CleanSam and MarkDuplicates commands of Picard. Next, to obtain more accurate base qualities, the Genome Analysis Toolkit (GATK) was used to locally realign reads so that the number of mismatching bases due to indels is minimized across all reads (IndelRealigner) and to detect systematic errors in base quality scores (BaseRecalibrator). In the last workflow, SNPs and indels are identified using the mpileup command of SAMtools. The resulting BCF file is passed to the “bcftools view” tool to be filtered and converted to VCF format. Finally, SnpSift was used for variant annotation. A total of 6,357,170 variants were discovered, of which 5,265,739 were SNPs and 1,091,431 were indels. About 77% of the SNPs were present in the Ovis aries dbSNP v147, while the remainder were novel. The discovered SNPs must be validated and could then be used in several applications, such as phylogenetic analysis, genome-wide association studies, or genomic selection.
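
To make the quality-control criterion concrete, the following minimal sketch shows what keeping bases with a Phred quality score greater than 20 could mean for a single read. It is not the FASTX-Toolkit implementation used in the pipeline. It assumes Phred+33 quality encoding and a simple rule that trims low-quality bases from the 3' end only; the function names `phred_scores`, `error_probability` and `trim_3prime` are hypothetical and introduced only for illustration.

```python
from __future__ import annotations


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_3prime(seq: str, quality: str, threshold: int = 20) -> tuple[str, str]:
    """Drop trailing bases until the last base scores above `threshold`.

    Illustrative rule only: bases are removed from the 3' end while their
    score is not greater than the threshold; internal bases are untouched.
    """
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] <= threshold:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # Toy read: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2 (Phred+33).
    trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIII5##")
    print(trimmed_seq, trimmed_qual)
    # A score of 20 corresponds to an expected error rate of about 1 in 100.
    print(error_probability(20))
```

Under these assumptions, the toy read loses its last three bases, because their scores (20, 2 and 2) are not greater than the threshold of 20. A real trimming tool would usually also discard reads that become too short after trimming, a parameter the abstract does not report.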

Date: 2017
Conference name: XXII ASPA Congress
Conference location: Perugia
Conference dates: 13-16 June 2017
Conference number: XXII
Journal title: ITALIAN JOURNAL OF ANIMAL SCIENCE
Citation: Tolone, M., Sardina, M.T., Di Gerlando, R., Mastrangelo, S., Sutera, A.M., Portolano, B. (2017). A pipeline for variants discovery using next-generation DNA sequencing data. ITALIAN JOURNAL OF ANIMAL SCIENCE, 16(1), 151-152.
Appears in types: 1.05 Abstract in conference proceedings published in a journal

Files in this item:

File	Size	Format
Book of Abstracts_2017.pdf (open access; description: Proceedings)	25.69 MB	Adobe PDF
Book+of+Abstracts_2017_compressed.pdf (archive managers only; type: publisher's version)	3.75 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/241352
