Schema for Rhesus SNVs - Annotated SNVs from the Rhesus Macaque Sequencing Consortium
  Database: rheMac10    Primary Table: rhesusSNVs
VCF File: /gbdb/rheMac10/rhesusSNVs/rheMac10.snvs.vcf.gz
Format description: The fields of a Variant Call Format data line
See the Variant Call Format specification for more details
field      description
chrom      An identifier from the reference genome
pos        The reference position, with the 1st base having position 1
id         Semicolon-separated list of unique identifiers where available
ref        Reference base(s)
alt        Comma-separated list of alternate non-reference alleles called on at least one of the samples
qual       Phred-scaled quality score for the assertion made in ALT, i.e. -10 log10 Prob(call in ALT is wrong)
filter     PASS if this position has passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail
info       Additional information encoded as a semicolon-separated series of short keys with optional comma-separated values
format     If genotype columns are specified in header, a colon-separated list of short keys starting with GT
genotypes  If genotype columns are specified in header, a tab-separated set of genotype column values; each value is a colon-separated list of values corresponding to keys in the format column
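
The field layout above can be illustrated with a minimal Python sketch that splits one tab-separated VCF data line into the ten fields of this table. The names VcfRecord, parse_vcf_line and phred_to_error_prob are hypothetical helpers introduced here for illustration; they are not part of the Genome Browser or of any VCF library. The last helper inverts the Phred scaling described for qual.

```python
from typing import List, NamedTuple, Optional


class VcfRecord(NamedTuple):
    """One VCF data line, using the field names from the schema table."""
    chrom: str
    pos: int                 # 1-based reference position
    id: str
    ref: str
    alt: List[str]           # comma-separated in the file
    qual: Optional[float]    # None when the file has "."
    filter: List[str]        # ["PASS"] or a list of failed filter codes
    info: str                # left unparsed here
    format: List[str]        # colon-separated keys, starting with GT
    genotypes: List[str]     # one colon-separated string per sample


def parse_vcf_line(line: str) -> VcfRecord:
    """Split a tab-separated VCF data line (not a header line)."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    fmt = cols[8].split(":") if len(cols) > 8 else []
    return VcfRecord(
        chrom, int(pos), vid, ref, alt.split(","),
        None if qual == "." else float(qual),
        flt.split(";"), info, fmt, cols[9:],
    )


def phred_to_error_prob(qual: float) -> float:
    """Invert QUAL = -10 * log10(P): returns P(call in ALT is wrong)."""
    return 10 ** (-qual / 10)
```

For example, a QUAL of 30 corresponds to an error probability of about 0.001.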

Sample Rows
 
chrom	pos	id	ref	alt	qual	filter	info	format	genotypes
1	761	1:761:G:*,A	G	*,A	190.27	PASS	AC=7,4;AF=0.007543,0.00431;AN=928;BaseQRankSum=0;DP=1232;ExcessHet=0.1707;FS=18.075;InbreedingCoeff=0.1218;MLEAC=15,9;MLEAF=0.01 ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:1,0,0:1:3:.:.:0,3,32,3,32,32:. 0/0:1,0,0:1:3:.:.:0,3,23,3,23,23:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. 0/0:2,0,0:2:6:.:.:0,6,79,6,79,79:. 0/0:7,0,0:7:15:.:.:0,15,225,15,225,225:. 0/0:2,0,0:2:6:.:.:0,6,90,6,90,90:. 0/0:4,0,0:4:12:.:.:0,12,137,12,137,137:. 0/0:9,0,0:9:0:.:.:0,0,213,0,213,213:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. ...
1	881	1:881:T:G,*	T	G,*	3903.71	PASS	AC=53,3;AF=0.09,0.005119;AN=586;BaseQRankSum=0.842;DP=1916;ExcessHet=0.0005;FS=6.044;InbreedingCoeff=0.2032;MLEAC=254,7;MLEAF=0. ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:9,0,0:9:0:.:.:0,0,109,0,109,109:. 0/0:24,0,0:24:0:.:.:0,0,155,0,155,155:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. 0|1:3,3,0:6:98:1|0:856_T_G:98,0,117,107,126,233:856 0/1:6,7,0:13:36:.:.:36,0,124,53,136,189:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. 0/0:4,0,0:4:0:.:.:0,0,75,0,75,75:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. 1/1:0,2,0:2:6:.:.:89,6,0,89,6,89:. ...
1	908	1:908:T:C,*	T	C,*	2692.73	PASS	AC=66,17;AF=0.073,0.019;AN=898;BaseQRankSum=0;DP=4408;ExcessHet=-0;FS=5.659;InbreedingCoeff=0.3212;MLEAC=121,47;MLEAF=0.135,0.05 ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:14,0,0:14:42:.:.:0,42,568,42,568,568:. 0/0:35,0,0:35:66:.:.:0,66,1272,66,1272,1272:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. 0/0:10,0,0:10:30:.:.:0,30,401,30,401,401:. 0/0:8,0,0:8:24:.:.:0,24,194,24,194,194:. ./.:1,0,0:1:.:.:.:0,0,0,0,0,0:. ./.:3,0,0:3:.:.:.:0,0,0,0,0,0:. 0/0:10,0,0:10:30:.:.:0,30,313,30,313,313:. ./.:1,0,0:1:.:.:.:0,0,0,0,0,0:. 0/0:5,0,0:5:15:.:.:0,15,148,15,148,148:. ...
1	980	1:980:T:*,C	T	*,C	83.15	PASS	AC=2,1;AF=0.001174,0.0005869;AN=1704;BaseQRankSum=0.955;DP=15108;ExcessHet=3.0183;FS=0;InbreedingCoeff=0.0166;MLEAC=2,1;MLEAF=0. ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:30,0,0:30:90:.:.:0,90,1051,90,1051,1051:. 0/0:43,0,0:43:99:.:.:0,105,1575,105,1575,1575:. 0/0:15,0,0:15:45:.:.:0,45,548,45,548,548:. 0/0:30,0,0:30:87:.:.:0,87,1305,87,1305,1305:. 0/0:34,0,0:34:99:.:.:0,99,1485,99,1485,1485:. 0/0:20,0,0:20:60:.:.:0,60,557,60,557,557:. 0/0:27,0,0:27:81:.:.:0,81,920,81,920,920:. 0/0:20,0,0:20:60:.:.:0,60,660,60,660,660:. 0/0:35,0,0:35:99:.:.:0,102,1530,102,1530,1530:. 0/0:17,0,0:17:48:.:.:0,48,720,48,720,720:. ...
1	981	1:981:G:C,*	G	C,*	306.81	PASS	AC=4,2;AF=0.002347,0.001174;AN=1704;BaseQRankSum=0.437;DP=14994;ExcessHet=0.02;FS=0;InbreedingCoeff=0.0961;MLEAC=3,2;MLEAF=0.001 ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:30,0,0:30:90:.:.:0,90,1051,90,1051,1051:. 0/0:43,0,0:43:99:.:.:0,105,1575,105,1575,1575:. 0/0:15,0,0:15:45:.:.:0,45,548,45,548,548:. 0/0:31,0,0:31:90:.:.:0,90,1350,90,1350,1350:. 0/0:34,0,0:34:87:.:.:0,87,933,87,933,933:. 0/0:20,0,0:20:60:.:.:0,60,557,60,557,557:. 0/0:28,0,0:28:35:.:.:0,35,843,35,843,843:. 0/0:20,0,0:20:60:.:.:0,60,660,60,660,660:. 0/0:35,0,0:35:99:.:.:0,102,1530,102,1530,1530:. 0/0:17,0,0:17:48:.:.:0,48,720,48,720,720:. ...
1	982	1:982:T:C,*	T	C,*	3336.47	PASS	AC=36,3;AF=0.021,0.001777;AN=1688;BaseQRankSum=-1.036;DP=15244;ExcessHet=5.8055;FS=18.283;InbreedingCoeff=0.0241;MLEAC=49,3;MLEA ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:30,0,0:30:78:.:.:0,78,1170,78,1170,1170:. 0/0:43,0,0:43:99:.:.:0,105,1575,105,1575,1575:. 0/0:15,0,0:15:45:.:.:0,45,548,45,548,548:. 0/0:31,0,0:31:90:.:.:0,90,1350,90,1350,1350:. 0/0:32,0,0:32:96:.:.:0,96,904,96,904,904:. 0/0:20,0,0:20:60:.:.:0,60,557,60,557,557:. 0/0:27,0,0:27:81:.:.:0,81,893,81,893,893:. 0/0:20,0,0:20:60:.:.:0,60,660,60,660,660:. 0/0:32,0,0:32:71:.:.:0,71,872,71,872,872:. 0/0:13,0,0:13:0:.:.:0,0,305,0,305,305:. ...
1	984	1:984:T:A,*	T	A,*	305.79	PASS	AC=4,2;AF=0.00235,0.001175;AN=1702;BaseQRankSum=1.47;DP=15678;ExcessHet=0.0204;FS=0;InbreedingCoeff=0.0765;MLEAC=3,2;MLEAF=0.001 ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:30,0,0:30:69:.:.:0,69,1035,69,1035,1035:. 0/0:43,0,0:43:99:.:.:0,105,1575,105,1575,1575:. 0/0:15,0,0:15:45:.:.:0,45,548,45,548,548:. 0/0:31,0,0:31:90:.:.:0,90,1350,90,1350,1350:. 0/0:33,0,0:33:99:.:.:0,99,947,99,947,947:. 0/0:20,0,0:20:60:.:.:0,60,557,60,557,557:. 0/0:27,0,0:27:81:.:.:0,81,893,81,893,893:. 0/0:20,0,0:20:60:.:.:0,60,660,60,660,660:. 0/0:38,0,0:38:99:.:.:0,111,1106,111,1106,1106:. 0/0:17,0,0:17:48:.:.:0,48,720,48,720,720:. ...
1	1003	1:1003:A:G,*	A	G,*	898.87	PASS	AC=12,3;AF=0.007042,0.001761;AN=1704;BaseQRankSum=-0.854;DP=17267;ExcessHet=3.3284;FS=23.31;InbreedingCoeff=0.02;MLEAC=13,3;MLEA ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:27,0,0:27:72:.:.:0,72,1080,72,1080,1080:. 0/0:43,0,0:43:99:.:.:0,105,1575,105,1575,1575:. 0/0:17,0,0:17:42:.:.:0,42,630,42,630,630:. 0/0:31,0,0:31:72:.:.:0,72,1080,72,1080,1080:. 0/0:37,0,0:37:99:.:.:0,99,1076,99,1076,1076:. 0/0:22,0,0:22:66:.:.:0,66,608,66,608,608:. 0/0:32,0,0:32:90:.:.:0,90,1350,90,1350,1350:. 0/0:20,0,0:20:60:.:.:0,60,660,60,660,660:. 0/0:38,0,0:38:89:.:.:0,89,1108,89,1108,1108:. 0/0:13,0,0:13:0:.:.:0,0,358,0,358,358:. ...
1	1122	1:1122:C:G,*,A	C	G,*,A	2004.52	PASS	AC=33,14,1;AF=0.023,0.009655,0.0006897;AN=1450;BaseQRankSum=0;DP=13093;ExcessHet=0;FS=10.338;InbreedingCoeff=0.2538;MLEAC=45,20, ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:18,0,0,0:18:51:.:.:0,51,765,51,765,765,51,765,765,765:. 0/0:32,0,0,0:32:59:.:.:0,59,1104,59,1104,1104,59,1104,1104,1104:. ./.:2,0,0,0:2:.:.:.:0,0,0,0,0,0,0,0,0,0:. 0/0:12,0,0,0:12:33:.:.:0,33,495,33,495,495,33,495,495,495:. 0/0:47,0,0,0:47:99:.:.:0,114,1577,114,1577,1577,114,1577,1577,1577:. 0/0:2,0,0,0:2:3:.:.:0,3,45,3,45,45,3,45,45,45:. 0/0:9,0,0,0:9:0:.:.:0,0,147,0,147,147,0,147,147,147:. 0/0:21,3,0,0:24:18:.:.:0,18,767,63,774,819,63,774,819,819:. 0/0:29,0,0,0:29:13:.:.:0,13,738,13,738,738,13,738,738,738:. 0/0:13,0,0,0:13:39:.:.:0,39,490,39,490,490,39,490,490,490:. ...
1	1135	1:1135:C:T,*	C	T,*	709.14	PASS	AC=9,13;AF=0.005814,0.008398;AN=1548;BaseQRankSum=1.22;DP=11923;ExcessHet=3.6718;FS=13.122;InbreedingCoeff=0.0696;MLEAC=12,16;ML ...	GT:AD:DP:GQ:PGT:PID:PL:PS	0/0:16,0,0:16:45:.:.:0,45,675,45,675,675:. 0/0:30,0,0:30:81:.:.:0,81,1215,81,1215,1215:. 0/0:3,0,0:3:0:.:.:0,0,24,0,24,24:. 0/0:10,0,0:10:30:.:.:0,30,390,30,390,390:. 0/0:47,0,0:47:99:.:.:0,114,1577,114,1577,1577:. ./.:0,0,0:0:.:.:.:0,0,0,0,0,0:. 0/0:8,0,0:8:18:.:.:0,18,270,18,270,270:. 0/0:20,0,0:20:57:.:.:0,57,855,57,855,855:. 0/0:26,0,0:26:72:.:.:0,72,1080,72,1080,1080:. 0/0:12,0,0:12:33:.:.:0,33,495,33,495,495:. ...
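
As an illustration of how the info, format and genotypes columns in the sample rows fit together, the following Python sketch decodes an info string into key/value pairs and pairs a sample's colon-separated values with the keys of the format column. The function names parse_info and parse_sample are hypothetical and introduced only for this example; values are kept as strings rather than converted to numbers.

```python
def parse_info(info: str) -> dict:
    """Decode 'KEY=v1,v2;FLAG;...' into a dict.

    Keys with values map to a list of strings; bare flag keys map to True.
    """
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value.split(",") if sep else True
    return out


def parse_sample(format_keys: str, sample: str) -> dict:
    """Pair each colon-separated sample value with its format key."""
    return dict(zip(format_keys.split(":"), sample.split(":")))


# Values taken from the sample rows above (info shortened to its first keys).
info = parse_info("AC=53,3;AF=0.09,0.005119;AN=586")
sample = parse_sample(
    "GT:AD:DP:GQ:PGT:PID:PL:PS",
    "0|1:3,3,0:6:98:1|0:856_T_G:98,0,117,107,126,233:856",
)
```

Here sample["GT"] is "0|1", sample["AD"] is "3,3,0" (read depth for the reference allele and each alternate allele), and info["AC"] is ["53", "3"], one count per alternate allele.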

Rhesus SNVs (rhesusSNVs) Track Description
 

Description

This track shows single nucleotide variants (SNVs) from the Rhesus Macaque Genome Consortium that were sequenced and identified by Jeff Rogers' lab at BCM-HGSC.

Display Conventions

In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" mode, since these variants have been phased, the display shows a clustering of haplotypes in the viewed range, sorted by similarity of alleles weighted by proximity to a central variant. The clustering view can highlight local patterns of linkage.
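
The idea of "similarity of alleles weighted by proximity to a central variant" can be sketched in Python as a weighted mismatch count between two haplotypes. This is only an illustration: the function name weighted_distance and the 1 / (1 + distance) weighting are assumptions made for this example and do not describe the Genome Browser's actual clustering code.

```python
def weighted_distance(hap_a, hap_b, center):
    """Weighted mismatch count between two equal-length haplotypes.

    hap_a, hap_b: sequences of allele indexes, one per variant in the window.
    center: index of the central variant within the window.
    A mismatch at the central variant counts 1.0; mismatches further away
    count less (assumed weight: 1 / (1 + distance in variants)).
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a != b:
            total += 1.0 / (1 + abs(i - center))
    return total
```

With this weighting, two haplotypes that differ only at the central variant are further apart (1.0) than two that differ only at a neighboring variant (0.5), so haplotypes sharing alleles near the center tend to sort together.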

In the clustering display, each sample's phased diploid genotype is split into two independent haplotypes. Each haplotype is placed in a horizontal row of pixels; when the number of haplotypes exceeds the number of vertical pixels for the track, multiple haplotypes fall in the same pixel row and pixels are averaged across haplotypes.
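
The two steps described above, splitting a phased diploid genotype into haplotypes and averaging haplotypes that share a pixel row, can be sketched in Python as follows. The function names and the way haplotypes are assigned to rows (even binning by index) are assumptions for illustration, not the browser's implementation.

```python
def split_haplotypes(gt):
    """Split a phased diploid GT such as '0|1' into two allele indexes.

    Returns None for unphased ('0/1') or missing ('.|.') genotypes.
    """
    if "|" not in gt:
        return None
    a, b = gt.split("|")
    if a == "." or b == ".":
        return None
    return int(a), int(b)


def average_into_rows(alleles, n_rows):
    """Average one variant's alleles over haplotypes sharing a pixel row.

    alleles: allele index per haplotype, in display (clustered) order.
    n_rows: number of vertical pixels available for the track.
    Returns, per pixel row, the fraction of non-reference alleles
    (0.0 = all reference/white, 1.0 = all non-reference/black).
    """
    n = len(alleles)
    sums = [0.0] * n_rows
    counts = [0] * n_rows
    for i, allele in enumerate(alleles):
        row = i * n_rows // n  # assumed: even binning of haplotypes into rows
        sums[row] += 1.0 if allele != 0 else 0.0
        counts[row] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

For instance, four haplotypes with alleles [0, 1, 1, 0] drawn in two pixel rows give [0.5, 0.5], which would render as two gray pixels at that variant.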

Each variant is a vertical bar with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each variant's vertical bar to make the bar more visible when most alleles are reference alleles. The vertical bar for the central variant used in clustering is outlined in purple. In order to avoid long compute times, the range of alleles used in clustering may be limited; alleles used in clustering have purple tick marks at the top and bottom.

The clustering tree is displayed to the left of the main image. It does not represent relatedness of individuals; it simply shows the arrangement of local haplotypes by similarity. When a rightmost branch is purple, it means that all haplotypes in that branch are identical, at least within the range of variants used in clustering.

Methods

All SNV calls are relative to the reference rhesus macaque genome (Mmul_10/rheMac10). Gene models from the Ensembl release 98 merged Ensembl and RefSeq dataset that also includes annotations based on PacBio iso-seq (available here) were used to predict the functional consequences of the SNVs.

Whole-genome sequencing was performed over an eight-year period. Consequently, as technology improved, the sequencing platforms used to generate next-generation sequencing reads for this dataset progressed as follows: Illumina HiSeq 2000, HiSeq Rapid 2500, HiSeq X, and NovaSeq platforms, generating 2 × 100 bp or 2 × 150 bp paired-end reads, as is typical for each platform. All underlying sequence data have been deposited into the SRA (BioProject ID: PRJNA251548).

Reads were aligned using BWA-MEM 0.7.12-r1039 (Li and Durbin, 2009; Li, 2013) to the reference genome (Mmul_10/rheMac10), which also included the mitochondrial genome (NC_005943.1) and had the pseudoautosomal region of chromosome Y masked. To identify reads potentially originating from a single fragment of DNA and mark them in the BAM files, we used Picard MarkDuplicates version 1.105.

SNVs were called using the Genome Analysis Toolkit (GATK) version 4.1.2.0 (McKenna et al., 2010) and a VCF file was generated. The hard filters suggested by the developers of GATK (https://software.broadinstitute.org/gatk/documentation/article?id=11097) were applied to the SNVs and all failing SNVs were removed. We then used GATK VariantAnnotator to annotate SNVs with the AlleleBalance annotation. SNVs whose allelic balance for heterozygous calls (ABHet = ref/(ref+alt)) was below 0.2 or above 0.8 were removed.
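
The allelic balance filter can be illustrated with a short Python sketch. It computes ABHet = ref/(ref+alt) from reference and alternate read depths and applies the 0.2 and 0.8 cutoffs stated above. The function names, and the choice to pool read depths over all heterozygous samples at a site, are assumptions made for this example; GATK's own AlleleBalance annotation may aggregate differently.

```python
def ab_het(het_depths):
    """ABHet = ref / (ref + alt), pooled over heterozygous calls.

    het_depths: iterable of (ref_depth, alt_depth) pairs, one per
    heterozygous sample at the site (pooling is an assumption here).
    Returns None when there are no reads to evaluate.
    """
    ref_total = sum(ref for ref, _ in het_depths)
    alt_total = sum(alt for _, alt in het_depths)
    total = ref_total + alt_total
    if total == 0:
        return None
    return ref_total / total


def is_removed(ab, low=0.2, high=0.8):
    """True when a defined ABHet falls outside the [low, high] range."""
    return ab is not None and (ab < low or ab > high)


# Read depths (ref, alt) of the two heterozygous samples shown in the
# sample row at position 881: AD = 3,3,0 and AD = 6,7,0.
site_ab = ab_het([(3, 3), (6, 7)])
```

Here site_ab is 9/19 (about 0.47), which lies inside the 0.2 to 0.8 range, so this site would be kept under the assumptions above.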

The Variant Effect Predictor software from Ensembl (McLaren et al., 2010) was used to predict the functional consequence of SNVs queried against Ensembl release 98 rhesus macaque gene models based on Ensembl and RefSeq gene predictions and including PacBio iso-seq data.

Definitions of consequence types can be found in the VEP documentation.

Credits

Thanks to the Rhesus Macaque Genome Consortium and Jeff Rogers' lab at BCM-HGSC for supplying the data for this track.

References

Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at http://arxiv.org/pdf/1303.3997v2.pdf. 2013.

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. PMID: 19451168; PMC: PMC2705234

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297-303. PMID: 20644199; PMC: PMC2928508

McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010 Aug 15;26(16):2069-70. PMID: 20562413; PMC: PMC2916720