The Genome Aggregation Database (gnomAD) - Genome and Exome Variants (gnomAD Variants)
tracks show single nucleotide variants (SNVs) and small insertion/deletion variants of <50
nucleotides (indels) from 123,136 exomes and 15,496 whole genomes of unrelated individuals, short
variant release 2.0.2. For more information on the processing pipeline and population annotations,
see the following
and the 2.0.2 README.
There are two subtracks in this track set:
gnomAD Exome Variants: short variants of 123,136 exomes, release 2.0.2.
gnomAD Genome Variants: short variants of 15,496 genomes, release 2.0.2.
Display Conventions and Configuration
In mode, a vertical line is drawn at the position of each variant.
In mode, "ref" and "alt" alleles are
displayed to the left of a vertical line with colored portions corresponding to allele counts.
Hovering the mouse pointer over a variant pops up a display of alleles and counts.
The details for variants include extensive tag=value annotations extracted from the
INFO column of gnomAD VCF files, listed under the label "INFO column annotations".
The tags beginning with GC list genotype counts in an order that is
fairly intuitive when a variant has two alleles:
homozygous reference, heterozygous, homozygous alternate.
However, when there are multiple alternate alleles, the number of combinations increases
and the order of genotypes listed is a bit more complicated. If the alternate alleles
are named alt1, alt2, alt3, ... altN, then the order of genotype counts follows this pattern:
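A minimal sketch that enumerates this pattern, assuming the GC tags follow the diploid genotype ordering of the VCF specification (genotype j/k with j <= k sits at index k(k+1)/2 + j). The function names `genotype_order` and `label_counts` are hypothetical and used for illustration only:

```python
def genotype_order(n_alt):
    """List diploid genotypes for n_alt alternate alleles in VCF-specification order."""
    names = ["ref"] + [f"alt{i}" for i in range(1, n_alt + 1)]
    order = []
    for k in range(len(names)):       # index of the second allele
        for j in range(k + 1):        # index of the first allele, j <= k
            order.append(f"{names[j]}/{names[k]}")
    return order


def label_counts(gc_values, n_alt):
    """Pair a list of genotype counts with genotype labels (assumes the ordering above)."""
    order = genotype_order(n_alt)
    if len(gc_values) != len(order):
        raise ValueError(f"expected {len(order)} counts, got {len(gc_values)}")
    return dict(zip(order, gc_values))


print(genotype_order(2))
# ['ref/ref', 'ref/alt1', 'alt1/alt1', 'ref/alt2', 'alt1/alt2', 'alt2/alt2']
```

With a single alternate allele this reduces to the order described above: homozygous reference, heterozygous, homozygous alternate.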
For the variant Quality Control (QC) process, a combination of a random forest classifier and hard
filters, described below, was used. The filters above can be used to exclude variant sets that failed
Random Forest (RF), Inbreeding Coefficient (InbreedingCoeff) and/or Allele Count
(AC0) filters, or passed all (PASS) filters. Because variant QC was performed on exomes and genomes
separately, though with the same pipeline, some variants have two filter statuses, which may be
discordant. There are 144,941 variants that did not pass the quality filters in the exomes data
set but passed the filters in the genomes, and 290,254 variants for the reverse case. Users should
treat these variants with caution.
A trained, allele-specific random forest classifier was used to build a high-quality set of variants.
To set a threshold for the PASS / RF filter in the release, a cutoff on the random forest model
output was chosen based on the following metrics: precision / recall against two well-characterized
samples, the number of singleton Mendelian violations in the trios, the singleton
transition : transversion (Ti/Tv) ratio for SNVs, and the singleton insertion : deletion ratio for indels.
For exomes, an RF probability of >= 0.1 for SNVs and >= 0.2 for indels was used for the
filtration process. For genomes, an RF probability of >= 0.4 for both SNVs and indels was used.
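The cutoffs above can be written as a small lookup. This is only a sketch: it assumes the RF probability is the model's probability that a variant is real, so that values at or above the cutoff pass the RF filter. The text does not state the direction explicitly. The names `RF_THRESHOLDS` and `passes_rf` are hypothetical:

```python
# Cutoffs taken from the text; the keys are (data set, variant type).
RF_THRESHOLDS = {
    ("exome", "snv"): 0.1,
    ("exome", "indel"): 0.2,
    ("genome", "snv"): 0.4,
    ("genome", "indel"): 0.4,
}


def passes_rf(dataset, variant_type, rf_probability):
    """Return True if the RF probability meets the cutoff.

    Assumption: a probability at or above the cutoff means the variant passes.
    """
    return rf_probability >= RF_THRESHOLDS[(dataset, variant_type)]


print(passes_rf("exome", "indel", 0.15))   # False
print(passes_rf("genome", "snv", 0.4))     # True
```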
In addition to the random forest filter, all variants falling in low complexity (LCR filter) and
segmental duplication (SEGDUP filter) regions, variants with a low inbreeding coefficient (excess
heterozygotes defined by an inbreeding coefficient < -0.3), and all variants where no individual
had a high-quality, non-reference genotype (Genotype Quality (GQ) >= 20, depth (DP) >= 10,
and minor allele balance (AB) > 0.2 for heterozygotes) were also filtered.
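The genotype-level and inbreeding criteria above can be expressed as simple predicates. The following is a hedged sketch, not the gnomAD pipeline itself; the function names and the dictionary layout for genotypes are assumptions made for illustration:

```python
def is_high_quality_genotype(gq, dp, is_het=False, ab=None):
    """Apply the thresholds from the text: GQ >= 20, DP >= 10, and AB > 0.2 for heterozygotes."""
    if gq < 20 or dp < 10:
        return False
    if is_het:
        # A heterozygous call also needs a known allele balance above 0.2.
        return ab is not None and ab > 0.2
    return True


def has_no_high_quality_carrier(non_ref_genotypes):
    """True if no non-reference genotype is high quality, so the variant would be filtered."""
    return not any(is_high_quality_genotype(**g) for g in non_ref_genotypes)


def has_excess_heterozygotes(inbreeding_coeff):
    """True if the inbreeding coefficient is below the -0.3 cutoff from the text."""
    return inbreeding_coeff < -0.3


calls = [
    {"gq": 35, "dp": 8, "is_het": True, "ab": 0.45},   # depth too low
    {"gq": 50, "dp": 30, "is_het": True, "ab": 0.15},  # allele balance too low
]
print(has_no_high_quality_carrier(calls))  # True
```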
The raw data can be explored interactively with the
Table Browser or the Data Integrator. For
automated analysis, the data may be queried from our REST API or downloaded as files from our download server, subject
to the conditions set forth by the gnomAD consortium (see below). Coverage values
for the genome are in bigWig files in
the coverage/ subdirectory. Variant VCFs can be found in the vcf/ subdirectory.
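Once a VCF file has been downloaded, the tag=value annotations in its INFO column can be split into a dictionary with the standard library alone. The sketch below handles a single uncompressed data line; the helper names are hypothetical and the example record is made up for illustration, not taken from gnomAD:

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; tags without a value become True."""
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        tag, sep, value = entry.partition("=")
        info[tag] = value.split(",") if sep else True
    return info


def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of one tab-separated data line."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alts": alt.split(","),
        "qual": qual,
        "filters": flt.split(";"),
        "info": parse_info(info),
    }


# Illustrative record only (values are invented).
example = "1\t12345\t.\tA\tG,T\t100\tPASS\tAC=3,1;AN=200;GC=96,3,0,1,0,0;EXAMPLE_FLAG"
record = parse_vcf_line(example)
print(record["alts"], record["info"]["GC"])
```

Values are kept as lists of strings so that per-allele and per-genotype tags stay aligned with the allele order; convert them to numbers as needed.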