gnomAD Variants Track Settings
 
Genome Aggregation Database (gnomAD) - Genome and Exome Variants

Track collection: Genome Aggregation Database (gnomAD) - Variants, Coverage, and Constraint


Filters

Exclude variants with Quality/confidence score (QUAL) less than
Exclude variants with these FILTER values:
 
RF (Failed random forests filters (SNV cutoff 0.4, indels cutoff 0.4))
PASS (All filters passed for at least one of the alleles at that site (see AS_FilterStatus for allele-specific filter status))
InbreedingCoeff (InbreedingCoeff < -0.3)
AC0 (Allele Count is zero (i.e. no high-confidence genotype (GQ >= 20, DP >= 10, AB >= 0.2 for het calls)))
Minimum minor allele frequency (if INFO column includes AF or AC+AN):
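The minimum minor allele frequency setting needs a per-site frequency, taken either from the AF tag or computed as AC/AN. Below is a minimal Python sketch of that computation. It assumes a biallelic site and INFO values already parsed into a dictionary. The function names are hypothetical, and this is not the browser's own implementation.

```python
from typing import Mapping, Optional


def minor_allele_frequency(info: Mapping[str, str]) -> Optional[float]:
    """Return the minor allele frequency of a biallelic site, or None if unknown.

    `info` maps INFO tags to raw string values, e.g. {"AC": "12", "AN": "2000"}.
    AF is used when present; otherwise the frequency is computed as AC / AN.
    """
    if "AF" in info:
        alt_freq = float(info["AF"])
    elif "AC" in info and "AN" in info:
        an = int(info["AN"])
        if an == 0:
            return None  # no called alleles, so the frequency is undefined
        alt_freq = int(info["AC"]) / an
    else:
        return None  # neither AF nor AC+AN is available
    # The minor allele is whichever of ref and alt is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(info: Mapping[str, str], min_maf: float) -> bool:
    """Keep a site if its MAF is at least `min_maf`.

    Assumption: sites without AF or AC+AN are kept, reading the setting's
    "(if INFO column includes AF or AC+AN)" as limiting where the filter applies.
    """
    maf = minor_allele_frequency(info)
    return maf is None or maf >= min_maf
```

Taking the minimum of the alternate and reference frequencies means a site with AF = 0.9 is treated as having a minor allele frequency of 0.1.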

Data version: Release 2.0.2

Description

The Genome Aggregation Database (gnomAD) - Genome and Exome Variants (gnomAD Variants) tracks show single nucleotide variants (SNVs) and small insertion/deletion variants shorter than 50 nucleotides (indels) from 123,136 exomes and 15,496 whole genomes of unrelated individuals (short variant release 2.0.2). For more information on the processing pipeline and population annotations, see the gnomAD blog post and the 2.0.2 README.

There are two subtracks in this track set:

  1. gnomAD Exome Variants: short variants of 123,136 exomes, release 2.0.2.
  2. gnomAD Genome Variants: short variants of 15,496 genomes, release 2.0.2.

Display Conventions and Configuration

In dense mode, a vertical line is drawn at the position of each variant.

In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts.

The details for each variant include extensive tag=value annotations extracted from the INFO column of the gnomAD VCF files, listed under the label "INFO column annotations". Tags beginning with GC list genotype counts. For a variant with two alleles, the order is intuitive: homozygous reference, heterozygous, homozygous alternate. When there are multiple alternate alleles, however, the number of possible genotypes grows and the order is less obvious. If the alternate alleles are named alt1, alt2, alt3, ... altN, the genotype counts follow this pattern:

ref/ref,
ref/alt1, alt1/alt1,
ref/alt2, alt1/alt2, alt2/alt2,
ref/alt3, alt1/alt3, alt2/alt3, alt3/alt3,
...
ref/altN, alt1/altN, alt2/altN, alt3/altN, ..., altN/altN
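The pattern above is the standard VCF ordering for diploid genotype fields: with 0 standing for ref and i for alt i, genotype j/k (j <= k) sits at zero-based position k(k+1)/2 + j. Below is a minimal Python sketch of that ordering. It uses hypothetical helper names and can be used to attach genotype labels to the comma-separated values of a GC tag.

```python
from typing import Dict, List


def gc_genotype_order(n_alt: int) -> List[str]:
    """Genotype labels in GC order for a site with `n_alt` alternate alleles."""
    names = ["ref"] + [f"alt{i}" for i in range(1, n_alt + 1)]
    # For each allele k, list every allele j <= k as j/k.
    return [f"{names[j]}/{names[k]}" for k in range(n_alt + 1) for j in range(k + 1)]


def gc_index(a: int, b: int) -> int:
    """Zero-based position of genotype a/b (0 = ref, i = alt i) in the GC list."""
    j, k = min(a, b), max(a, b)
    return k * (k + 1) // 2 + j


def label_gc_counts(gc_value: str, n_alt: int) -> Dict[str, int]:
    """Pair a raw GC value such as "100,20,3" with its genotype labels."""
    counts = [int(x) for x in gc_value.split(",")]
    labels = gc_genotype_order(n_alt)
    if len(counts) != len(labels):
        raise ValueError(f"expected {len(labels)} counts, got {len(counts)}")
    return dict(zip(labels, counts))
```

For example, with made-up counts, `label_gc_counts("100,20,3", 1)` returns `{"ref/ref": 100, "ref/alt1": 20, "alt1/alt1": 3}`. A site with N alternate alleles has (N+1)(N+2)/2 genotype counts.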

Variant filters

Variant quality control (QC) used a combination of a random forest classifier and hard filters, described below. The filters above can be used to exclude variants that failed the Random Forest (RF), Inbreeding Coefficient (InbreedingCoeff) and/or Allele Count (AC0) filters, or that passed all filters (PASS). Because variant QC was performed on the exomes and genomes separately (using the same pipeline), variants present in both data sets have two filter statuses, which are sometimes discordant: 144,941 variants failed the quality filters in the exome data set but passed them in the genomes, and 290,254 variants passed in the exomes but failed in the genomes. Such variants should be treated with caution.
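Below is a minimal Python sketch of the two checks described above: hiding a variant based on its FILTER values, and flagging variants whose exome and genome statuses disagree. It assumes a variant is hidden when any one of its FILTER values is selected for exclusion. This assumption is not taken from the browser's code, and the function names are hypothetical.

```python
from typing import Iterable, Set


def is_excluded(filter_values: Iterable[str], excluded: Set[str]) -> bool:
    """True if any of the site's FILTER values was selected for exclusion.

    Assumption: a site is hidden when at least one of its FILTER values
    (e.g. "RF", "AC0") is in the user's excluded set.
    """
    return any(value in excluded for value in filter_values)


def is_discordant(exome_filters: Set[str], genome_filters: Set[str]) -> bool:
    """True when a variant PASSes in one data set but not in the other."""
    return ("PASS" in exome_filters) != ("PASS" in genome_filters)
```

A variant that fails different filters in each data set (for example RF in the exomes and AC0 in the genomes) is not counted as discordant here, since it fails in both.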

A trained, allele-specific random forest classifier was used to build a high-quality set of variants. To set the PASS/RF threshold for the release, the cutoff on the random forest model output was chosen using several metrics: precision and recall against two well-characterized samples, the number of singleton Mendelian violations in trios, the singleton transition:transversion ratio for SNVs, and the singleton insertion:deletion ratio for indels.

For exomes, an RF probability cutoff of >= 0.1 for SNVs and >= 0.2 for indels was used; for genomes, a cutoff of >= 0.4 was used for both SNVs and indels.
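Below is a minimal Python sketch of these per-data-set cutoffs as a lookup table. It reads the ">=" in the text literally and assumes that an allele passes the RF filter when its probability is at or above the cutoff. It also uses a simplified SNV/indel test. The names are hypothetical.

```python
# RF probability cutoffs as stated in the text above.
RF_CUTOFFS = {
    ("exome", "snv"): 0.1,
    ("exome", "indel"): 0.2,
    ("genome", "snv"): 0.4,
    ("genome", "indel"): 0.4,
}


def variant_type(ref: str, alt: str) -> str:
    """Classify an allele as "snv" or "indel".

    Simplification: everything that is not a single-base substitution
    is treated as an indel here.
    """
    return "snv" if len(ref) == 1 and len(alt) == 1 else "indel"


def passes_rf(dataset: str, ref: str, alt: str, rf_probability: float) -> bool:
    """Assumption: an allele passes when its RF probability is >= the cutoff."""
    return rf_probability >= RF_CUTOFFS[(dataset, variant_type(ref, alt))]
```

With these cutoffs, the same SNV with a probability of 0.15 would pass in the exomes but fail in the genomes. This is one way the discordant filter statuses described earlier can arise.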

In addition to the random forest filter, the following variants were also filtered out:

  1. variants falling in low-complexity regions (LCR filter) or segmental duplication regions (SEGDUP filter);
  2. variants with a low inbreeding coefficient (< -0.3), which indicates an excess of heterozygotes (InbreedingCoeff filter);
  3. variants for which no individual had a high-quality, non-reference genotype, defined as Genotype Quality (GQ) >= 20, depth (DP) >= 10 and, for heterozygotes, minor allele balance (AB) > 0.2 (AC0 filter).
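Below is a minimal Python sketch of these hard filters, applied to one site. It assumes that per-sample GQ, DP, allele balance and zygosity are already available, and that region membership (LCR, SEGDUP) is given as booleans. The `Genotype` record and the function names are hypothetical. The allele-balance test follows the paragraph above (AB > 0.2).

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Genotype:
    gq: int          # Genotype Quality
    dp: int          # read depth
    ab: float        # minor allele balance (only checked for heterozygotes)
    is_het: bool     # heterozygous call
    is_nonref: bool  # carries at least one alternate allele


def is_high_quality_nonref(g: Genotype) -> bool:
    """High-quality non-reference genotype: GQ >= 20, DP >= 10, het AB > 0.2."""
    if not g.is_nonref:
        return False
    if g.gq < 20 or g.dp < 10:
        return False
    if g.is_het and g.ab <= 0.2:
        return False
    return True


def hard_filters(genotypes: Iterable[Genotype], inbreeding_coeff: float,
                 in_lcr: bool, in_segdup: bool) -> Set[str]:
    """Return the names of the hard filters a site fails (empty set if none)."""
    failed: Set[str] = set()
    if in_lcr:
        failed.add("LCR")
    if in_segdup:
        failed.add("SEGDUP")
    if inbreeding_coeff < -0.3:  # excess heterozygotes
        failed.add("InbreedingCoeff")
    if not any(is_high_quality_nonref(g) for g in genotypes):
        failed.add("AC0")
    return failed
```

A site can fail several hard filters at once, which is why the sketch returns a set of filter names rather than a single status.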

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API or downloaded as files from our download server, subject to the conditions set forth by the gnomAD consortium (see below). Coverage values for the genome are in bigWig files in the coverage/ subdirectory. Variant VCFs can be found in the vcf/ subdirectory.
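Below is a minimal Python sketch for building UCSC REST API requests with only the standard library. The endpoint paths (`/list/tracks`, `/getData/track`) and the `genome`, `track`, `chrom`, `start` and `end` parameters follow the UCSC REST API documentation. Two values are assumptions: the assembly `hg19`, based on gnomAD 2.x being aligned to GRCh37, and `myTrack`, a placeholder, since this page does not give the subtrack identifiers. Look up the real names with `list_tracks_url` first. The API documentation also states which track types `getData` supports.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def list_tracks_url(genome: str) -> str:
    """URL listing the tracks available for an assembly."""
    return f"{API_BASE}/list/tracks?{urlencode({'genome': genome})}"


def track_data_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """URL for the items of `track` in chrom:start-end (0-based start, 1-based end)."""
    params = {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download a URL and parse the JSON response (requires network access)."""
    with urlopen(url, timeout=60) as response:
        return json.load(response)
```

For example, `fetch_json(track_data_url("hg19", "myTrack", "chr1", 100000, 101000))` would return the parsed JSON for that region once `myTrack` is replaced with a real track name. Large downloads are better served by the files on the download server.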

The data can also be obtained directly from the gnomAD downloads page. For questions, please refer to our mailing list archives; see our Data Access FAQ for more information.

More information about using and understanding the gnomAD data can be found on the gnomAD FAQ site.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL).

References

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 18;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-443. PMID: 32461654; PMC: PMC7334197

Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H et al. A structural variation reference for medical and population genetics. Nature. 2020 May;581(7809):444-451. PMID: 32461652; PMC: PMC7334194

Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom FK, O'Donnell-Luria AH et al. Transcript expression-aware annotation improves rare variant interpretation. Nature. 2020 May;581(7809):452-458. PMID: 32461655; PMC: PMC7334198