Schema for gnomAD v3.1.1 - Genome Aggregation Database (gnomAD) Genome Variants v3.1.1


Database: hg38
Primary Table: gnomadGenomesVariantsV3_1_1
Data last updated: 2024-03-19
Big Bed File Download: /gbdb/hg38/gnomAD/v3.1.1/genomes.bb
Item Count: 759,320,431
The data is stored in the binary BigBed format.

Format description: gnomAD v3.1.1 variant data and gnomAD v3.1 chrM data

field	example	description
`chrom`	chr1	Chromosome (or contig, scaffold, etc.)
`chromStart`	165970949	Start position in chromosome
`chromEnd`	165970950	End position in chromosome
`name`	a0fce8884bfcd197006ee6200ad1fa1b	md5 of data used as a key into external data file
`score`	0	Score from 0-1000
`strand`	.	+ or -, or . if no strand
`thickStart`	165970949	Start of where display should be thick (start codon)
`thickEnd`	165970950	End of where display should be thick (stop codon)
`reserved`	95,95,95	Used as itemRgb as of 2004-11-22
`ref`	G	Reference Sequence
`alt`	T	Alternate Sequence
`FILTER`	PASS	FILTER tags from VCF
`AC`	2	Allele Count
`AN`	133976	Allele Number
`AF`	1.49280e-05	Allele Frequency
`faf95`	2.48000e-06	Filtering allele frequency (using Poisson 95% CI) for samples
`nhomalt`	0	Count of homozygous individuals in samples
`rsId`	rs1571083851	dbSNP rsID
`genes`		List of genes affected by variant
`annot`	other	Annotation type: pLoF, missense, synonymous, or other
`variation_type`	intergenic_variant	Variant type(s)
`_startPos`	165970950	Unshifted chromStart position from VCF for link outs
`_displayName`	chr1-165970950-G-T	gnomAD display name
`AC_non_cancer`	Non-Cancer	Whether the variant is in the All or Non-Cancer subset
`_dataOffset`	511110607480	Uncompressed byte offset into gnomad.v3.1.1.details.tab.gz of the line with more info
`_dataLen`	888	Length of that line in gnomad.v3.1.1.details.tab.gz, excluding the trailing newline
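Because bigBedToBed writes these fields as tab-separated text, a row can be turned into named, typed values in a few lines. The sketch below assumes every output line carries all 26 columns in the order of the table above (the Sample Rows listing below omits some of them); the names FIELDS and parse_row are hypothetical helpers, not part of any UCSC tool.

```python
# Field order copied from the schema table above (assumed to match bigBedToBed output).
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "ref", "alt", "FILTER",
    "AC", "AN", "AF", "faf95", "nhomalt", "rsId", "genes", "annot",
    "variation_type", "_startPos", "_displayName", "AC_non_cancer",
    "_dataOffset", "_dataLen",
]

# Columns converted from text; all other columns stay strings.
INT_FIELDS = {"chromStart", "chromEnd", "score", "thickStart", "thickEnd",
              "AC", "AN", "nhomalt", "_startPos", "_dataOffset", "_dataLen"}
FLOAT_FIELDS = {"AF", "faf95"}


def parse_row(line: str) -> dict:
    """Turn one tab-separated bigBedToBed output line into a typed dict."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    row = {}
    for key, value in zip(FIELDS, values):
        if key in INT_FIELDS:
            row[key] = int(value)
        elif key in FLOAT_FIELDS:
            row[key] = float(value)
        else:
            row[key] = value  # empty fields such as genes stay ""
    return row


# The example column of the schema table, joined into one line.
example = "\t".join([
    "chr1", "165970949", "165970950", "a0fce8884bfcd197006ee6200ad1fa1b",
    "0", ".", "165970949", "165970950", "95,95,95", "G", "T", "PASS",
    "2", "133976", "1.49280e-05", "2.48000e-06", "0", "rs1571083851", "",
    "other", "intergenic_variant", "165970950", "chr1-165970950-G-T",
    "Non-Cancer", "511110607480", "888",
])
row = parse_row(example)
print(row["_displayName"], row["AF"], row["_dataOffset"])
```

The _dataOffset and _dataLen values parsed this way are the inputs needed to look up the extra data described under Data Access.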

Sample Rows

chrom	chromStart	chromEnd	name	strand	thickStart	thickEnd	reserved	ref	alt	FILTER	AC	AN	AF	faf95	nhomalt	rsId	annot	variation_type	_startPos	_displayName	AC_non_cancer	_dataOffset	_dataLen
chr1	165970949	165970950	a0fce8884bfcd197006ee6200ad1fa1b	.	165970949	165970950	95,95,95	G	T	PASS	2	133976	1.49280e-05	2.48000e-06	0	rs1571083851	other	intergenic_variant	165970950	chr1-165970950-G-T	Non-Cancer	511110607480	888
chr1	165970949	165970950	b7488aabddc2a249ffd7f5d57da7a274	.	165970949	165970950	95,95,95	G	C	AC0,AS_VQSR	0	133970	0.00000	0.00000	0		other	intergenic_variant	165970950	chr1-165970950-G-C	All	511110606607	872
chr1	165970950	165970951	40c6d84102d2af197a3dfd8a6acd5fe6	.	165970950	165970951	95,95,95	G	T	PASS	2	134834	1.48331e-05	2.46000e-06	0		other	intergenic_variant	165970951	chr1-165970951-G-T	Non-Cancer	511110609242	888
chr1	165970950	165970951	59d143dee19206c234ba4de0a41a95f8	.	165970950	165970951	95,95,95	G	C	AC0	0	134832	0.00000	0.00000	0		other	intergenic_variant	165970951	chr1-165970951-G-C	All	511110608369	872
chr1	165970952	165970953	d5e03a552f6f42d1c70e3fc365f3aef0	.	165970952	165970953	95,95,95	G	C	PASS	1	134078	7.45835e-06	0.00000	0		other	intergenic_variant	165970953	chr1-165970953-G-C	Non-Cancer	511110610131	884
chr1	165970953	165970954	90c808f97427945a84a2ad23344e84d4	.	165970953	165970954	95,95,95	C	G	PASS	1	134108	7.45668e-06	0.00000	0		other	intergenic_variant	165970954	chr1-165970954-C-G	Non-Cancer	511110611016	884
chr1	165970953	165970954	c97d521542b7fd0f9f859fd488e0e185	.	165970953	165970954	95,95,95	C	T	AS_VQSR	1	134100	7.45712e-06	0.00000	0	rs1486985498	other	intergenic_variant	165970954	chr1-165970954-C-T	Non-Cancer	511110611901	884
chr1	165970954	165970955	17e10d5ce7f2fb297ea586b0f3f06c02	.	165970954	165970955	95,95,95	G	A	PASS	63938	117230	0.545406	0.541863	17196	rs35625097	other	intergenic_variant	165970955	chr1-165970955-G-A	Non-Cancer	511110612786	957
chr1	165970955	165970956	30e4cf9873e84a93cd81be3c2b4c92da	.	165970955	165970956	95,95,95	C	G	PASS	1	130438	7.66648e-06	0.00000	0		other	intergenic_variant	165970956	chr1-165970956-C-G	Non-Cancer	511110613744	884
chr1	165970956	165970957	33ccf5d99ba3de5813cdb67737646c35	.	165970956	165970957	95,95,95	T	C	AC0	0	131798	0.00000	0.00000	0		other	intergenic_variant	165970957	chr1-165970957-T-C	All	511110614629	872

gnomAD v3.1.1 (gnomadGenomesVariantsV3_1_1) Track Description

Description

The gnomAD v3.1 track shows variants from 76,156 whole genomes (and no exomes), all mapped to the GRCh38/hg38 reference sequence. This release adds 4,454 genomes to those in the previous v3 release. For more detailed information on gnomAD v3.1, see the related blog post.

The gnomAD v3.1.1 track contains the same underlying data as v3.1, but with minor corrections to the VEP annotations and dbSNP rsIDs. At UCSC, this track also includes the mitochondrial chromosome (chrM) data that was released as part of gnomAD v3.1 after the UCSC version of the v3.1 track had been released. For more information about gnomAD v3.1.1, please see the related changelog.

gnomAD Genome Mutational Constraint is based on gnomAD v3.1.2 and is available only on hg38. It shows the depletion of variation caused by purifying natural selection. This is similar to the constraint against loss-of-function (LoF) variation measured for genes, but it can also be calculated for non-coding regions. Positive values are shown in red and reflect stronger mutational constraint (less variation), indicating higher natural selection pressure on the region. Negative values are shown in green and reflect weaker mutational constraint (more variation), indicating less selection pressure and less functional effect. Briefly, for every 1 kb window in the genome, a model based on trinucleotide sequence context, base-level methylation, and regional genomic features predicts the expected number of mutations, which is then compared to the observed number of mutations using a Z-score (see the preprint in the References section for details). The chrX scores were added as received from the authors. Because no de novo mutation data are available for chrX (which are needed to estimate the effects of regional genomic features on mutation rates), the chrX scores are more speculative than those on the autosomes.
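The sign convention can be illustrated with a minimal sketch. It assumes a Poisson-style Z-score, (expected - observed) / sqrt(expected), oriented so that a deficit of observed mutations gives a positive value; the exact model and formula are given in the preprint, and the window counts below are made up for illustration.

```python
import math


def constraint_z(observed: int, expected: float) -> float:
    """Signed Z-score for one window: positive when fewer mutations are
    observed than expected (stronger constraint), negative when more are.

    Assumption: the standard error is sqrt(expected), as for a Poisson count.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)


# Hypothetical 1 kb windows: (observed, expected) mutation counts.
windows = {"depleted": (60, 100.0), "neutral": (100, 100.0), "excess": (130, 100.0)}
for label, (obs, exp) in windows.items():
    print(label, round(constraint_z(obs, exp), 2))
```

With these counts the depleted window scores 4.0 (it would be drawn in red) and the window with excess variation scores -3.0 (drawn in green).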

The gnomAD Predicted Constraint Metrics track contains per-gene metrics of pathogenicity as predicted for gnomAD v2.1.1, and identifies genes subject to strong selection against various classes of mutation. The data are provided at both the gene and the transcript level.

The gnomAD v2 tracks show variants from 125,748 exomes and 15,708 whole genomes, all mapped to the GRCh37/hg19 reference sequence and lifted to the GRCh38/hg38 assembly. The data originate from 141,456 unrelated individuals sequenced as part of various population-genetic and disease-specific studies collected by the Genome Aggregation Database (gnomAD), release 2.1.1. Raw data from all studies have been reprocessed through a unified pipeline and jointly variant-called to increase consistency across projects. For more information on the processing pipeline and population annotations, see the following blog post and the 2.1.1 README.

gnomAD v2 data are based on the GRCh37/hg19 assembly. These tracks display the GRCh38/hg38 lift-over provided by gnomAD on their downloads site.

For questions on the gnomAD data, also see the gnomAD FAQ.

More details on the Variant type(s) can be found on the Sequence Ontology page.

Display Conventions and Configuration

gnomAD v3.1.1

The gnomAD v3.1.1 track version follows the same conventions and configuration as the v3.1 track, except as noted below.

There is a Non-cancer filter used to exclude/include variants from samples of individuals who were not ascertained for having cancer in a cancer study.
There are additional FILTER field filters: AS_VQSR, indel_stack (chrM only), and npg (chrM only).
Where possible, variants overlapping multiple transcripts/genes have been collapsed into one item, with the additional information available on the details page; this roughly halved the number of items in the bigBed.
The bigBed has been split into two files, one with the information necessary for the track display, and one with the information necessary for the details page. For more information on this data format, please see the Data Access section below.
The VEP annotation is shown as a table instead of spread across multiple fields.
Intergenic variants have not been pre-filtered.

gnomAD v3.1

By default, at most 50,000 variants (counted before the filters described below are applied) can be displayed at a time; beyond that, the track switches to dense display mode.

Hovering the mouse over an item displays many details about the variant, including the affected gene(s), the variant type, and the annotation (missense, synonymous, etc.).

Clicking on an item will display additional details on the variant, including a population frequency table showing allele count in each sub-population.

Following the conventions on the gnomAD browser, items are shaded according to their Annotation type:

pLoF
Missense
Synonymous
Other

Label Options

To maintain consistency with the gnomAD website, variants are labeled by default with their chromosomal start position followed by the reference and alternate alleles, for example "chr1-1234-T-CAG". dbSNP rsIDs are also available as an additional label if the variant is present in dbSNP.
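A sketch of how the default label relates to the bigBed fields, assuming ref and alt hold the VCF alleles: the label uses the 1-based VCF position stored in _startPos, which for the single-nucleotide sample rows equals chromStart + 1. The function names display_name and labels are hypothetical.

```python
def display_name(row: dict) -> str:
    """gnomAD-style label chrom-position-ref-alt, using the 1-based VCF position."""
    return f"{row['chrom']}-{row['_startPos']}-{row['ref']}-{row['alt']}"


def labels(row: dict) -> list:
    """Default label, followed by the dbSNP rsID when the variant has one."""
    out = [display_name(row)]
    if row.get("rsId"):
        out.append(row["rsId"])
    return out


# First two sample rows from the table above (only the fields used here).
first = {"chrom": "chr1", "chromStart": 165970949, "_startPos": 165970950,
         "ref": "G", "alt": "T", "rsId": "rs1571083851"}
second = {"chrom": "chr1", "chromStart": 165970949, "_startPos": 165970950,
          "ref": "G", "alt": "C", "rsId": ""}
print(labels(first))   # ['chr1-165970950-G-T', 'rs1571083851']
print(labels(second))  # ['chr1-165970950-G-C']
```

Reading _startPos rather than adding one to chromStart avoids relying on chromStart, which the schema describes as possibly shifted relative to the VCF position.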

Filtering Options

Three filters are available for these tracks:

FILTER: Used to exclude/include variants that failed Random Forest (RF), Inbreeding Coefficient (InbreedingCoeff), or Allele Count (AC0) filters. The PASS option is used to include/exclude variants that pass all of the RF, InbreedingCoeff, and AC0 filters, as denoted in the original VCF.
Annotation type: Used to exclude/include variants that are annotated as predicted Loss of Function (pLoF), Missense, Synonymous, or Other, as annotated by VEP version 85 (GENCODE v19).
Variant Type: Used to exclude/include variants according to the type of variation, as annotated by VEP v85.

There is one additional configurable filter on the minimum minor allele frequency.
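The three list filters and the frequency cutoff can be mimicked on rows already extracted from the bigBed. This is a hypothetical client-side sketch, not the browser's implementation: it assumes FILTER and variation_type hold comma-separated values, keeps a variant when any of its values is selected, and approximates the minor allele frequency of each row's alternate allele as min(AF, 1 - AF).

```python
def minor_allele_frequency(af: float) -> float:
    """Per-allele approximation: the rarer of the alt allele (AF) and the rest (1 - AF)."""
    return min(af, 1.0 - af)


def keep_variant(row: dict, filters: set, annots: set, variant_types: set,
                 min_maf: float = 0.0) -> bool:
    """True if the row passes the FILTER, annotation, variant-type and MAF checks."""
    # FILTER may list several tags, e.g. "AC0,AS_VQSR".
    if not set(row["FILTER"].split(",")) & filters:
        return False
    if row["annot"] not in annots:
        return False
    # Assumption: multiple variant types are comma-separated.
    if not set(row["variation_type"].split(",")) & variant_types:
        return False
    return minor_allele_frequency(row["AF"]) >= min_maf


# Three of the sample rows above, reduced to the fields used here.
rows = [
    {"FILTER": "PASS", "annot": "other", "variation_type": "intergenic_variant", "AF": 0.545406},
    {"FILTER": "AC0,AS_VQSR", "annot": "other", "variation_type": "intergenic_variant", "AF": 0.0},
    {"FILTER": "PASS", "annot": "other", "variation_type": "intergenic_variant", "AF": 7.45835e-06},
]
kept = [r for r in rows
        if keep_variant(r, {"PASS"}, {"pLoF", "missense", "synonymous", "other"},
                        {"intergenic_variant"}, min_maf=0.001)]
print(len(kept))  # 1
```

Only the common PASS variant survives: the second row fails the FILTER check and the third falls below the 0.001 MAF cutoff.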

gnomAD v2.1.1

The gnomAD v2.1.1 track follows the standard display and configuration options available for VCF tracks, briefly explained below.

In dense mode, a vertical line is drawn at the position of each variant.
In pack mode, "ref" and "alt" alleles are displayed to the left of a vertical line with colored portions corresponding to allele counts. Hovering the mouse pointer over a variant pops up a display of alleles and counts.

Filtering Options

Four filters are available for these tracks, the same as the underlying VCF:

AC0: Allele Count is 0 after filtering out low-confidence genotypes (GQ < 20; DP < 10; and AB < 0.2 for het calls)
InbreedingCoeff: Inbreeding Coefficient < -0.3
RF: Used to exclude/include variants that failed the Random Forest filtering thresholds of 0.055272738028512555 for SNPs and 0.20641025579497013 for indels (probabilities of being a true positive variant)
Pass: Variant passes all 3 filters
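The four filter definitions above can be restated as code. In this sketch the genotype tuple layout and the function names are hypothetical, and a variant is assumed to fail RF when its true-positive probability falls below the threshold for its class; gnomAD's own pipeline is the authority on the exact rules.

```python
# RF thresholds quoted above (probability of being a true positive variant).
RF_THRESHOLDS = {"snp": 0.055272738028512555, "indel": 0.20641025579497013}
INBREEDING_COEFF_CUTOFF = -0.3


def allele_count_after_gt_filtering(genotypes) -> int:
    """Sum alt alleles over genotypes that survive the AC0 genotype filters.

    Each genotype is a hypothetical (alt_count, GQ, DP, AB) tuple, where
    alt_count is 0, 1 (het) or 2 (hom-alt); AB is only checked for hets.
    """
    ac = 0
    for alt_count, gq, dp, ab in genotypes:
        if gq < 20 or dp < 10:
            continue  # low-confidence genotype
        if alt_count == 1 and ab < 0.2:
            continue  # het call with low allele balance
        ac += alt_count
    return ac


def v2_filter_tags(rf_probability, variant_class, inbreeding_coeff, genotypes) -> list:
    """Return the names of the failed filters, or ["PASS"] if none failed."""
    tags = []
    if allele_count_after_gt_filtering(genotypes) == 0:
        tags.append("AC0")
    if inbreeding_coeff < INBREEDING_COEFF_CUTOFF:
        tags.append("InbreedingCoeff")
    if rf_probability < RF_THRESHOLDS[variant_class]:
        tags.append("RF")
    return tags or ["PASS"]


good = [(1, 45, 30, 0.48), (2, 60, 25, 1.0)]
weak = [(1, 15, 30, 0.5), (1, 40, 30, 0.1)]
print(v2_filter_tags(0.9, "snp", 0.01, good))    # ['PASS']
print(v2_filter_tags(0.1, "indel", -0.4, weak))  # ['AC0', 'InbreedingCoeff', 'RF']
```

Note that the same RF probability can pass for a SNP and fail for an indel, because the indel threshold is higher.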

There are two additional filters available, one for the minimum minor allele frequency, and a configurable filter on the QUAL score.

UCSC Methods

The gnomAD v3.1.1 data has not been filtered by variant type at UCSC.

For the v3.1 update only, in order to cut down on the amount of displayed data, the following variant types have been filtered out, but are still viewable in the gnomAD browser:

Regulatory Region Variants
Downstream/Upstream Gene Variants
Transcription Factor Binding Site Variants
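A sketch of such a variant-type screen, assuming VEP's Sequence Ontology term names for the three categories, and assuming a variant is dropped only when every one of its (comma-separated) types falls in those categories, so a variant that is also, say, a missense_variant is kept. The hg38 makedoc, not this sketch, records what UCSC actually did.

```python
# VEP / Sequence Ontology consequence terms for the three excluded categories.
V3_1_EXCLUDED_TYPES = {
    "regulatory_region_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
    "TF_binding_site_variant",
}


def excluded_from_v3_1(variation_type: str) -> bool:
    """True if all of a variant's types are in the excluded set (assumed rule)."""
    types = {t.strip() for t in variation_type.split(",") if t.strip()}
    return bool(types) and types <= V3_1_EXCLUDED_TYPES


for vt in ["upstream_gene_variant",
           "upstream_gene_variant,missense_variant",
           "intergenic_variant"]:
    print(vt, excluded_from_v3_1(vt))
```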

For the full steps used to create the track at UCSC, please see the section denoted "gnomAD v3.1 update" in the hg38 makedoc.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated analysis, the data may be queried from our REST API, and the genome annotations are stored in files that can be downloaded from our download server, subject to the conditions set forth by the gnomAD consortium (see below). Variant VCFs can be found in the vcf/ subdirectory. The v3.1 and v3.1.1 variants can be found in a special directory, as they have been transformed from the underlying VCF.

For the v3.1.1 variants in particular, the underlying bigBed contains only the information needed to display the track in the browser. Extra data such as VEP annotations and CADD scores are available in the same directory as the bigBed, in the files gnomad.v3.1.1.details.tab.gz and gnomad.v3.1.1.details.tab.gz.gzi. The gnomad.v3.1.1.details.tab.gz file contains the bgzip-compressed extra data, one line per variant holding its md5sum followed by JSON, and the .gzi index file speeds up searching this data. Each variant's md5sum is stored in the name field of the bigBed and can be used along with the _dataOffset and _dataLen fields to retrieve the associated external data, as shown below:

# find item of interest:
bigBedToBed genomes.bb stdout | head -4 | tail -1
chr1    12416    12417    854246d79dc5d02dcdbd5f5438542b6e    [..omitted for brevity..]    chr1-12417-G-A    67293    902

# use the final two fields, _dataOffset and _dataLen (add one to _dataLen to include a newline), to get the extra data:
bgzip -b 67293 -s 903 gnomad.v3.1.1.details.tab.gz
854246d79dc5d02dcdbd5f5438542b6e    {"DDX11L1": {"cons": ["non_coding_transcript_variant",  [..omitted for brevity..]
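The same lookup can be sketched in Python without bgzip. This assumes the htslib .gzi layout (a little-endian uint64 entry count, then that many uint64 pairs of compressed and uncompressed block offsets, with the first block at 0/0 left implicit) and that each details line is the md5sum, whitespace, then JSON. The function names are hypothetical; for the example above the call would be fetch_details("gnomad.v3.1.1.details.tab.gz", "gnomad.v3.1.1.details.tab.gz.gzi", 67293, 902).

```python
import bisect
import json
import struct
import zlib


def load_gzi(gzi_path: str) -> list:
    """Read a .gzi index into (compressed_offset, uncompressed_offset) pairs."""
    with open(gzi_path, "rb") as fh:
        (count,) = struct.unpack("<Q", fh.read(8))
        entries = [(0, 0)]  # the first block is implicit in the index
        for _ in range(count):
            entries.append(struct.unpack("<QQ", fh.read(16)))
    return entries


def read_uncompressed(tab_path: str, gzi_path: str, offset: int, length: int) -> bytes:
    """Return `length` bytes starting at uncompressed `offset` (like bgzip -b/-s)."""
    entries = load_gzi(gzi_path)
    # Last block whose uncompressed start is <= offset.
    idx = bisect.bisect_right([u for _, u in entries], offset) - 1
    block_start, block_uoffset = entries[idx]
    skip = offset - block_uoffset
    out = bytearray()
    with open(tab_path, "rb") as fh:
        fh.seek(block_start)
        decomp = zlib.decompressobj(31)  # 31: expect a gzip header
        pending = b""
        while len(out) < skip + length:
            if decomp.eof:  # one block (gzip member) ended; start the next
                pending = decomp.unused_data
                decomp = zlib.decompressobj(31)
            if not pending:
                pending = fh.read(65536)
                if not pending:
                    break  # end of file
            out += decomp.decompress(pending)
            pending = b""
    return bytes(out[skip:skip + length])


def fetch_details(tab_path, gzi_path, data_offset, data_len):
    """Return (md5, parsed JSON) for a bigBed row's _dataOffset and _dataLen."""
    line = read_uncompressed(tab_path, gzi_path, data_offset, data_len).decode()
    md5, payload = line.rstrip("\n").split(None, 1)
    return md5, json.loads(payload)
```

For many lookups it would be better to load the index once and reuse it, since the real .gzi for a file of this size is large.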

The data can also be found directly from the gnomAD downloads page. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

The mutational constraint scores were updated in October 2022 from a previous, now deprecated, pre-publication version. The old version can be found in our archive directory on the download server. It can be loaded by copying its URL into our "Custom tracks" input box.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL) as described here.

References

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alfoldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. bioRxiv. doi: https://doi.org/10.1101/531210.

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 17;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Chen S, Francioli L, Goodrich J, Collins R, Wang Q, Alfoldi J, Watts N, Vittal C, Gauthier L, Poterba T, Wilson M. A genome-wide mutational constraint map quantified from variation in 76,156 human genomes. bioRxiv. 2022.