Schema for HapMap SNPs - HapMap SNPs (rel27, merged Phase II + Phase III genotypes)
  Database: hg19    Primary Table: hapmapSnpsCEU    Row Count: 4,029,798   Data last updated: 2011-01-23
Format description: HapMap genotype summary
On download server: MariaDB table dump directory
field         example      SQL type                      description
bin           585          int(10) unsigned              Indexing field to speed chromosome range queries.
chrom         chr1         varchar(255)                  Chromosome
chromStart    55298        int(10) unsigned              Start position in chrom (0 based)
chromEnd      55299        int(10) unsigned              End position in chrom (1 based)
name          rs10399749   varchar(255)                  Reference SNP identifier from dbSnp
score         0            int(10) unsigned              Minor allele frequency normalized (0-500)
strand        +            enum('+', '-', '?')           Which genomic strand contains the observed alleles
observed      C/T          varchar(255)                  Observed string from genotype file
allele1       C            enum('A', 'C', 'G', 'T')      This allele has been observed
homoCount1    84           int(10) unsigned              Count of individuals who are homozygous for allele1
allele2       T            enum('C', 'G', 'T', 'none')   This allele may not have been observed
homoCount2    0            int(10) unsigned              Count of individuals who are homozygous for allele2
heteroCount   0            int(10) unsigned              Count of individuals who are heterozygous
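
The schema describes bin only as an indexing field. The sketch below shows how such a value can be derived from chromStart and chromEnd, assuming the table uses the standard five-level UCSC binning scheme (smallest bins of 128 kb, each coarser level 8 times larger). The function name bin_from_range is illustrative and is not part of this schema.

```python
# Assumed: the standard UCSC binning scheme with five levels.
# Offsets are the first bin number at each level, finest level first.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb per finest-level bin
BIN_NEXT_SHIFT = 3    # each coarser level covers 8x more bases


def bin_from_range(chrom_start: int, chrom_end: int) -> int:
    """Return the smallest bin that fully contains [chrom_start, chrom_end)."""
    start_bin = chrom_start >> BIN_FIRST_SHIFT
    end_bin = (chrom_end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError("range is outside the 512 Mb covered by this scheme")


# Example row from the schema: chromStart 55298, chromEnd 55299
print(bin_from_range(55298, 55299))  # 585
```

A range query can then restrict itself to the handful of bins that overlap the region of interest instead of scanning every row on the chromosome.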

Connected Tables and Joining Fields
        hg19.hapmapAllelesChimp.name (via hapmapSnpsCEU.name)
      hg19.hapmapAllelesMacaque.name (via hapmapSnpsCEU.name)
      hg19.hapmapPhaseIIISummary.name (via hapmapSnpsCEU.name)
      hg19.hapmapSnpsCHB.name (via hapmapSnpsCEU.name)
      hg19.hapmapSnpsJPT.name (via hapmapSnpsCEU.name)
      hg19.hapmapSnpsYRI.name (via hapmapSnpsCEU.name)

Sample Rows
 
bin  chrom  chromStart  chromEnd  name        score  strand  observed  allele1  homoCount1  allele2  homoCount2  heteroCount
585  chr1   55298       55299     rs10399749  0      +       C/T       C        84          T        0           0
585  chr1   82570       82571     rs4030303   0      +       A/G       A        0           G        90          0
585  chr1   82651       82652     rs4030300   0      +       A/C       A        0           C        85          0
585  chr1   88168       88169     rs940550    0      +       C/T       C        0           T        90          0
585  chr1   91604       91605     rs13328714  0      +       C/T       C        89          T        0           0
586  chr1   232213      232214    rs11490937  0      +       A/G       A        0           G        90          0
589  chr1   534582      534583    rs6683466   0      +       C/G       C        90          G        0           0
589  chr1   546696      546697    rs12025928  0      +       A/G       A        0           G        89          0
589  chr1   564476      564477    rs6650104   12     +       A/G       A        160         G        0           4
589  chr1   564531      564532    rs11240781  0      +       A/G       A        0           G        89          0

Note: all start coordinates in our database are 0-based, not 1-based. For example, chromStart 55298 with chromEnd 55299 describes the single base at 1-based position 55299.
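
The score field is described as the minor allele frequency normalized to 0-500. As a rough sketch, the value can be recomputed from the three genotype count fields. The scaling used here (frequency times 1000, truncated to an integer) is an assumption rather than documented behavior, although it agrees with the sample row for rs6650104. The function name maf_score is hypothetical.

```python
def maf_score(homo_count1: int, homo_count2: int, hetero_count: int) -> int:
    """Minor allele frequency scaled to 0-500.

    Assumed scaling: frequency x 1000, truncated. A frequency of 50%
    (the maximum possible for a minor allele) maps to 500.
    """
    # Each homozygote carries two copies of its allele, each heterozygote one of each.
    allele1_copies = 2 * homo_count1 + hetero_count
    allele2_copies = 2 * homo_count2 + hetero_count
    total = allele1_copies + allele2_copies
    if total == 0:
        return 0
    return min(allele1_copies, allele2_copies) * 1000 // total


# rs6650104: homoCount1=160, homoCount2=0, heteroCount=4
print(maf_score(160, 0, 4))  # 12
# rs10399749: monomorphic in this population
print(maf_score(84, 0, 0))   # 0
```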

HapMap SNPs (hapmapSnps) Track Description
 

Description

The HapMap Project identified a set of approximately four million common SNPs, and genotyped these SNPs in four populations in Phase II of the project. In Phase III, it genotyped approximately 1.4 to 1.5 million SNPs in eleven populations. This track shows the combined data from Phases II and III. The intent is that this data can be used as a reference for future studies of human disease. This track displays the genotype counts and allele frequencies of those SNPs, and (when available) shows orthologous alleles from the chimp and macaque reference genome assemblies.

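The four million HapMap Phase II SNPs were genotyped on individuals from these four human populations:

  • Yoruba in Ibadan, Nigeria (YRI)
  • Japanese in Tokyo, Japan (JPT)
  • Han Chinese in Beijing, China (CHB)
  • CEPH (Utah residents with ancestry from northern and western Europe) (CEU)

Phase III expanded to eleven populations: the four above, plus the following:

  • African ancestry in Southwest USA (ASW)
  • Chinese in Metropolitan Denver, Colorado (CHD)
  • Gujarati Indians in Houston, Texas (GIH)
  • Luhya in Webuye, Kenya (LWK)
  • Mexican ancestry in Los Angeles, California (MEX)
  • Maasai in Kinyawa, Kenya (MKK)
  • Toscani in Italia (TSI)

Each of the populations is displayed in a separate subtrack.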

The HapMap assays provide biallelic results. Over 99.8% of HapMap SNPs are described as biallelic in dbSNP build 129; approximately 6,800 are described as more complex types (in-del, mixed, etc.). About 70% of the HapMap SNPs are transitions: 35% are A/G and 35% are C/T.

The orthologous alleles in chimp (panTro2) and macaque (rheMac2) were derived using liftOver.

No two HapMap SNPs occupy the same position. Aside from 430 SNPs from the pseudoautosomal region of chrX and chrY, no SNP is mapped to more than one location in the reference genome. No HapMap SNPs occur on "random" chromosomes (concatenations of unordered and unoriented contigs).

Display Conventions and Configuration

Note: calculation of heterozygosity has changed since the Phase II (rel22) version of this track. Observed heterozygosity is calculated as follows: each population's heterozygosity is computed as the proportion of heterozygous individuals in the population. The population heterozygosities are averaged to determine the overall observed heterozygosity. [For Phase II genotypes, expected heterozygosity was calculated as follows: the allele counts from all populations were summed (not normalized for population size) and used to determine overall major and minor allele frequencies. Assuming Hardy-Weinberg equilibrium, overall expected heterozygosity was calculated as two times the product of major and minor allele frequencies (see Modern Genetic Analysis, section 17-2).]
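
The two heterozygosity calculations described above can be sketched as follows. Each population is represented as a (homoCount1, homoCount2, heteroCount) tuple taken from its subtrack table. The function names are illustrative, and skipping populations with no genotyped individuals is an assumption that the description does not address.

```python
from statistics import mean


def observed_heterozygosity(populations):
    """Average of each population's proportion of heterozygous individuals."""
    per_population = []
    for homo1, homo2, hetero in populations:
        individuals = homo1 + homo2 + hetero
        if individuals > 0:  # assumption: ignore populations with no genotypes
            per_population.append(hetero / individuals)
    return mean(per_population) if per_population else 0.0


def expected_heterozygosity(populations):
    """Phase II method: 2pq from allele counts summed over all populations.

    Counts are summed without normalizing for population size, and
    Hardy-Weinberg equilibrium is assumed.
    """
    allele1 = sum(2 * homo1 + hetero for homo1, _, hetero in populations)
    allele2 = sum(2 * homo2 + hetero for _, homo2, hetero in populations)
    total = allele1 + allele2
    if total == 0:
        return 0.0
    p = allele1 / total
    return 2 * p * (1 - p)


# Hypothetical counts for two populations (not taken from the HapMap data)
counts = [(160, 0, 4), (40, 10, 50)]
print(observed_heterozygosity(counts))
print(expected_heterozygosity(counts))
```

Because the observed value averages per-population proportions, a small population contributes as much as a large one, whereas the Phase II expected value weights populations by the number of alleles they contribute.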

The human SNPs are displayed in gray using a color gradient based on minor allele frequency. The higher the minor allele frequency, the darker the display. By definition, the maximum minor allele frequency is 50%. When zoomed to base level, the major allele is displayed for each population.

The orthologous alleles from chimp and macaque are displayed in brown using a color gradient based on quality score. Quality scores range from 0 to 100 representing low to high quality. For orthologous alleles, the higher the quality, the darker the display. Quality scores are not available for chimp chromosomes chr21 and chrY; these were set to 98, consistent with the panTro2 browser quality track.

Filters are provided for the data attributes described above. Additionally, a filter is provided for observed heterozygosity (average of all populations' observed heterozygosities). Filters are applied to all subtracks, even if a subtrack is not displayed.

Notes on orthologous allele filters:

  • If a SNP's major allele is different between populations, no overall major allele for human is determined, so the "matches major human allele" and "matches minor human allele" filters for orthologous alleles do not apply.
  • If a SNP is monomorphic in all populations, the minor allele is not verified in the HapMap dataset. In these cases, the filter to match orthologous alleles to the minor human allele will yield no results.
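
The two rules above can be illustrated with a minimal sketch. Each population row is an (allele1, homoCount1, allele2, homoCount2, heteroCount) tuple. The function names and the tie-breaking choice are assumptions made for illustration; the track's actual filter implementation is not described here.

```python
def population_major_allele(allele1, homo1, allele2, homo2, hetero):
    """Allele with more copies in one population."""
    copies1 = 2 * homo1 + hetero
    copies2 = 2 * homo2 + hetero
    # Assumption: ties are resolved in favor of allele1.
    return allele1 if copies1 >= copies2 else allele2


def overall_major_allele(rows):
    """Shared major allele, or None when populations disagree."""
    majors = {population_major_allele(*row) for row in rows}
    return majors.pop() if len(majors) == 1 else None


def monomorphic_in_all(rows):
    """True when every population shows only one allele (no heterozygotes)."""
    return all(
        hetero == 0 and (homo1 == 0 or homo2 == 0)
        for _, homo1, _, homo2, hetero in rows
    )


# Hypothetical rows for one SNP in two populations
agree = [("C", 84, "T", 0, 0), ("C", 60, "T", 5, 25)]
disagree = [("C", 84, "T", 0, 0), ("C", 2, "T", 70, 18)]
print(overall_major_allele(agree))     # C
print(overall_major_allele(disagree))  # None
```

When overall_major_allele returns None, the major and minor allele match filters have nothing to compare against. When monomorphic_in_all returns True, no minor allele has been observed, so a match against the minor allele cannot succeed.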

Credits

This track is based on International HapMap Project release 27 data, provided by the HapMap Data Coordination Center.

References

HapMap Project

The International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007 Oct 18;449(7164):851-61.

The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320.

The International HapMap Consortium. The International HapMap Project. Nature. 2003 Dec 18;426(6968):789-96.

HapMap Data Coordination Center

Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project Web site. Genome Res. 2005 Nov;15(11):1592-3.

A Sampling of HapMap Literature

Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in outbred human populations. Hum Mol Genet. 2006 Mar 1;15(5):789-95.

Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W et al. Global variation in copy number in the human genome. Nature. 2006 Nov 23;444(7118):444-54.

Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. Common genetic variants account for differences in gene expression among ethnic groups. Nature Genet. 2007 Feb;39(2):226-31.

Tenesa A, Navarro P, Hayes BJ, Duffy DL, Clarke GM, Goddard ME, Visscher PM. Recent human effective population size estimated from linkage disequilibrium. Genome Res. 2007 Apr;17(4):520-6.

Voight BF, Kudaravalli S, Wen X, Pritchard JK. A Map of Recent Positive Selection in the Human Genome. PLoS Biol. 2006 Mar;4(3):e72.

Weir BS, Cardon LR, Anderson AD, Nielsen DM, Hill WG. Measures of human population structure show heterogeneity among genomic regions. Genome Res. 2005 Nov;15(11):1468-76.

Data Source

The genotypes_chr*_*_r27_nr.b36_fwd.txt.gz files from the HapMap FTP site were processed to make this track.