The Gene Haplotype Alleles feature displays the chromosome-phased 1000 Genomes Phase 1 data for protein-coding regions. These data comprise the genomes of 1,092 individuals from 14 populations in Africa, Europe, East Asia and the Americas, constructed using a combination of low-coverage whole-genome and exome sequencing.
The variant genotypes have been phased by the 1000 Genomes Project (i.e., the two alleles of each diploid genotype have been assigned to two haplotypes, one inherited from each parent).
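As a rough illustration of what phasing means, the sketch below splits phased genotypes into two haplotypes. It assumes VCF-style genotype strings such as `0|1`, where `|` marks a phased call; the function name and the sample genotypes are hypothetical and are not taken from the Genome Browser.

```python
def split_phased_genotypes(genotypes):
    """Split phased genotype strings (e.g. "0|1") into two haplotypes.

    Each haplotype is a list of allele indexes, one per variant site:
    0 is the reference allele, 1 and above are alternate alleles.
    """
    hap_a, hap_b = [], []
    for gt in genotypes:
        if "|" not in gt:
            # "/" marks an unphased call in VCF; it cannot be assigned to a haplotype.
            raise ValueError(f"genotype {gt!r} is not phased")
        first, second = gt.split("|")
        hap_a.append(int(first))
        hap_b.append(int(second))
    return hap_a, hap_b


# Hypothetical genotypes for one individual at three variant sites.
hap_a, hap_b = split_phased_genotypes(["0|1", "1|1", "0|0"])
print(hap_a)  # [0, 1, 0]
print(hap_b)  # [1, 1, 0]
```

Each individual therefore contributes two haplotype rows' worth of data, which is why counts in the haplotypes table refer to chromosomes rather than people.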
Click on any protein-coding gene in the UCSC Genes track and scroll to the Common Gene Haplotype Alleles section. (The feature is currently implemented only on GRCh37/hg19 protein-coding genes.) There will be a table of haplotypes for the protein-coding portion of the gene. Each row in the table represents a unique gene haplotype as found in the 1000 Genomes Phase 1 project data. The table is sortable on any column by clicking on the column headers.
includes rare and synonymous variant sites found in 1000 Genomes subjects in the list of haplotypes.
limits the display to haplotypes defined by common and non-synonymous variation.
displays variant sites (and full sequence) as DNA bases.
displays variant sites (and full sequence) as predicted amino acids. Predicted stop codons are represented by "]" and predicted frameshifts by "[>>]".
The reference variant is shown at the top of each variant site column. This is the value found in the GRCh37/hg19 reference genome at that variant site. In most cases it is a single letter (AA code/DNA base). In the case of an insertion with respect to the reference genome, the reference value is shown as "-". Large deletions are represented by the first two sequence letters followed by "+++".
Hovering the pointer over any of the variant site links will show a more complete description of that variant. For example, the variant description "AA:16 A|T chr9:136137554 SNP: G|A (0.995|0.005) rs55917063" consists of the following elements:
AA:16 A|T - AA residue number and variants (AA view only)
chr9:136137554 - genome location
SNP: G|A (0.995|0.005) - nucleotide variants and allele frequencies
rs55917063 - dbSNP variant name (if one is known)
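If the hover text needs to be processed outside the browser, a description like the one above can be broken into its elements with a regular expression. This is a minimal sketch that assumes the layout of the single example shown here; the pattern, the function name, and the field names are hypothetical, and other variant types may be formatted differently.

```python
import re

# Pattern modelled on the one example in the text; the AA prefix and the
# rs identifier are treated as optional, as the description says.
DESCRIPTION_RE = re.compile(
    r"^(?:AA:(?P<aa_pos>\d+)\s+(?P<aa_alleles>\S+)\s+)?"
    r"(?P<chrom>chr\w+):(?P<pos>\d+)\s+"
    r"(?P<kind>\w+):\s+(?P<alleles>\S+)\s+"
    r"\((?P<freqs>[^)]+)\)"
    r"(?:\s+(?P<rsid>rs\d+))?$"
)


def parse_variant_description(text):
    """Return the elements of a variant description as a dict."""
    match = DESCRIPTION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized description: {text!r}")
    fields = match.groupdict()
    return {
        "aa_position": int(fields["aa_pos"]) if fields["aa_pos"] else None,
        "aa_alleles": fields["aa_alleles"].split("|") if fields["aa_alleles"] else None,
        "chrom": fields["chrom"],
        "position": int(fields["pos"]),
        "type": fields["kind"],
        "alleles": fields["alleles"].split("|"),
        "frequencies": [float(f) for f in fields["freqs"].split("|")],
        "rsid": fields["rsid"],
    }


parsed = parse_variant_description(
    "AA:16 A|T chr9:136137554 SNP: G|A (0.995|0.005) rs55917063"
)
print(parsed["chrom"], parsed["position"], parsed["alleles"], parsed["frequencies"])
```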
Clicking on any non-reference variant shown in the variant sites columns will link to the full details of that variant site in the 1000 Genomes Phase 1 track.
shows the predicted effects of variation on gene sequence for each of the haplotypes. If variant sites are currently displayed as DNA bases, then the predicted DNA sequence is shown (for coding regions only). If variant sites are displayed as amino acids, the predicted protein sequence is shown.
simultaneously shows the DNA sequence above the protein sequence for easy comparison. Showing protein sequence with the DNA triplets is the easiest way to visualize the synonymous variants.
shows the simplified protein sequences view.
hides the full sequence view completely.
Green vertical highlights accentuate the variant sites within the full sequence.
Bold red letters mark the effects of variation. Synonymous changes are only evident when DNA bases are displayed.
Blue vertical highlights show a variant that has been sorted on by clicking its column header. Sorting on a variant can be used to quickly locate one site out of many in the full sequence view.
The AA residue number is shown when hovering over any part of the sequence in amino acid view.
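To see why synonymous changes only show up at the DNA level, the sketch below translates a reference codon and a variant codon with the standard genetic code and compares the resulting amino acids. The codons used are hypothetical examples, not data from any gene, and the function name is illustrative; `*` marks a stop codon in this table, which is not how the browser displays it.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Translate both codons and report whether the change is synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


# Hypothetical codons: a first-position G>A change alters the amino acid,
# while a third-position C>T change does not.
print(classify_codon_change("GCC", "ACC"))  # ('A', 'T', 'non-synonymous')
print(classify_codon_change("GCC", "GCT"))  # ('A', 'A', 'synonymous')
```

In the second case the protein sequence is unchanged, so the variant is only visible when the DNA triplets are displayed.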
includes all haplotypes. Some large gene models cover many variants and therefore have a very large number of distinct haplotypes represented in the 1000 Genomes project data. If this is the case, only the 100 most frequently occurring haplotypes will be shown in the table, though the true number will be noted.
limits the display to only common haplotypes.
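The row limit described above can be pictured as counting identical haplotypes and keeping the most frequent ones. The following is only a sketch of that idea, assuming each haplotype is represented as a string of alleles across the variant sites; the function name and the sample data are hypothetical, and the browser's own implementation may differ.

```python
from collections import Counter

MAX_ROWS = 100  # display cap stated in the text above


def summarize_haplotypes(haplotypes, max_rows=MAX_ROWS):
    """Count distinct haplotypes and keep only the most frequent ones.

    Returns (rows, total_distinct): rows is a list of (haplotype, count)
    pairs sorted by descending count, and total_distinct is the true number
    of distinct haplotypes before the cap was applied.
    """
    counts = Counter(haplotypes)
    return counts.most_common(max_rows), len(counts)


# Hypothetical haplotypes, one string per chromosome; the cap is lowered
# to 2 here so the truncation is visible.
rows, total = summarize_haplotypes(["GA", "GA", "GG", "AA", "GA", "GG"], max_rows=2)
print(rows)   # [('GA', 3), ('GG', 2)]
print(total)  # 3
```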
displays the distribution of the haplotypes across the major population groups.
displays the distribution across the more specific 1000 Genomes groups.
changes display from the 1000 Genomes grouping to the major grouping.
hides the population columns.
Each population group is shown as a column in the table, and each row shows the percentage of that haplotype that is found in each group. This is not the same as the percentage of each group that has the haplotype. Hover over the distribution numbers to show the frequency of occurrence of the haplotype within each group. For example, hovering over 25.7 might show "N=304 of 1183 (found in 71.0% of all ASN)", meaning that of the 1183 occurrences of the haplotype, 304 or 25.7% are found in the ASN group and that 71.0% of all East Asian copies of this gene (in 1000 Genomes Phase 1 data) match this haplotype. To see the number of 1000 Genomes chromosomes covered for each group, hover over the column header (e.g., ASN will usually show "East Asian [N=572]").
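The two percentages are easy to confuse, so the arithmetic is sketched below. The counts 304 and 1183 come from the example above. The group total of 428 is an assumption: it is a hypothetical number of ASN chromosomes covered for the gene, chosen only so that the second figure reproduces the 71.0% in the example, and it is not stated in the text. The function name is also illustrative.

```python
def haplotype_distribution(count_in_group, haplotype_total, group_total):
    """Return two percentages, each rounded to one decimal place.

    pct_of_haplotype: share of all copies of the haplotype found in the
                      group (the number shown in the table cell).
    pct_of_group:     share of the group's chromosomes that carry the
                      haplotype (the number shown in the hover text).
    """
    pct_of_haplotype = 100.0 * count_in_group / haplotype_total
    pct_of_group = 100.0 * count_in_group / group_total
    return round(pct_of_haplotype, 1), round(pct_of_group, 1)


# 304 of 1183 occurrences are in ASN; 428 is the hypothetical group total.
print(haplotype_distribution(304, 1183, 428))  # (25.7, 71.0)
```

The denominators differ: the table cell divides by the occurrences of the haplotype, while the hover text divides by the chromosomes in the group.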
displays all score columns.
hides all score columns.
The numbers listed here are of individuals, but the numbers used in generating the haplotypes table are frequently the number of relevant chromosomes (e.g., 2184 not 1092).
|AMR||Ad Mixed American||181 individuals|
|ASN||East Asian||286 individuals|
|ASW||African Ancestry in Southwest US||61 individuals|
|LWK||Luhya in Webuye, Kenya||97 individuals|
|YRI||Yoruba in Ibadan, Nigeria||88 individuals|
Ad Mixed American:
|CLM||Colombian in Medellin, Colombia||60 individuals|
|MXL||Mexican Ancestry in Los Angeles, California||66 individuals|
|PUR||Puerto Rican in Puerto Rico||55 individuals|
|CHB||Han Chinese in Beijing, China||97 individuals|
|CHS||Han Chinese South||100 individuals|
|JPT||Japanese in Tokyo, Japan||89 individuals|
|CEU||Utah residents with Northern and Western European ancestry||85 individuals|
|FIN||Finnish in Finland||93 individuals|
|GBR||British in England and Scotland||89 individuals|
|IBS||Iberian populations in Spain||14 individuals|
|TSI||Toscani in Italia||98 individuals|
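The relationship between individuals and chromosomes can be checked with a few lines of arithmetic. This sketch uses the population sizes listed in the table above and assumes two copies per individual, which holds for autosomal genes; the names `POPULATIONS` and `chromosome_count` are illustrative only.

```python
# Individuals per 1000 Genomes population, as listed in the table above.
POPULATIONS = {
    "ASW": 61, "LWK": 97, "YRI": 88,
    "CLM": 60, "MXL": 66, "PUR": 55,
    "CHB": 97, "CHS": 100, "JPT": 89,
    "CEU": 85, "FIN": 93, "GBR": 89, "IBS": 14, "TSI": 98,
}

CHROMOSOME_COPIES = 2  # assumption: diploid, autosomal gene


def chromosome_count(individuals, copies=CHROMOSOME_COPIES):
    """Number of chromosomes contributed by a group of individuals."""
    return individuals * copies


total_individuals = sum(POPULATIONS.values())
print(total_individuals)                    # 1092
print(chromosome_count(total_individuals))  # 2184

# The East Asian (ASN) group is made up of CHB, CHS and JPT.
asn_individuals = POPULATIONS["CHB"] + POPULATIONS["CHS"] + POPULATIONS["JPT"]
print(asn_individuals)                      # 286
print(chromosome_count(asn_individuals))    # 572
```

The last value matches the "East Asian [N=572]" column header mentioned earlier.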
Scores alone cannot be used to draw definitive conclusions about any haplotype.