RepeatMasker Viz. Track Settings
 
Detailed Visualization of RepeatMasker Annotations (All Repeats tracks)

Subtracks:
RepeatMasker Viz. RepeatMasker v4.0.7 Dfam_2.0: Current Dataset
RepeatMasker Viz. RepeatMasker v3.0.1 db20100302: Browser Baseline Dataset
Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This track was created using Arian Smit's RepeatMasker program, which screens DNA sequences for interspersed repeats and low complexity DNA sequences. The program outputs a detailed annotation of the repeats that are present in the query sequence (represented by this track), as well as a modified version of the query sequence in which all the annotated repeats have been masked (generally available on the Downloads page). RepeatMasker uses a separately curated version of the Repbase Update repeat library from the Genetic Information Research Institute (GIRI). Repbase Update is described in Jurka (2000) in the References section below.

Alternatively, RepeatMasker can use the Dfam database of repeat profile HMMs. Profile HMMs provide a richer description of the repeat families and, when used with RepeatMasker and nhmmer, provide a more sensitive approach to identifying repeats. Dfam is described in Wheeler et al. (2013) in the References section below.

Display Conventions and Configuration

In dense display mode, a single line is displayed denoting the coverage of repeats using a series of black boxes.

In full display mode, the track view is controlled by the scale of the view. At scales between 10 Mb and 30 kb, this track displays up to ten different classes of repeats (see below), one class per line. The repeat ranges are denoted as grayscale boxes, reflecting both the size of the repeat and the amount of base mismatch, base deletion, and base insertion associated with a repeat element. The higher the combined amount of these, the lighter the shading.
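The shading rule can be illustrated with a short sketch. The function name `divergence_shade`, the use of percentage values, the linear mapping, and the `max_total` cutoff are all assumptions made for illustration; the scale the browser actually uses is not stated here.

```python
def divergence_shade(perc_div, perc_del, perc_ins, max_total=50.0):
    """Map combined mismatch, deletion, and insertion percentages to a gray level.

    Returns an integer where 0 is black and larger values are lighter.
    Hypothetical linear mapping, not the browser's actual formula.
    """
    total = perc_div + perc_del + perc_ins
    # Clamp the combined value so highly diverged repeats stay on the scale.
    fraction = min(max(total, 0.0), max_total) / max_total
    # Stop at 200 rather than 255 so the lightest boxes remain visible on white.
    return round(fraction * 200)
```

With this mapping, a repeat with no mismatches, deletions, or insertions is drawn in black, and the shade lightens steadily as the combined value grows, up to the cutoff.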

In full display mode and at scales less than 30 kb, a new detailed display mode is used. Repeats are displayed as arrow boxes, indicating the size and orientation of the repeat. The interior grayscale shading represents the divergence of the repeat (see above) while the outline color represents the class of the repeat. Dotted lines above the repeat and extending left or right indicate the length of unaligned repeat consensus sequence. If the length of the unaligned sequence is large, a double interruption line is used to indicate that the unaligned sequence is not to scale.

For example, the following repeat is a SINE element in the forward orientation with average divergence. Only the 5' proximal fragment of the consensus sequence is aligned to the genome. The 3' unaligned length (384 bp) is not drawn to scale and is instead displayed using a set of interruption lines along with the length of the unaligned sequence.

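[Figure: forward-oriented SINE fragment with interruption lines marking 384 bp of unaligned 3' consensus sequence]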
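The unaligned lengths drawn as dotted lines can be derived from the consensus coordinates of an alignment. The sketch below is illustrative only: the `RepeatHit` class and its field names are hypothetical, and it assumes 1-based consensus positions plus a count of consensus bases remaining after the aligned region.

```python
from dataclasses import dataclass


@dataclass
class RepeatHit:
    strand: str     # "+" for forward, "-" for reverse orientation
    rep_start: int  # first aligned consensus position (1-based, assumed)
    rep_end: int    # last aligned consensus position
    rep_left: int   # consensus bases remaining after rep_end


def unaligned_flanks(hit):
    """Return (left, right) unaligned consensus lengths in genomic orientation."""
    five_prime = hit.rep_start - 1   # consensus bases before the aligned region
    three_prime = hit.rep_left       # consensus bases after the aligned region
    if hit.strand == "+":
        return five_prime, three_prime
    # On the reverse strand the consensus 3' end lies on the genomic left.
    return three_prime, five_prime
```

For the SINE example above, a forward-strand hit that starts at consensus position 1 and leaves 384 bases unaligned gives `unaligned_flanks(RepeatHit("+", 1, 100, 384))`, which evaluates to `(0, 384)`. The `rep_end` value of 100 is made up for the example.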

Repeats that have been fragmented by insertions or large internal deletions are now represented by join lines. In the example below, a LINE element is found as two fragments. The solid connection lines indicate that there are no unaligned consensus bases between the two fragments. Also note these fragments represent the end of the repeat, as there is no unaligned consensus sequence following the last fragment.

[Figure: LINE element in two fragments joined by solid connection lines]

In cases where there is unaligned consensus sequence between the fragments, the repeat will look like the following. The dotted line indicates the length of the unaligned sequence between the two fragments. In this case the unaligned consensus is longer than the actual genomic distance between these two fragments.

[Figure: two repeat fragments joined by a dotted line representing unaligned consensus sequence]

If there is consensus overlap between the two fragments, the joining lines will be drawn to indicate how much of the left fragment is repeated in the right fragment.

[Figure: two repeat fragments with joining lines showing consensus overlap]
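The three join styles described above (solid lines, dotted lines, and overlap lines) correspond to three relationships between the consensus ranges of adjacent fragments. The following sketch shows one way to tell them apart. The function name `consensus_join` and its arguments are hypothetical, and positions are assumed to be 1-based and listed in consensus order.

```python
def consensus_join(left_end, right_start):
    """Classify how two adjacent fragments of one repeat relate in the consensus.

    left_end:    last consensus position aligned by the first fragment
    right_start: first consensus position aligned by the second fragment
    Returns a (kind, length) tuple.
    """
    gap = right_start - left_end - 1
    if gap == 0:
        # No consensus bases are missing: drawn with solid connection lines.
        return ("contiguous", 0)
    if gap > 0:
        # Some consensus bases are unaligned: drawn with a dotted line.
        return ("unaligned", gap)
    # The second fragment repeats part of the first fragment's consensus range.
    return ("overlap", -gap)
```

For instance, fragments ending at consensus position 500 and resuming at 801 leave 300 unaligned bases, while fragments ending at 500 and resuming at 451 overlap by 50 bases.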

The following table lists the repeat class colors:

Color Repeat Class
SINE - Short Interspersed Nuclear Element
LINE - Long Interspersed Nuclear Element
LTR - Long Terminal Repeat
DNA - DNA Transposon
Simple - Single Nucleotide Stretches and Tandem Repeats
Low_complexity - Low Complexity DNA
Satellite - Satellite Repeats
RNA - RNA Repeats (including RNA, tRNA, rRNA, snRNA, scRNA, srpRNA)
Other - Other Repeats (including class RC - Rolling Circle)
Unknown - Unknown Classification

A "?" at the end of the "Family" or "Class" (for example, DNA?) signifies that the curator was unsure of the classification. At some point in the future, either the "?" will be removed or the classification will be changed.
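The grouping in the table, together with the "?" convention, can be expressed as a small lookup. This is a sketch under stated assumptions: the function name `display_class` is hypothetical, the class names are taken from the table as written, and sending unrecognized names to "Other" is an assumption rather than documented behavior.

```python
# RNA subtypes that the table groups under the single "RNA" display class.
RNA_CLASSES = {"RNA", "tRNA", "rRNA", "snRNA", "scRNA", "srpRNA"}

# Classes that are displayed under their own name.
DIRECT_CLASSES = {
    "SINE", "LINE", "LTR", "DNA", "Simple",
    "Low_complexity", "Satellite", "Unknown",
}


def display_class(rep_class):
    """Return (display class, uncertain flag) for a repeat class name."""
    uncertain = rep_class.endswith("?")   # curator was unsure of the classification
    base = rep_class.rstrip("?")
    if base in RNA_CLASSES:
        group = "RNA"
    elif base in DIRECT_CLASSES:
        group = base
    else:
        # Covers RC (Rolling Circle); treating every other name as "Other"
        # is an assumption of this sketch.
        group = "Other"
    return group, uncertain
```

Under these assumptions, `display_class("DNA?")` yields `("DNA", True)` and `display_class("RC")` yields `("Other", False)`.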

Methods

UCSC has used the most current versions of the RepeatMasker software and repeat libraries available to generate these data. Note that these versions may be newer than those that are publicly available on the Internet.

Data are generated using the RepeatMasker -s flag. Additional flags may be used for certain organisms. Repeats are soft-masked. Alignments may extend through repeats, but are not permitted to initiate in them. See the FAQ for more information.
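Soft-masking means that repeat bases are written in lowercase while the rest of the sequence stays in uppercase, so no sequence information is lost. The sketch below illustrates the idea on a plain string. The function names and the 0-based, half-open interval convention are assumptions chosen for the example and are not part of RepeatMasker itself.

```python
def soft_mask(sequence, repeats):
    """Lowercase the bases inside each repeat interval.

    repeats: iterable of (start, end) pairs, 0-based and half-open (assumed).
    """
    bases = list(sequence.upper())
    for start, end in repeats:
        bases[start:end] = [base.lower() for base in bases[start:end]]
    return "".join(bases)


def masked_fraction(sequence):
    """Return the fraction of bases that are soft-masked (lowercase)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)
```

For example, `soft_mask("ACGTACGTAC", [(2, 5)])` gives `"ACgtaCGTAC"`, in which 3 of the 10 bases are masked.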

Credits

Thanks to Arian Smit, Robert Hubley and GIRI for providing the tools and repeat libraries used to generate this track.

References

Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org. 1996-2010.

Dfam is described in:

Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, Smit AF, Finn RD. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 2013 Jan;41(Database issue):D70-82. PMID: 23203985; PMC: PMC3531169

Repbase Update is described in:

Jurka J. Repbase Update: a database and an electronic journal of repetitive elements. Trends Genet. 2000 Sep;16(9):418-420. PMID: 10973072

For a discussion of repeats in mammalian genomes, see:

Smit AF. Interspersed repeats and other mementos of transposable elements in mammalian genomes. Curr Opin Genet Dev. 1999 Dec;9(6):657-63. PMID: 10607616

Smit AF. The origin of interspersed repeats in the human genome. Curr Opin Genet Dev. 1996 Dec;6(6):743-8. PMID: 8994846