Ensembl Build Overviews Track Settings
 
Overview of Ensembl Regulatory Build

Assembly: Human Feb. 2009 (GRCh37/hg19)

Overviews of the Ensembl Regulatory Build and Transcription Factor Binding


Ensembl Regulatory Build

Description

This track shows the Ensembl regulatory annotation of regional function in the human genome.

The Ensembl Regulatory Build provides a genome-wide set of regions that are likely to be involved in gene regulation. These regions are classified into six functional types (see below).

Display Conventions and Configuration

The colours used for each functional classification follow the agreed ENCODE segmentation standard:

  •  Bright Red  - Predicted promoters
  •  Light Red  - Predicted promoter flanking regions
  •  Orange  - Predicted enhancers
  •  Blue  - CTCF binding sites
  •  Gold  - Unannotated transcription factor binding sites
  •  Yellow  - Unannotated open chromatin regions

Methods

Segmentation and annotation of segmentation states

We start by running a segmentation across 17 human cell types (A549, DND-41, GM12878, K562, H1-hESC, HepG2, HeLa-S3, HSSM, HSSMtube, HUVEC, Monocytes-CD14+, NH-A, NHDF-AD, NHEK, NHLF and Osteoblasts). Each segmentation annotates the genomes of its designated cell types with a fixed number of states, which are generally identified with a number.

For each state of each segmentation, we create a summary track which represents the number of cell types that have that state at any given base pair of the genome. The overlaps of this summary function with known features (transcription start sites, exons) and experimental features (CTCF binding sites, known chromatin repression marks) are used to assign a preliminary label to that state. For practical purposes, this annotation is manually curated. The labels used are either one of the above functional labels or a non-functional label (dead, weak or repressed).
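
The summary track described above can be illustrated with a minimal sketch. This is not the pipeline's actual code: it assumes each cell type's segmentation is available as half-open (start, end, state) intervals on a single sequence, and the function name `state_summary` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple

# (start, end, state) with half-open coordinates [start, end); an assumed layout.
Interval = Tuple[int, int, str]


def state_summary(segmentations: Dict[str, List[Interval]],
                  state: str, length: int) -> List[int]:
    """Count, at each base, how many cell types carry `state`."""
    counts = [0] * length
    for intervals in segmentations.values():
        # Per-cell-type mask, so overlapping intervals are not counted twice.
        covered = [False] * length
        for start, end, label in intervals:
            if label == state:
                for pos in range(max(start, 0), min(end, length)):
                    covered[pos] = True
        for pos, flag in enumerate(covered):
            if flag:
                counts[pos] += 1
    return counts


# Toy data: two hypothetical cell types segmented into states "1" and "2".
toy = {
    "cellA": [(0, 4, "1"), (4, 10, "2")],
    "cellB": [(0, 2, "2"), (2, 6, "1"), (6, 10, "2")],
}
summary = state_summary(toy, "1", 10)
print(summary)
```

A per-base list keeps the sketch short; a genome-scale implementation would more plausibly work on sorted intervals or a compressed signal format.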

Defining the MultiCell regulatory features

We first determine a cell-type-independent functional annotation of the genome, referred to as the MultiCell Regulatory Build. This build defines the function of genomic regions.

To determine whether a state is useful in practice, its summary track is compared to the overall density of transcription factor binding (as measured with ChIP-seq). Applying increasing integer cutoffs to the summary signal, we define progressively smaller regions. If these regions reach a 2-fold enrichment in transcription factor binding signal, then the state is retained for the build. This means that although all states are annotated, not all are used to build the Regulatory Build.
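
One possible reading of this test is sketched below. The text does not give the exact enrichment formula, so the sketch assumes enrichment means the mean TF binding signal inside the region divided by the genome-wide mean; `is_informative` and the toy signals are hypothetical names and data.

```python
from typing import List


def is_informative(summary: List[int], tf_signal: List[float],
                   min_enrichment: float = 2.0) -> bool:
    """Return True if some integer cutoff yields an enriched region.

    Assumption: enrichment = mean TF signal in region / genome-wide mean.
    """
    if not summary or len(summary) != len(tf_signal):
        raise ValueError("summary and tf_signal must be non-empty and equal length")
    genome_mean = sum(tf_signal) / len(tf_signal)
    if genome_mean == 0:
        return False
    # Increasing cutoffs define progressively smaller regions.
    for cutoff in range(1, max(summary) + 1):
        region = [tf for count, tf in zip(summary, tf_signal) if count >= cutoff]
        if not region:
            break
        enrichment = (sum(region) / len(region)) / genome_mean
        if enrichment >= min_enrichment:
            return True
    return False


summary = [1, 1, 2, 2, 1, 1, 0, 0, 0, 0]
focused_tf = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]   # binding concentrated in the state
uniform_tf = [1] * 10                          # binding unrelated to the state
print(is_informative(summary, focused_tf), is_informative(summary, uniform_tf))
```

In the toy example, the cutoff of 1 gives an enrichment below 2, while the cutoff of 2 isolates the two bound bases and passes; the uniform signal never exceeds an enrichment of 1.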

For any given segmentation, we define initial regions. For every functional label, all the state summaries that were assigned that label and judged informative are summed into a single function. Using the overall TF binding signal as the true signal, we select the threshold which produces the highest F-score.
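
The threshold selection can be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: the F-score is taken to be the per-base F1 score, the TF binding signal is reduced to a bound/unbound flag per base, and `best_threshold` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def best_threshold(label_summaries: List[List[int]],
                   tf_bound: List[bool]) -> Tuple[Optional[int], float]:
    """Sum the state summaries of one label and pick the cutoff with the best F1."""
    combined = [sum(values) for values in zip(*label_summaries)]
    best_t, best_f = None, -1.0
    for t in range(1, max(combined) + 1):
        predicted = [c >= t for c in combined]
        tp = sum(1 for p, b in zip(predicted, tf_bound) if p and b)
        fp = sum(1 for p, b in zip(predicted, tf_bound) if p and not b)
        fn = sum(1 for p, b in zip(predicted, tf_bound) if not p and b)
        if tp == 0:
            f_score = 0.0
        else:
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            f_score = 2 * precision * recall / (precision + recall)
        # Strict comparison keeps the lowest threshold among ties.
        if f_score > best_f:
            best_t, best_f = t, f_score
    return best_t, best_f


# Two informative states that share the same label (toy data).
summaries = [
    [1, 1, 2, 2, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
]
bound = [False, False, True, True, True, False, False, False, False, False]
threshold, score = best_threshold(summaries, bound)
print(threshold, round(score, 3))
```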

We then merge the regulatory features across segmentations by annotation.

Some simplifications are applied a posteriori:

  • Distal enhancers which overlap promoter flanking regions are merged into the latter.
  • Promoter flanking regions which overlap transcription start sites are incorporated into the flanking regions of the latter features.
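
The first of these simplifications can be sketched as below. The text does not say how the merged coordinates are computed, so the sketch assumes that the promoter flanking region is extended to the union of the two overlapping features; `absorb_enhancers` is a hypothetical name, and the second rule is not shown.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end); an assumed layout


def absorb_enhancers(flanks: List[Region],
                     enhancers: List[Region]) -> Tuple[List[Region], List[Region]]:
    """Merge every enhancer that overlaps a promoter flanking region into it.

    Returns the (possibly extended) flanking regions and the enhancers kept.
    """
    merged = [list(f) for f in sorted(flanks)]
    kept = []
    for e_start, e_end in sorted(enhancers):
        # First flanking region that shares at least one base with the enhancer.
        target = next((f for f in merged if e_start < f[1] and f[0] < e_end), None)
        if target is None:
            kept.append((e_start, e_end))
        else:
            # Assumption: the flanking region grows to the union of both spans.
            target[0] = min(target[0], e_start)
            target[1] = max(target[1], e_end)
    return [(f[0], f[1]) for f in merged], kept


flanks, enhancers = absorb_enhancers([(100, 200)], [(180, 260), (400, 450)])
print(flanks, enhancers)
```

The sketch does not re-merge flanking regions that come to overlap each other after being extended; a fuller version would need a final pass for that.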

Extra features

In addition to the segmentation states, which are essentially derived from histone marks, we integrate independent experimental evidence:

  • Transcription factor binding sites which were observed through ChIP-seq but are covered by none of the newly defined features are added to the Build.
  • Open chromatin regions which were experimentally observed but are covered by none of the above annotations are also added to the Build.
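
The two additions above can be sketched as a simple coverage check. The sketch assumes that "covered" means sharing at least one base with an existing feature, which the text does not specify; the function name `add_uncovered`, the label strings and the coordinates are hypothetical.

```python
from typing import List, Tuple

Feature = Tuple[int, int, str]  # (start, end, label), half-open coordinates


def add_uncovered(build: List[Feature], peaks: List[Tuple[int, int]],
                  label: str) -> List[Feature]:
    """Add every peak that overlaps no existing feature, tagged with `label`."""
    extra = [
        (start, end, label)
        for start, end in peaks
        if not any(start < f_end and f_start < end for f_start, f_end, _ in build)
    ]
    return sorted(build + extra)


build = [(100, 260, "promoter_flanking"), (400, 450, "enhancer")]
tfbs_peaks = [(120, 140), (300, 320)]
open_chromatin = [(310, 330), (500, 520)]

# TF binding sites are added first; open chromatin is then checked against
# everything already in the build, including the newly added TF binding sites.
build = add_uncovered(build, tfbs_peaks, "tf_binding_site")
build = add_uncovered(build, open_chromatin, "open_chromatin")
for feature in build:
    print(feature)
```

Here the open chromatin region at 310-330 is not added because it overlaps the TF binding site added in the previous step, matching the order in which the two bullets are stated.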


Summary of transcription factor binding site peaks from all cell types

Description

These tracks display the probability of observing binding for any transcription factor, based on the available ChIP-seq data sets.

Display Conventions and Configuration

The signal track displays the overall probability of binding (between 0 and 1, see Methods).

Methods

Publicly available human transcription factor ChIP-seq data sets were obtained, including data from the ENCODE and Roadmap Epigenomics Projects. All data were remapped and peaks were called using the Ensembl Epigenomic Alignment and Peak Calling Pipeline.

For every transcription factor, the probability of having binding at any position, based on the available data sets, $p_{tf}$, was calculated as:

$$p_{tf} = \frac{\text{number of overlapping peaks}}{\text{number of data sets}}$$

Using all transcription factors, the overall probability of binding, $p_{tfbs}$, was calculated as:

$$p_{tfbs} = 1 - \prod_{tf} (1 - p_{tf})$$
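
As a worked example of the two formulas above, the following sketch computes the per-factor and overall probabilities at a single position. The function names and the peak counts are made up for illustration, and "TF_X" is a placeholder factor.

```python
import math
from typing import Iterable


def p_tf(n_overlapping_peaks: int, n_data_sets: int) -> float:
    """Per-factor probability: overlapping peaks divided by data sets."""
    if n_data_sets <= 0:
        raise ValueError("a factor needs at least one data set")
    return n_overlapping_peaks / n_data_sets


def p_tfbs(per_factor: Iterable[float]) -> float:
    """Overall probability: 1 minus the product of (1 - p_tf) over all factors."""
    return 1.0 - math.prod(1.0 - p for p in per_factor)


# Illustrative counts at one position (not real data).
ctcf = p_tf(3, 4)   # peaks in 3 of 4 CTCF data sets
tf_x = p_tf(1, 2)   # peaks in 1 of 2 data sets for a placeholder factor
overall = p_tfbs([ctcf, tf_x])
print(ctcf, tf_x, overall)  # 0.75 0.5 0.875
```

The product form treats the factors as independent: the position is unbound only if every factor is unbound, so a single factor with a probability of 1 drives the overall value to 1.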


References

Zerbino DR, Johnson N, Wilder SP, Juettemann T, et al. Ensembl Regulation Resources. (in preparation).

Flicek P, et al. Ensembl 2014. Nucleic Acids Research 2014 Jan;42(Database issue):D749-55.

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 2012 Sep 6;489(7414):57-74.

Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nature Biotechnology 2010 Oct;28(10):1045-8.

Contact

Ensembl Helpdesk