Genome Segmentations from ENCODE

Cell Line
GM12878 (Tier 1) 
H1-hESC (Tier 1) 
K562 (Tier 1) 
HeLa-S3 (Tier 2) 
HepG2 (Tier 2) 
HUVEC (Tier 2) 
Subtracks:

Tier  Cell Line  Method    Track Name
1     GM12878    ChromHMM  GM12878 Genome Segmentation by ChromHMM from ENCODE/Analysis
1     GM12878    Combined  GM12878 Genome Segmentation by Combined Segway+ChromHMM from ENCODE/Analysis
1     GM12878    Segway    GM12878 Genome Segmentation by Segway from ENCODE/Analysis
1     H1-hESC    ChromHMM  H1-hESC Genome Segmentation by ChromHMM from ENCODE/Analysis
1     H1-hESC    Combined  H1-hESC Genome Segmentation by Combined Segway+ChromHMM from ENCODE/Analysis
1     H1-hESC    Segway    H1-hESC Genome Segmentation by Segway from ENCODE/Analysis
2     HeLa-S3    ChromHMM  HeLa-S3 Genome Segmentation by ChromHMM from ENCODE/Analysis
2     HeLa-S3    Combined  HeLa-S3 Genome Segmentation by Combined Segway+ChromHMM from ENCODE/Analysis
2     HeLa-S3    Segway    HeLa-S3 Genome Segmentation by Segway from ENCODE/Analysis
2     HepG2      ChromHMM  HepG2 Genome Segmentation by ChromHMM from ENCODE/Analysis
2     HepG2      Combined  HepG2 Genome Segmentation by Combined Segway+ChromHMM from ENCODE/Analysis
2     HepG2      Segway    HepG2 Genome Segmentation by Segway from ENCODE/Analysis
2     HUVEC      ChromHMM  HUVEC Genome Segmentation by ChromHMM from ENCODE/Analysis
2     HUVEC      Combined  HUVEC Genome Segmentation by Combined Segway+ChromHMM from ENCODE/Analysis
2     HUVEC      Segway    HUVEC Genome Segmentation by Segway from ENCODE/Analysis
1     K562       ChromHMM  K562 Genome Segmentation by ChromHMM from ENCODE/Analysis
1     K562       Combined  K562 Genome Segmentation by Combined Segway+ChromHMM from ENCODE/Analysis
1     K562       Segway    K562 Genome Segmentation by Segway from ENCODE/Analysis

Source data version: ENCODE Jan 2011 Freeze
Assembly: Human Feb. 2009 (GRCh37/hg19)

Overview

This set of tracks represents multivariate genome-segmentation results based on ENCODE data (ENCODE Project Consortium, 2012). Using two different unsupervised machine learning techniques (ChromHMM and Segway), the genome was automatically segmented into disjoint segments. Each segment belongs to one of a small number of genomic "states", each of which is assigned an intuitive label. Each genomic state represents a particular combination and distribution of different ENCODE functional data tracks, such as histone modifications, open chromatin data, and specific TF binding data. A consensus unified segmentation was also generated by reconciling results from the individual segmentations. The specific descriptions for each segmentation are listed below.

These segmentations were performed on six human cell types (GM12878, K562, H1-hESC, HeLa-S3, HepG2, and HUVEC), integrating ChIP-seq data for 8 chromatin marks, RNA Polymerase II, the CTCF transcription factor, and input data. In total, twenty-five states were used to segment the genome, and these states were then grouped and colored to highlight predicted functional elements.

Display Conventions and Configuration

The number and type of segmentation states differ between the individual segmentations, but they are unified by grouping states by color (10 groups for ChromHMM and Segway, 7 for the Combined). The display can be filtered to selected groups using the 'Filter by Segment Type' control on the track configuration page. Groupings that are not represented in the Combined tracks are marked in the menu with an asterisk.


Combined Segmentations

Description

These tracks display chromatin state segmentations from 6 cell lines, using a consensus merge of the segmentations produced by the ChromHMM and Segway software. In both segmentations, twenty-five states were used to segment the genome; however, for ease of comprehension and display, the merged segmentation uses only seven states.

Display Conventions and Configuration

The seven states of the combined segmentation, the candidate annotations and associated segment colors are as follows:

State  Color       Candidate annotation
TSS    Bright Red  Predicted promoter region including TSS
PF     Light Red   Predicted promoter flanking region
E      Orange      Predicted enhancer
WE     Yellow      Predicted weak enhancer or open chromatin cis regulatory element
CTCF   Blue        CTCF enriched element
T      Dark Green  Predicted transcribed region
R      Gray        Predicted Repressed or Low Activity region
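
As a rough illustration of the table above, the following sketch maps the seven combined-state labels to their candidate annotations and filters segments by label, similar in spirit to the 'Filter by Segment Type' control. It assumes (this page does not state the file layout) that each segment is a tab-separated, BED-style line with the state label in the fourth column. The function name `filter_segments` and the sample coordinates are hypothetical.

```python
# Seven combined-segmentation states and their candidate annotations,
# copied from the table above.
COMBINED_STATES = {
    "TSS": "Predicted promoter region including TSS",
    "PF": "Predicted promoter flanking region",
    "E": "Predicted enhancer",
    "WE": "Predicted weak enhancer or open chromatin cis regulatory element",
    "CTCF": "CTCF enriched element",
    "T": "Predicted transcribed region",
    "R": "Predicted Repressed or Low Activity region",
}


def filter_segments(lines, wanted):
    """Yield (chrom, start, end, state) for segments whose state is in `wanted`.

    Assumption: each line is tab-separated with the state label in column 4.
    """
    unknown = set(wanted) - set(COMBINED_STATES)
    if unknown:
        raise ValueError(f"unknown state labels: {sorted(unknown)}")
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and header lines
        chrom, start, end, state = line.rstrip("\n").split("\t")[:4]
        if state in wanted:
            yield chrom, int(start), int(end), state


# Hypothetical sample lines, not real track data.
sample = [
    "chr1\t1000\t1600\tTSS",
    "chr1\t1600\t2400\tPF",
    "chr1\t2400\t9000\tT",
    "chr1\t9000\t9400\tE",
]
enhancer_like = list(filter_segments(sample, {"E", "WE"}))
```

Here `enhancer_like` keeps only the segment labeled E, since no WE segment appears in the sample.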

Methods

ChIP-seq data from the ENCODE Consortium was used to generate this track, and the ChromHMM and Segway programs were used to perform the segmentation. Methods for the ChromHMM and Segway segmentations are described below.

To form the combined segmentation, states that could be grouped together based on similar signal patterns were first identified within each original segmentation. For the ChromHMM segmentation, the states were grouped manually based on the mean signal values across multiple cell lines. For the Segway segmentations, which were run independently over multiple cell lines, multiple hierarchical clustering techniques were applied across all states in the segmentations to identify the most consistent clustering of states, both across cell lines and with respect to existing biological knowledge. Using these criteria, Ward clustering on Euclidean distances between mean signal scores (transformed to the unit interval) was chosen to cluster the Segway state labels. Subsequently, pairwise relationships between the ChromHMM and Segway merged states were identified using both overlap calculations and manual annotation (Hoffman, Ernst et al., 2013). Pairs of states that were viewed as concordant were assigned to one of the seven state classes. Regions of the genome occupied by concordant states between the two initial segmentations were reassigned to the new summary labels. In some cases, combinations of states between the two segmentations could not be reconciled, and these combinations were viewed as discordant. Regions with discordant states were not assigned a state label and were dropped from the summary combined segmentation.
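
The final reassignment step can be sketched as an interval sweep over two segmentations of the same chromosome. This is only a minimal illustration under stated assumptions, not the code used to build the tracks: the function name `merge_segmentations`, the merged-state names, and the concordance table below are all hypothetical, and the real concordance decisions involved overlap calculations and manual annotation.

```python
def merge_segmentations(seg_a, seg_b, concordance):
    """Combine two segmentations of one chromosome into summary labels.

    seg_a, seg_b: sorted, non-overlapping lists of (start, end, merged_state).
    concordance:  dict mapping (state_a, state_b) pairs judged concordant to
                  a summary label; any pair not listed is treated as discordant.
    Returns a list of (start, end, summary_label); discordant regions and
    regions covered by only one segmentation are dropped.
    """
    out = []
    i = j = 0
    while i < len(seg_a) and j < len(seg_b):
        a_start, a_end, a_state = seg_a[i]
        b_start, b_end, b_state = seg_b[j]
        start, end = max(a_start, b_start), min(a_end, b_end)
        if start < end:  # the two segments overlap on [start, end)
            label = concordance.get((a_state, b_state))
            if label is not None:
                if out and out[-1][1] == start and out[-1][2] == label:
                    out[-1] = (out[-1][0], end, label)  # extend previous run
                else:
                    out.append((start, end, label))
        # advance whichever segment finishes first (both if they end together)
        if a_end <= b_end:
            i += 1
        if b_end <= a_end:
            j += 1
    return out


# Hypothetical merged states and concordance table, for illustration only.
chromhmm = [(0, 400, "Promoter"), (400, 1000, "Transcribed"), (1000, 1400, "Enhancer")]
segway = [(0, 300, "Promoter"), (300, 1000, "Transcribed"), (1000, 1400, "Repressed")]
concordance = {
    ("Promoter", "Promoter"): "TSS",
    ("Transcribed", "Transcribed"): "T",
}
combined = merge_segmentations(chromhmm, segway, concordance)
```

In this toy input, the interval 300-400 and the interval 1000-1400 carry discordant pairs, so they receive no label and are absent from `combined`.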


ChromHMM Segmentations

Description

A common set of states across 6 human cell types was learned by computationally integrating ENCODE ChIP-seq, DNase-seq, and FAIRE-seq data using a Hidden Markov Model (HMM). Twenty-five states were used to segment the genome, and these states were then grouped and colored to highlight predicted functional elements. There are 6 ChromHMM tracks, each representing the segmentation results for one of the six cell lines.

A related ChromHMM browser track, Chromatin State Segmentation by HMM from ENCODE/Broad (Broad ChromHMM) (Ernst et al., 2011), reports segmentations for 9 cell types and is based solely on histone data.

Display Conventions and Configuration

The candidate annotations and associated segment colors are as follows:

States                                  Color        Candidate annotation
Tss, TssF                               Bright Red   Active Promoter
PromF                                   Light Red    Promoter Flanking
PromP                                   Purple       Inactive Promoter
Enh, EnhF                               Orange       Candidate Strong enhancer
EnhWF, EnhW, DNaseU, DNaseD, FaireW     Yellow       Candidate Weak enhancer/DNase
CtcfO, Ctcf                             Blue         Distal CTCF/Candidate Insulator
Gen5', Elon, ElonW, Gen3', Pol2, H4K20  Dark Green   Transcription associated
Low                                     Light Green  Low activity proximal to active states
ReprD, Repr, ReprW                      Gray         Polycomb repressed
Quies, Art                              Light Gray   Heterochromatin/Repetitive/Copy Number Variation

Methods

Data from the ENCODE Consortium was used to generate this track, and the ChromHMM program was used to perform the segmentation. Datasets for 10 factors plus input in 6 cell types were binarized separately at a 200 base pair resolution using a Poisson background model and fold enrichment cut-offs. The chromatin states were learned from this binarized data using a multivariate Hidden Markov Model (HMM) that explicitly models the combinatorial patterns of observed modifications (Ernst and Kellis, 2010). To learn a common set of states across the six cell types, first the genomes were concatenated across the cell types. For each of the six cell types, each 200 base pair interval was then assigned to its most likely state under the model.
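
The binarization step can be illustrated with a small sketch. It is not ChromHMM's own implementation: the cut-off values, the choice of the mean count per bin as the background rate, and the function names `poisson_sf` and `binarize` are assumptions made for illustration. Only the 200 base pair bin size and the general idea of a Poisson background combined with a fold-enrichment cut-off come from the text.

```python
import math

BIN_SIZE = 200  # base pairs per bin, as stated in the text


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(read_starts, chrom_length, p_cutoff=1e-4, fold_cutoff=2.0):
    """Return one 0/1 call per 200 bp bin for a single mark on one chromosome.

    p_cutoff and fold_cutoff are illustrative values, not the ones used for
    these tracks. The background rate is taken as the mean count per bin.
    """
    n_bins = math.ceil(chrom_length / BIN_SIZE)
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // BIN_SIZE] += 1
    lam = sum(counts) / n_bins
    if lam == 0:
        return [0] * n_bins
    return [
        int(poisson_sf(c, lam) < p_cutoff and c / lam >= fold_cutoff)
        for c in counts
    ]


# Toy data: one read in every bin, plus 15 extra reads piled into bin 3.
reads = [i * BIN_SIZE + 50 for i in range(10)] + [650] * 15
calls = binarize(reads, chrom_length=2000)
```

Each mark would yield one such 0/1 vector per cell type. Concatenating the vectors across cell types, as the text describes, lets a single model learn states shared by all six.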


Segway Segmentations

Description

Sets of states across 6 human cell types were learned by computationally integrating ENCODE ChIP-seq, DNase-seq, and FAIRE-seq data using a Dynamic Bayesian Network (DBN). Twenty-five states were used to segment the genome (listed below in the Display Conventions and Configuration section by their prefixes, such as PromP for PromP1, PromP2, etc.), and these states were then grouped and colored to highlight predicted functional elements (such as the color purple for an inactive promoter region). There are 6 Segway tracks, each representing the segmentation results for a separate cell line. Not every segmentation state is found in each cell line. If you have further questions about the tracks, please contact the authors listed under the Credits section.

Display Conventions and Configuration

The segment state prefixes, associated colors, and candidate annotations are:

State prefixes             Color        Candidate annotation
Tss, DnaseD                Bright Red   Active Promoter
TssF, PromF                Light Red    Promoter Flanking
PromP                      Purple       Inactive Promoter
Enh, EnhF, EnhPr, EnhP     Orange       Candidate Strong enhancer
EnhW, EnhWf                Yellow       Candidate Weak enhancer
Ctcf, CtcfO                Blue         Distal CTCF/Candidate Insulator
Gen3', Gen5', Elon, ElonW  Dark Green   Transcription associated
Low                        Light Green  Low activity proximal to active states
Repr                       Gray         Polycomb repressed
Quiesc                     Light Gray   Heterochromatin/Repetitive/Copy Number Variation
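
Because the table above lists prefixes rather than full state names, a lookup needs to strip the numeric suffix first. The sketch below shows one way to do that. It assumes a full state name is a listed prefix followed by an optional run of digits, as in the PromP1/PromP2 example; the helper name `color_for_state` is hypothetical.

```python
import re

# Segway state prefixes grouped by display color, copied from the table above.
SEGWAY_GROUPS = {
    "Bright Red": ["Tss", "DnaseD"],
    "Light Red": ["TssF", "PromF"],
    "Purple": ["PromP"],
    "Orange": ["Enh", "EnhF", "EnhPr", "EnhP"],
    "Yellow": ["EnhW", "EnhWf"],
    "Blue": ["Ctcf", "CtcfO"],
    "Dark Green": ["Gen3'", "Gen5'", "Elon", "ElonW"],
    "Light Green": ["Low"],
    "Gray": ["Repr"],
    "Light Gray": ["Quiesc"],
}
PREFIX_TO_COLOR = {p: color for color, ps in SEGWAY_GROUPS.items() for p in ps}


def color_for_state(state):
    """Map a full state name such as 'PromP2' to its display color.

    Assumption: a full state name is a prefix from the table followed by an
    optional run of digits, as in the PromP1/PromP2 example in the text.
    Raises KeyError for a name whose prefix is not in the table.
    """
    prefix = re.sub(r"\d+$", "", state)
    return PREFIX_TO_COLOR[prefix]


promoter_color = color_for_state("PromP2")
```

Exact prefix lookup after stripping digits avoids ambiguity between overlapping prefixes such as Enh, EnhF, and EnhW, which a simple `startswith` test would confuse.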

Methods

Data from the ENCODE Consortium was used to generate this track, and the Segway program was used to perform the segmentation. Data for 10 factors plus input in 6 cell types was converted to real-valued signal data using the Wiggler program. Using the ENCODE regions (spanning 1% of the human genome), the chromatin states were learned from this data using a Dynamic Bayesian Network (DBN) (Hoffman et al., 2012). Models were learned separately for each of the six cell types. For each cell type, the Viterbi algorithm was used to assign genomic regions to individual state labels at single base pair resolution over the entire genome.
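
To show what Viterbi decoding does, here is a minimal sketch on a plain discrete hidden Markov model. This is a deliberate simplification: Segway's model is a Dynamic Bayesian Network over real-valued signals, and the two states, the "high"/"low" observations, and every probability below are toy values invented for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a discrete HMM (log space)."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # trace back from the best final state
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Toy two-state model; all numbers are illustrative assumptions.
states = ["active", "quiet"]
log_start = logs({"active": 0.5, "quiet": 0.5})
log_trans = {
    "active": logs({"active": 0.9, "quiet": 0.1}),
    "quiet": logs({"active": 0.1, "quiet": 0.9}),
}
log_emit = {
    "active": logs({"high": 0.8, "low": 0.2}),
    "quiet": logs({"high": 0.1, "low": 0.9}),
}
signal = ["low", "low", "high", "high", "high", "low", "low"]
path = viterbi(signal, states, log_start, log_trans, log_emit)
```

With these toy parameters, the decoded path labels the run of "high" observations as active and the flanking "low" observations as quiet. Working in log space avoids numerical underflow on long sequences, which matters when decoding at single base pair resolution.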


Credits

The ChromHMM segmentation was produced at the MIT Computational Biology Group (Kellis lab) by Jason Ernst, now at UCLA.

The Segway segmentation was produced at the Noble Research Lab by Michael Hoffman, now at the Princess Margaret Cancer Center, Toronto.

The Combined segmentation was produced at the European Bioinformatics Institute (EMBL-EBI, Flicek team), by Steven Wilder and Ian Dunham, as part of the work of the ENCODE Data Analysis Center (Ewan Birney).

References

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. PMID: 22955616; PMC: PMC3439153

Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization. Nat Methods. 2012 Feb 28;9(3):215-6. PMID: 22373907; PMC: PMC3577932

Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010 Aug;28(8):817-25. PMID: 20657582; PMC: PMC2919626

Hoffman MM, Buske OJ, Wang J, Weng Z, Bilmes JA, Noble WS. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nat Methods. 2012 Mar 18;9(5):473-6. PMID: 22426492; PMC: PMC3340533

Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E et al. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res. 2013 Jan;41(2):827-41. PMID: 23221638; PMC: PMC3553955

Data Release Policy

The data used to generate these segmentations are covered by the ENCODE data release policy and so were subject to some usage restrictions for a 9-month period. There are no restrictions on the use of the ENCODE segmentation data.