ENCODE Regulation Txn Factor ChIP Track Settings
 
Transcription Factor ChIP-seq Clusters (161 factors) from ENCODE with Factorbook Motifs

Track collection: Integrated Regulation from ENCODE

Description

These tracks contain information relevant to the regulation of transcription from the ENCODE project. The Transcription track shows transcription levels assayed by sequencing of polyadenylated RNA from a variety of cell types. The Layered H3K4Me1 and Layered H3K27Ac tracks show where modification of histone proteins is suggestive of enhancer and, to a lesser extent, other regulatory activity. These histone modifications, particularly H3K4Me1, are quite broad. The actual enhancers are typically just a small portion of the area marked by these histone modifications. The Layered H3K4Me3 track shows a histone mark associated with promoters. The DNase Clusters track shows regions where the chromatin is hypersensitive to cutting by the DNase enzyme, which has been assayed in a large number of cell types. Regulatory regions, in general, tend to be DNase sensitive, and promoters are particularly DNase sensitive. The Txn Factor ChIP tracks show DNA regions where transcription factors, proteins responsible for modulating gene transcription, bind, as assayed by chromatin immunoprecipitation with antibodies specific to the transcription factor followed by sequencing of the precipitated DNA (ChIP-seq).

These tracks complement each other and together can shed much light on regulatory DNA. The histone marks are informative at a high level, but they have a resolution of just ~200 bases and do not provide much in the way of functional detail. The DNase hypersensitive assay is higher in resolution at the DNA level and can be done on a large number of cell types since it's just a single assay. At the functional level, DNase hypersensitivity suggests that a region is very likely to be regulatory in nature, but provides little information beyond that. The transcription factor ChIP assay has a high resolution at the DNA level, and, due to the very specific nature of the transcription factors, is often informative with respect to functional detail. However, since each transcription factor must be assayed separately, the information is only available for a limited number of transcription factors on a limited number of cell lines. Though each assay has its strengths and weaknesses, the fact that all of these assays are relatively independent of each other gives increased confidence when multiple tracks are suggesting a regulatory function for a region.

For additional information, please follow the links for the individual tracks in this collection. Also note that additional histone marks and transcription information are available in other ENCODE tracks. This integrative super-track shows only a selection of the most informative data of general interest.

All tracks in this collection (8)
Transcription: Transcription Levels Assayed by RNA-seq on 9 Cell Lines from ENCODE
Layered H3K4Me1: H3K4Me1 Mark (Often Found Near Regulatory Elements) on 7 cell lines from ENCODE
Layered H3K4Me3: H3K4Me3 Mark (Often Found Near Promoters) on 7 cell lines from ENCODE
Layered H3K27Ac: H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE
DNase Clusters: DNaseI Hypersensitivity Clusters in 125 cell types from ENCODE (V3)
Txn Factr ChIP E3: Transcription Factor ChIP-seq Clusters (338 factors, 130 cell types) from ENCODE 3
Txn Factor ChIP: Transcription Factor ChIP-seq Clusters (161 factors) from ENCODE with Factorbook Motifs
Txn Fac ChIP V2: Transcription Factor ChIP-seq from ENCODE (V2)

Cell Abbreviations
Symbol Cell Type
1 H1-hESC
A A549
G GM12878
H HeLa-S3
I IMR90
K K562
L HepG2
M MCF-7
S SK-N-SH, SK-N-SH_RA
U HUVEC
a AG04449, AG04450, AG09309, AG09319, AG10803, AoAF
b BE2_C, BJ
c Caco-2
d Dnd41
e ECC-1
f Fibrobl
g GM06990, GM08714, GM10847, GM12801, GM12864, GM12865, GM12872, GM12873, GM12874, GM12875, GM12891, GM12892, GM15510, GM18505, GM18526, GM18951, GM19099, GM19193, GM19238, GM19239, GM19240, Gliobla
h HA-sp, HAc, HBMEC, HCFaa, HCM, HCPEpiC, HCT-116, HEEpiC, HEK293, HEK293-T-REx, HFF, HFF-Myc, HL-60, HMEC, HMF, HPAF, HPF, HRE, HRPEpiC, HSMM, HSMMtube, HVMF
m MCF10A-Er-Src
n NB4, NH-A, NHDF-Ad, NHDF-neo, NHEK, NHLF, NT2-D1
o Osteobl
p PANC-1, PBDE, PBDEFetal, PFSK-1, ProgFib
r RPTEC, Raji
s SAEC, SH-SY5Y, SK-N-MC
t T-47D
u U2OS, U87
w WERI-Rb-1, WI-38
Total: 91
Metadata:
Experiment (Assay) type: Integ Cluster
tableName: wgEncodeRegTfbsClusteredV3
File Name for downloading: wgEncodeRegTfbsClusteredV3.bed.gz
Source data version: ENCODE Mar 2012 Freeze
Data last updated at UCSC: 2019-01-10

Description

This track shows regions of transcription factor binding derived from a large collection of ChIP-seq experiments performed by the ENCODE project, together with DNA binding motifs identified within these regions by the ENCODE Factorbook repository.

Transcription factors (TFs) are proteins that bind to DNA and interact with RNA polymerases to regulate gene expression. Some TFs contain a DNA binding domain and can bind directly to specific short DNA sequences ('motifs'); others bind to DNA indirectly through interactions with TFs containing a DNA binding domain. High-throughput antibody capture and sequencing methods (e.g. chromatin immunoprecipitation followed by sequencing, or 'ChIP-seq') can be used to identify regions of TF binding genome-wide. These regions are commonly called ChIP-seq peaks.

ENCODE TFBS ChIP-seq data were processed using the computational pipeline developed by the ENCODE Analysis Working Group to generate uniform peaks of TF binding. Peaks for 161 transcription factors in 91 cell types are combined here into clusters to produce a summary display showing occupancy regions for each factor and motif sites within the regions when identified. Additional views of the underlying ChIP-seq data and documentation on the methods used to generate it are available from the ENCODE Uniform TFBS track.

Display Conventions

A gray box encloses each peak cluster of transcription factor occupancy, with the darkness of the box being proportional to the maximum signal strength observed in any cell line contributing to the cluster. The HGNC gene name for the transcription factor is shown to the left of each cluster. Within a cluster, a green highlight indicates the highest scoring site of a Factorbook-identified canonical motif for the corresponding factor. (NOTE: motif highlights are shown only in browser windows of size 50,000 bp or less, and their display can be suppressed by unchecking the highlight motifs box on the track configuration page). Arrows on the highlight designate the matching strand of the motif.

The cell lines where signal was detected for the factor are identified by single-letter abbreviations shown to the right of the cluster. The darkness of each letter is proportional to the signal strength observed in the cell line. Abbreviations starting with capital letters designate ENCODE cell types identified for intensive study - Tier 1 and Tier 2 - while those starting with lowercase letters designate Tier 3 cell lines.
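
The case convention for the abbreviations can be sketched in a few lines of Python. This is only an illustration: the function name tier_group is hypothetical, and the treatment of the digit symbol "1" (H1-hESC) as belonging with the capital-letter group is an assumption, since the text above mentions only letters.

```python
def tier_group(symbol: str) -> str:
    """Classify a single-character cell abbreviation by its case.

    Assumption: lowercase letters mean Tier 3; every other symbol
    (capital letters, and the digit '1' used for H1-hESC) is grouped
    with the Tier 1 and Tier 2 cell types.
    """
    if len(symbol) != 1:
        raise ValueError("expected a single-character abbreviation")
    return "Tier 3" if symbol.islower() else "Tier 1 or 2"


if __name__ == "__main__":
    for sym in ["K", "G", "1", "g", "h"]:
        print(sym, tier_group(sym))
```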

Click on a peak cluster to see more information about the TF/cell assays contributing to the cluster, the cell line abbreviation table, and details about the highest scoring canonical motif in the cluster.

Methods

Peaks of transcription factor occupancy from uniform processing of ENCODE ChIP-seq data by the ENCODE Analysis Working Group were filtered to exclude datasets that did not pass the integrated quality metric (see "Quality Control" section of Uniform TFBS) and then were clustered using the UCSC hgBedsToBedExps tool. Scores were assigned to peaks by multiplying the input signal values by a normalization factor calculated as the ratio of the maximum score value (1000) to the signal value at one standard deviation from the mean, with values exceeding 1000 capped at 1000. This has the effect of distributing scores up to the mean plus one standard deviation across the score range, while assigning the maximum score to all values above that. The cluster score is the highest score for any peak contributing to the cluster.
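
The score normalization described above can be approximated as follows. This is a minimal sketch, not the hgBedsToBedExps implementation: the function names are hypothetical, and the use of the population standard deviation, rounding to integers, and normalizing over one list of signal values at a time are all assumptions.

```python
import statistics

MAX_SCORE = 1000  # maximum score value stated in the Methods text


def normalize_scores(signals):
    """Scale signal values so that (mean + 1 SD) maps to MAX_SCORE.

    Assumptions: population standard deviation, integer rounding,
    and a non-empty list with a positive mean + SD.
    """
    mean = statistics.fmean(signals)
    sd = statistics.pstdev(signals)
    factor = MAX_SCORE / (mean + sd)
    # Values that scale past the maximum are capped at MAX_SCORE.
    return [min(MAX_SCORE, round(s * factor)) for s in signals]


def cluster_score(peak_scores):
    """The cluster score is the highest score of any contributing peak."""
    return max(peak_scores)


if __name__ == "__main__":
    example_signals = [10.0, 20.0, 30.0, 40.0, 100.0]  # made-up values
    scores = normalize_scores(example_signals)
    print(scores, cluster_score(scores))
```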

The Factorbook motif discovery and annotation pipeline uses the MEME-ChIP and FIMO tools from the MEME software suite in conjunction with machine learning methods and manual curation to merge discovered motifs with known motifs reported in JASPAR and TRANSFAC. Motif identifications reported in Wang et al. 2012 (below) were supplemented in this track with more recent data (derived from newer ENCODE datasets - Jan 2011 through Mar 2012 freezes), provided by the Factorbook team. Motif identifications from all datasets were merged; when a motif was duplicated in multiple cell lines, the most significant reported value (q-value) was kept. The scores for the selected best-scoring motif sites were then transformed to -log10.
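
The merge and transform step can be illustrated with a short sketch. It is not the Factorbook pipeline: the function name, the tuple layout, and the choice to treat sites with the same factor, coordinates, and strand as duplicates are assumptions made for the example, and the input values are invented.

```python
import math


def best_motif_sites(sites):
    """Keep the most significant q-value per motif site, then apply -log10.

    `sites` is an iterable of (factor, chrom, start, end, strand, qvalue)
    tuples (a hypothetical layout). Assumes every q-value is > 0.
    """
    best = {}
    for factor, chrom, start, end, strand, qvalue in sites:
        key = (factor, chrom, start, end, strand)
        # A smaller q-value is more significant.
        if key not in best or qvalue < best[key]:
            best[key] = qvalue
    return {key: -math.log10(q) for key, q in best.items()}


if __name__ == "__main__":
    example = [
        ("CTCF", "chr1", 100, 119, "+", 1e-3),  # same site, one cell line
        ("CTCF", "chr1", 100, 119, "+", 1e-5),  # same site, another cell line
        ("EGR1", "chr2", 500, 512, "-", 1e-2),
    ]
    for key, score in best_motif_sites(example).items():
        print(key, round(score, 2))
```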

Release Notes

Release 4 (February 2014) of this track adds display of the Factorbook motifs. Release 3 (August 2013) added 124 datasets (690 total, vs. 486 in Release 2), representing all ENCODE TF ChIP-seq passing quality assessment through the ENCODE March 2012 data freeze. The peaks used to generate these clusters were called with less stringent thresholds than used during the January 2011 uniform processing shown in Release 2 of this track. The contributing datasets are displayed as individual tracks in the ENCODE Uniform TFBS track, which is available along with the primary data tracks in the ENC TF Binding Supertrack page. The clustering for V3/V4 is based on the transcription factor target, and so differs from V2 where clustering was based on antibody.

For the V3/V4 releases, a new track table format, 'factorSource', was used to represent the primary clusters table and downloads file, wgEncodeRegTfbsClusteredV3. This format consists of standard BED5 fields (see File Formats) followed by an experiment count field (expCount) and finally two fields containing comma-separated lists. The first list field (expNums) contains numeric identifiers for experiments, keyed to the wgEncodeRegTfbsClusteredInputsV3 table, which includes such information as the experiment's underlying Uniform TFBS table name, factor targeted, antibody used, cell type, treatment (if any), and laboratory source. The second list field (expScores) contains the scores for the corresponding experiments. For convenience, the file downloads directory for this track also contains a BED file, wgEncodeRegTfbsClusteredWithCellsV3, that lists each cluster with the cluster score followed by a comma-separated list of cell types.
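
A line in the factorSource layout described above could be parsed along these lines. This is a sketch under stated assumptions: the line is tab-separated with exactly the eight fields listed (a database dump may carry extra leading columns), scores are read as floats, and the class name, function name, and example line are hypothetical.

```python
from typing import List, NamedTuple


class FactorSourceRow(NamedTuple):
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str              # transcription factor name
    score: int             # cluster score (0-1000)
    exp_count: int         # number of contributing experiments
    exp_nums: List[int]    # experiment identifiers (expNums)
    exp_scores: List[float]  # one score per experiment (expScores)


def parse_factor_source(line: str) -> FactorSourceRow:
    """Parse one tab-separated factorSource line (BED5 + 3 extra fields)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    # Tolerate a trailing comma in the list fields.
    exp_nums = [int(x) for x in fields[6].split(",") if x]
    exp_scores = [float(x) for x in fields[7].split(",") if x]
    exp_count = int(fields[5])
    if not (exp_count == len(exp_nums) == len(exp_scores)):
        raise ValueError("expCount does not match the list lengths")
    return FactorSourceRow(fields[0], int(fields[1]), int(fields[2]),
                           fields[3], int(fields[4]), exp_count,
                           exp_nums, exp_scores)


if __name__ == "__main__":
    # Invented example line, not taken from the real table.
    row = parse_factor_source("chr1\t100\t200\tCTCF\t1000\t2\t5,9\t1000,420")
    print(row)
```

Each value in exp_nums can then be looked up in wgEncodeRegTfbsClusteredInputsV3 to recover the cell type and other details of the experiment.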

The Factorbook motif positions that display as green boxes on the track come from an additional table called factorbookMotifPos. They are supported by additional metadata tables: factorbookMotifCanonical, which connects different terms used for the same factor (RELA <--> NFKB1); factorbookGeneAlias, which connects terms to the link used at factorbook.org (EGR1 <--> EGR-1); and a position weight matrix table, factorbookMotifPwm, used in building the graphical sequence logo for each motif on the item details page. These tables are available on our public MySQL server and as files on our download server.

Credits

This track shows ChIP-seq data from the Myers Lab at the HudsonAlpha Institute for Biotechnology and from the labs of Michael Snyder, Mark Gerstein, and Sherman Weissman at Yale University, Peggy Farnham at the University of Southern California, Kevin Struhl at Harvard, Kevin White at the University of Chicago, and Vishy Iyer at the University of Texas, Austin. These data were processed into uniform peak calls by the ENCODE Analysis Working Group pipeline developed by Anshul Kundaje. The clustering of the uniform peaks was performed by UCSC. The Factorbook motif identifications and localizations (and valuable assistance with interpretation) were provided by Jie Wang, Bong Hyun Kim and Jiali Zhuang of the Zlab (Weng Lab) at UMass Medical School.

References

Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 Sep 6;489(7414):91-100. PMID: 22955619

Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012 Sep;22(9):1798-812. PMID: 22955990; PMC: PMC3431495

Wang J, Zhuang J, Iyer S, Lin XY, Greven MC, Kim BH, Moore J, Pierce BG, Dong X, Virgil D et al. Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium. Nucleic Acids Res. 2013 Jan;41(Database issue):D171-6. PMID: 23203885; PMC: PMC3531197

Data Release Policy

While primary ENCODE data was subject to a restriction period as described in the ENCODE data release policy, this restriction does not apply to the integrative analysis results, and all primary data underlying this track have passed the restriction date. The data in this track are freely available.