ENCODE FAIRE Track Settings

ENCODE FAIRE-seq Peaks and Signal of Open Chromatin based on Uniform processing pipeline

Subtracks:

View    Tier  Cell Line  Lab  Track Name
Peaks   3     A549       UNC  A549 Combined FAIRE-seq Peak Calls
Signal  3     A549       UNC  A549 Combined FAIRE-seq Signal
Peaks   3     Astrocytes UNC  Astrocytes Combined FAIRE-seq Peak Calls
Signal  3     Astrocytes UNC  Astrocytes Combined FAIRE-seq Signal
Peaks   3     Gliobla    UNC  Glioblastoma Combined FAIRE-seq Peak Calls
Signal  3     Gliobla    UNC  Glioblastoma Combined FAIRE-seq Signal
Peaks   1     GM12878    UNC  GM12878 Combined FAIRE-seq Peak Calls
Signal  1     GM12878    UNC  GM12878 Combined FAIRE-seq Signal
Peaks   3     GM12891    UNC  GM12891 Combined FAIRE-seq Peak Calls
Signal  3     GM12891    UNC  GM12891 Combined FAIRE-seq Signal
Peaks   3     GM12892    UNC  GM12892 Combined FAIRE-seq Peak Calls
Signal  3     GM12892    UNC  GM12892 Combined FAIRE-seq Signal
Peaks   3     GM18507    UNC  GM18507 Combined FAIRE-seq Peak Calls
Signal  3     GM18507    UNC  GM18507 Combined FAIRE-seq Signal
Peaks   3     GM19239    UNC  GM19239 Combined FAIRE-seq Peak Calls
Signal  3     GM19239    UNC  GM19239 Combined FAIRE-seq Signal
Peaks   1     H1-hESC    UNC  H1-hESC Combined FAIRE-seq Peak Calls
Signal  1     H1-hESC    UNC  H1-hESC Combined FAIRE-seq Signal
Peaks   2     HeLa-S3    UNC  HeLa-S3 Combined FAIRE-seq Peak Calls
Signal  2     HeLa-S3    UNC  HeLa-S3 Combined FAIRE-seq Signal
Peaks   2     HeLa-S3    UNC  HeLa-S3 with Interferon Alpha Treatment (4 hours) Combined FAIRE-seq Peak Calls
Signal  2     HeLa-S3    UNC  HeLa-S3 with Interferon Alpha Treatment (4 hours) Combined FAIRE-seq Signal
Peaks   2     HeLa-S3    UNC  HeLa-S3 with Interferon Gamma Treatment (4 hours) Combined FAIRE-seq Peak Calls
Signal  2     HeLa-S3    UNC  HeLa-S3 with Interferon Gamma Treatment (4 hours) Combined FAIRE-seq Signal
Peaks   2     HepG2      UNC  HepG2 Combined FAIRE-seq Peak Calls
Signal  2     HepG2      UNC  HepG2 Combined FAIRE-seq Signal
Peaks   3     HTR8svn    UNC  HTR8svn Trophoblast Combined FAIRE-seq Peak Calls
Signal  3     HTR8svn    UNC  Trophoblast HTR8svn Combined FAIRE-seq Signal
Peaks   2     HUVEC      UNC  HUVEC Combined FAIRE-seq Peak Calls
Signal  2     HUVEC      UNC  HUVEC Combined FAIRE-seq Signal
Peaks   1     K562       UNC  K562 Combined FAIRE-seq Peak Calls
Signal  1     K562       UNC  K562 Combined FAIRE-seq Signal
Peaks   1     K562       UNC  K562 with Hydroxyurea Treatment Combined FAIRE-seq Peak Calls
Signal  1     K562       UNC  K562 with Hydroxyurea Treatment Combined FAIRE-seq Signal
Peaks   1     K562       UNC  K562 with Sodium Butyrate Treatment Combined FAIRE-seq Peak Calls
Signal  1     K562       UNC  K562 with Sodium Butyrate Treatment Combined FAIRE-seq Signal
Peaks   3     Medullo    UNC  Medullo Combined FAIRE-seq Peak Calls
Signal  3     Medullo    UNC  Medullo Combined FAIRE-seq Signal
Peaks   3     NHBE       UNC  NHBE Combined FAIRE-seq Peak Calls
Signal  3     NHBE       UNC  NHBE Combined FAIRE-seq Signal
Peaks   3     NHEK       UNC  NHEK Combined FAIRE-seq Peak Calls
Signal  3     NHEK       UNC  NHEK Combined FAIRE-seq Signal
Peaks   3     PanIslets  UNC  PanIslets Combined FAIRE-seq Peak Calls
Signal  3     PanIslets  UNC  PanIslets Combined FAIRE-seq Signal
Peaks   -     All        UNC  Union of FAIRE-seq Peak Calls Across 25 Whole Cell Types
Peaks   3     Urothelia  UNC  Urothelia Combined FAIRE-seq Peak Calls
Signal  3     Urothelia  UNC  Urothelia Combined FAIRE-seq Signal
Peaks   3     Urothelia  UNC  Urothelia with UT189 Treatment Combined FAIRE-seq Peak Calls
Signal  3     Urothelia  UNC  Urothelia with UT189 Treatment Combined FAIRE-seq Signal

Assembly: Human Feb. 2009 (GRCh37/hg19)

Note: ENCODE Project

Description

These tracks display Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) evidence produced as part of the ENCODE Project (ENCODE Project Consortium, 2012). FAIRE is a method to isolate and identify nucleosome-depleted regions of the genome. FAIRE was initially discovered in yeast and subsequently shown to identify active regulatory elements in human cells (Giresi et al., 2007). Similar to DNaseI HS, FAIRE appears to identify functional regulatory elements that include promoters, enhancers, silencers, insulators, locus control regions and novel elements.

Together with DNaseI HS and ChIP-seq experiments, these tracks display the locations of active regulatory elements identified as open chromatin in multiple cell types from the Duke, UNC-Chapel Hill, UT-Austin, and EBI ENCODE group. Within this project, open chromatin was identified using two independent and complementary methods: DNaseI hypersensitivity (HS) and these FAIRE assays, combined with chromatin immunoprecipitation (ChIP) for select regulatory factors. DNaseI HS and FAIRE provide assay cross-validation with commonly identified regions delineating the highest confidence areas of open chromatin. ChIP assays provide functional validation and preliminary annotation of a subset of open chromatin sites. Each method employed Illumina (formerly Solexa) sequencing by synthesis as the detection platform. The Tier 1 and Tier 2 cell types were additionally verified by a second platform, high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen.

Methods

Cells were grown according to the approved ENCODE cell culture protocols.

FAIRE was performed (Giresi et al., 2007) by cross-linking proteins to DNA using 1% formaldehyde solution, and the complex was sheared using sonication. Phenol/chloroform extractions were performed to remove DNA fragments cross-linked to protein. The DNA recovered in the aqueous phase was sequenced using an Illumina (Solexa) sequencing system. FAIRE-seq data for Tier 1 and Tier 2 cell lines were verified by comparing multiple independent growths (replicates) and determining the reproducibility of the data. For some cell types, additional verification was performed using the same material but hybridized to NimbleGen Human ENCODE tiling arrays (1% of the genome) along with the input DNA as reference (FAIRE-chip). A more detailed protocol is available here. Also see Giresi et al., 2009.

DNA fragments isolated by FAIRE are 100-200 bp in length, with the average length being 134 bp. Sequences from each experiment were aligned to the genome using BWA (Li et al., 2010) for the GRCh37 (hg19) assembly.

The command used for these alignments was:

    bwa aln -t 8 genome.fa s_1.sequence.txt.bfq > s_1.sequence.txt.sai

Where genome.fa is the whole genome sequence and s_1.sequence.txt.bfq is one lane of sequences converted into the required bfq format.

Sequences from multiple lanes were combined for a single replicate using the bwa samse command and converted to the SAM/BAM format using SAMtools.

Only sequences that aligned to 4 or fewer locations were retained. Sequences were also filtered out if they aligned to problematic regions (such as satellites and rRNA genes; see supplemental materials). The mappings of these short reads to the genome are available for download.
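The multi-mapping filter described above can be sketched in Python. This is a minimal illustration, not the pipeline's actual code: it assumes the number of alignment locations is taken from the X0 optional tag (number of best hits) that bwa samse writes to each SAM record, and the function names keep_read and filter_sam are hypothetical.

```python
def count_best_hits(sam_line):
    """Return the X0 value (number of best hits) from a SAM record, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column of a SAM record.
    for tag in fields[11:]:
        if tag.startswith("X0:i:"):
            return int(tag[5:])
    return None


def keep_read(sam_line, max_hits=4):
    """Keep mapped reads whose best-hit count is at most max_hits."""
    fields = sam_line.split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 marks an unmapped read
        return False
    hits = count_best_hits(sam_line)
    # Assumption: reads without an X0 tag are dropped.
    return hits is not None and hits <= max_hits


def filter_sam(lines, max_hits=4):
    """Yield header lines unchanged and alignment lines that pass keep_read."""
    for line in lines:
        if line.startswith("@") or keep_read(line, max_hits):
            yield line
```

The filter for problematic regions (satellites, rRNA genes) is not covered here; it would need a separate list of intervals to exclude.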

The resulting digital signal was converted to a continuous wiggle track using F-Seq that employs Parzen kernel density estimation to create base pair scores (Boyle et al., 2008b). Input data was generated for several cell lines. These were used directly to create a control/background model used for F-Seq when generating signal annotations for these cell lines. These models were meant to correct for sequencing biases, alignment artifacts, and copy number changes in these cell lines. Input data was not generated directly for other cell lines. Instead, a general background model was derived from the available Input data sets. This provided corrections for sequencing biases and alignment artifacts, but did not correct for cell type specific copy number changes.

The exact command used for this step was:

    fseq -l 800 -v -b <bff files> -p <iff files> alignments.bed

Where <bff files> are the background files based on alignability, <iff files> are the background files based on the Input experiments, and alignments.bed is a BED file of filtered sequence alignments.
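To illustrate how kernel density estimation turns discrete read positions into per-base scores, here is a minimal Python sketch with a Gaussian kernel. It is not F-Seq itself: it ignores the background models, and the bandwidth parameter and the function name kernel_density are assumptions made for illustration only.

```python
import math


def kernel_density(read_starts, region_start, region_end, bandwidth=100.0):
    """Gaussian kernel density estimate at each base in [region_start, region_end).

    read_starts: genomic start positions of aligned reads (assumed input).
    bandwidth:   kernel standard deviation in base pairs (hypothetical value).
    """
    n = len(read_starts)
    # Normalizing constant so the density integrates to about 1 over the genome.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi)) if n else 0.0
    scores = []
    for pos in range(region_start, region_end):
        total = 0.0
        for start in read_starts:
            z = (pos - start) / bandwidth
            total += math.exp(-0.5 * z * z)
        scores.append(norm * total)
    return scores
```

This direct double loop costs time proportional to the number of bases times the number of reads, which is fine for a small region but would need a windowed or binned approach at genome scale.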

Discrete FAIRE sites (peaks) were identified from the FAIRE-seq F-seq density signal. Significant regions were determined by fitting the data to a gamma distribution to calculate p-values. Contiguous regions where p-values were below a 0.1 threshold were considered significant.
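The last step, merging contiguous significant positions into discrete regions, can be sketched as follows. This is an illustrative Python example only: it assumes per-base p-values have already been computed from the fitted gamma distribution (that fit is not shown), and the function name significant_regions is hypothetical.

```python
def significant_regions(p_values, start=0, threshold=0.1):
    """Return half-open (begin, end) intervals of contiguous bases with p < threshold.

    p_values:  one p-value per base, in genomic order (assumed input).
    start:     genomic coordinate of the first base in p_values.
    threshold: significance cutoff; 0.1 matches the threshold quoted above.
    """
    regions = []
    begin = None
    for offset, p in enumerate(p_values):
        if p < threshold:
            if begin is None:
                begin = start + offset  # open a new region
        elif begin is not None:
            regions.append((begin, start + offset))  # close the current region
            begin = None
    if begin is not None:
        regions.append((begin, start + len(p_values)))
    return regions
```

Half-open intervals are used here because they match the BED coordinate convention.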

Uniform signal was generated by processing the aligned reads using the align2rawsignal "Wiggler" software (see http://code.google.com/p/align2rawsignal for details and settings). The method accounts for the depth of sequencing, the mappability of the genome (based on read length and ambiguous bases) and different fragment length shifts for the different datasets being combined. It also differentiates between positions that showed zero signal simply because they are unmappable and positions that are mappable but have no reads.

Data from the high-resolution 1% ENCODE tiled microarrays supplied by NimbleGen were normalized using the Tukey biweight normalization, and peaks were called using ChIPOTle (Buck et al., 2005) at multiple levels of significance. Regions matched on size to these peaks that were devoid of any significant signal were also created as a null model. These data were used for additional verification of Tier 1 and Tier 2 cell lines by ROC analysis. Files containing this data can be found in the Downloads directory labeled Validation view.
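As a rough illustration of Tukey biweight normalization, the Python sketch below computes a one-step Tukey biweight estimate of location and subtracts it from a set of log ratios. The exact procedure applied to the NimbleGen data is not described here, so this is an assumption: the constants c=5 and epsilon=1e-4 are commonly used defaults, and the function names are hypothetical.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight estimate of location for a list of numbers."""
    median = statistics.median(values)
    # Median absolute deviation, a robust measure of spread.
    mad = statistics.median([abs(v - median) for v in values])
    weighted_sum = 0.0
    weight_total = 0.0
    for v in values:
        u = (v - median) / (c * mad + epsilon)
        # Points far from the median (|u| >= 1) receive zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted_sum += w * v
        weight_total += w
    return weighted_sum / weight_total


def center_log_ratios(log_ratios):
    """Subtract the biweight location so the bulk of probes is centered on zero."""
    location = tukey_biweight(log_ratios)
    return [v - location for v in log_ratios]
```

Because outlying probes get zero weight, strongly enriched probes (the peaks of interest) have little influence on the estimated center.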

Credits

These data and annotations were created by a collaboration of multiple institutions (contact: Terry Furey):

Duke University's Institute for Genome Sciences & Policy (IGSP): Alan Boyle, Lingyun Song, Terry Furey, and Greg Crawford
University of North Carolina at Chapel Hill: Paul Giresi and Jason Lieb
University of Texas at Austin: Zheng Liu, Ryan McDaniell, Bum-Kyu Lee, and Vishy Iyer
European Bioinformatics Institute: Paul Flicek, Damian Keefe, and Ewan Birney
University of Cambridge, Department of Oncology and Cancer Research UK Cambridge Research Institute (CRI): Stefan Graf

We thank NHGRI for ENCODE funding support.

References

Bhinge AA, Kim J, Euskirchen GM, Snyder M, Iyer VR. Mapping the chromosomal targets of STAT1 by Sequence Tag Analysis of Genomic Enrichment (STAGE). Genome Res. 2007 Jun;17(6):910-6.

Boyle AP, Davis S, Shulha HP, Meltzer P, Margulies EH, Weng Z, Furey TS, Crawford GE. High-resolution mapping and characterization of open chromatin across the genome. Cell. 2008 Jan 25;132(2):311-22.

Boyle AP, Guinney J, Crawford GE, Furey TS. F-Seq: a feature density estimator for high-throughput sequence tags. Bioinformatics. 2008 Nov 1;24(21):2537-8.

Buck MJ, Nobel AB, Lieb JD. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol. 2005;6(11):R97.

Crawford GE, Davis S, Scacheri PC, Renaud G, Halawi MJ, Erdos MR, Green R, Meltzer PS, Wolfsberg TG, Collins FS. DNase-chip: a high-resolution method to identify DNase I hypersensitive sites using tiled microarrays. Nat Methods. 2006 Jul;3(7):503-9.

Crawford GE, Holt IE, Whittle J, Webb BD, Tai D, Davis S, Margulies EH, Chen Y, Bernat JA, Ginsburg D et al. Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan;16(1):123-31.

ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007 Jun 14;447(7146):799-816.

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74.

Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD. FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res. 2007 Jun;17(6):877-85.

Giresi PG, Lieb JD. Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements). Methods. 2009 Jul;48(3):233-9.

Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008 Nov;18(11):1851-8.

Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc. 2010;Issue 2.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.

There is no restriction on the use of these specific tracks.

Contact

Terry Furey
