Software and Resources for Analyzing ENCODE data

HaploReg
Explores annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Under the Set Options tab, set the Browse ENCODE button to "on" and select an LD threshold and a reference population. Under the Build Query tab, enter a SNP (rsXXXXX), a set of SNPs, or a genomic region, or select a GWAS from the drop-down menu. HaploReg returns the SNPs in LD with the query SNPs and their frequencies in four populations from 1000 Genomes Phase 1. For each SNP it also reports the evidence ENCODE has found for regulatory protein binding (hover over an entry to see the protein names), chromatin structure (hover to see the cell types with DNase hypersensitivity), the chromatin state of the region (which can predict an enhancer or promoter), and putative transcription factor binding motifs altered by the variant. Clicking on a SNP name reveals further details, including cell-type metadata and how the variant disrupts or creates TF binding motifs (the matched PWM and its alignment to the local sequence context). SNPs are also intersected with cross-species conserved elements, chromatin states from the Roadmap Epigenomics Consortium, and lead eQTLs from the GTEx Project browser.

Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 2012 Jan;40(Database issue):D930-4.
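HaploReg reports SNPs whose linkage disequilibrium (LD) with the query exceeds the chosen threshold. It uses precomputed LD from 1000 Genomes, so you never compute this yourself; the sketch below only illustrates what an r² threshold means. It assumes phased haplotypes coded as 0/1 alleles, and the helper names (`ld_r2`, `snps_in_ld`) and SNP IDs are hypothetical.

```python
def ld_r2(hap_a: list[int], hap_b: list[int]) -> float:
    """Squared correlation (r^2) between two biallelic SNPs.

    hap_a[i] and hap_b[i] are the alleles (0 or 1) carried by the same
    phased haplotype i at SNP A and SNP B.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                   # freq of allele 1 at A
    p_b = sum(hap_b) / n                                   # freq of allele 1 at B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n    # freq of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom            # monomorphic SNP -> 0


def snps_in_ld(query: str, haplotypes: dict[str, list[int]],
               threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (SNP, r^2) for every other SNP whose r^2 with the query meets the threshold."""
    ref = haplotypes[query]
    hits = []
    for snp, hap in haplotypes.items():
        if snp == query:
            continue
        r2 = ld_r2(ref, hap)
        if r2 >= threshold:
            hits.append((snp, round(r2, 3)))
    return sorted(hits, key=lambda hit: -hit[1])  # strongest LD first
```

For example, with `haps = {"rs1": [0, 0, 1, 1, 1, 0], "rs2": [0, 0, 1, 1, 1, 0], "rs3": [1, 0, 1, 0, 1, 0]}`, `snps_in_ld("rs1", haps)` keeps `rs2` (r² = 1) and drops `rs3` (r² ≈ 0.11). Raising or lowering the threshold in HaploReg has the same effect on how many linked SNPs come back.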

RegulomeDB
Identifies DNA features and regulatory elements in non-coding regions of the human genome. Input can be dbSNP IDs or BED, VCF, or GFF3 files. RegulomeDB returns a score summarizing the evidence for regulatory potential; clicking on the score shows the supporting data, organized by data type and cell type. Hyperlinks open the SNP or region in the UCSC Genome Browser, the Ensembl browser, and dbSNP.

Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012 Sep;22(9):1790-7.
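Because RegulomeDB accepts BED files, a common stumbling block is the coordinate convention. VCF positions are 1-based, while BED intervals are 0-based and half-open, so a single base at 1-based position p is written as start p - 1, end p. A minimal sketch, assuming you start from 1-based (chromosome, position, name) records; the helper `snps_to_bed` and the example names are hypothetical. Check which genome assembly the server expects before submitting.

```python
def snps_to_bed(snps: list[tuple[str, int, str]]) -> str:
    """Format (chrom, 1-based position, name) SNP records as BED lines.

    BED coordinates are 0-based and half-open, so one base at 1-based
    position p spans chromStart = p - 1 to chromEnd = p.
    """
    lines = []
    for chrom, pos, name in snps:
        if pos < 1:
            raise ValueError(f"positions must be 1-based, got {pos}")
        lines.append(f"{chrom}\t{pos - 1}\t{pos}\t{name}")
    return "\n".join(lines) + "\n"
```

For instance, `snps_to_bed([("chr1", 1000, "snpA")])` yields the single line `chr1	999	1000	snpA`, which can be saved with `pathlib.Path("query.bed").write_text(...)` and uploaded.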

PIQ: Protein Interaction Quantification
PIQ is a computational method that models the magnitude and shape of genome-wide DNase profiles to identify transcription factor (TF) binding sites. Its inputs are one or more DNase-seq experiments, the genome sequence of the assayed organism, and a list of motifs, represented as position weight matrices (PWMs), that describe candidate TF binding sites. PIQ uses machine learning methods to normalize the input DNase-seq data and then predicts TF binding by detecting the shape and magnitude of the DNase profile specific to each TF. Its output is the probability of occupancy for each candidate binding site in the genome, along with aggregate TF-specific scores (e.g., metrics of TF-specific chromatin opening).

Sherwood RI, Hashimoto T, O'Donnell CW, Lewis S, Barkal AA, van Hoff JP, Karun V, Jaakkola T, Gifford DK. Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape. Nat Biotechnol. 2014 Feb;32(2):171-8. PMID: 24441470; PMCID: PMC3951735
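PIQ starts from candidate sites defined by PWMs. The sketch below shows only that generic first step, log-odds PWM scanning of a sequence; it does not reproduce PIQ's DNase-profile model, and the count matrix, pseudocount, uniform background, and threshold are illustrative assumptions. Only the forward strand is scanned to keep it short.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts: list[dict[str, float]], pseudocount: float = 0.5,
                    background: float = 0.25) -> list[dict[str, float]]:
    """Convert per-position base counts into log2-odds scores vs a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                       for b in BASES})
    return matrix


def scan(sequence: str, matrix: list[dict[str, float]],
         threshold: float) -> list[tuple[int, float]]:
    """Return (0-based offset, score) for each forward-strand window scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(matrix[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, round(score, 3)))
    return hits
```

With a toy motif built from `[{"A": 10}, {"C": 10}, {"G": 10}]`, scanning `"TTACGTT"` at threshold 5.0 returns one hit at offset 2 (score about 5.42). In PIQ, sites like these are only candidates; the DNase-seq signal around them decides the occupancy probability.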

Factorbook
A wiki-style resource that organizes the information associated with each transcription factor (TF), including its ChIP-seq peaks, discovered motifs, TF-TF interactions, and the chromatin features (histone modification patterns, DNase I cleavage, and nucleosome positioning) around its ChIP-seq peaks. Factorbook is updated as the ENCODE project proceeds, and its dynamic display is organized around individual transcription factors.

Wang J, Zhuang J, Iyer S, Lin X, Whitfield TW, Greven MC, Pierce BG, Dong X, Kundaje A, Cheng Y et al. Sequence features and chromatin structure around the genomic regions bound by 119 human transcription factors. Genome Res. 2012 Sep;22(9):1798-812.

ENCODE-motifs
A database that explores the molecular basis of TF binding in the human genome through regulatory motif analysis of all transcription factors (TFs), grouped by family. Users can browse all known motifs for each factor, curated from TRANSFAC, JASPAR, and protein binding microarray (PBM) experiments, together with their enrichment and instances within the corresponding TF binding experiments. It also lists novel regulatory motifs discovered by systematically applying several motif discovery tools (including MEME, MDscan, Weeder, and AlignACE) and evaluated by their enrichment relative to control motifs within TF-bound regions. Finally, ENCODE-motifs provides a genome-wide map of instances of both known and novel motifs in the human genome.

Kheradpour P, Kellis M. Systematic discovery and characterization of regulatory motifs in ENCODE TF binding experiments. Nucleic Acids Res. 2014 Mar 1;42(5):2976-87. PMID: 24335146; PMCID: PMC3950668
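ENCODE-motifs evaluates motifs by their enrichment in TF-bound regions relative to control motifs. The sketch below illustrates that idea under simplifying assumptions that are this example's own, not the paper's method: motifs are exact consensus strings matched on either strand, control motifs are random permutations of the motif's letters, and enrichment is the ratio of the motif's hit rate to the mean control hit rate. All function names are hypothetical.

```python
import random


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def shuffled_controls(motif: str, n: int, seed: int = 0) -> list[str]:
    """Control motifs made by permuting the motif's letters (assumed control scheme)."""
    rng = random.Random(seed)
    controls = []
    for _ in range(n):
        letters = list(motif)
        rng.shuffle(letters)
        controls.append("".join(letters))
    return controls


def fraction_with_instance(motif: str, regions: list[str]) -> float:
    """Fraction of regions containing an exact match on either strand."""
    if not regions:
        raise ValueError("need at least one region")
    rc = revcomp(motif)
    return sum(1 for r in regions if motif in r or rc in r) / len(regions)


def enrichment(motif: str, bound: list[str], n_controls: int = 10, seed: int = 0) -> float:
    """Motif hit rate in bound regions divided by the mean hit rate of control motifs."""
    observed = fraction_with_instance(motif, bound)
    controls = shuffled_controls(motif, n_controls, seed)
    expected = sum(fraction_with_instance(c, bound) for c in controls) / n_controls
    return float("inf") if expected == 0 else observed / expected
```

Shuffled controls keep the motif's base composition, so a motif that scores well only because bound regions are GC- or AT-rich is not rewarded. This is the reason control motifs are preferable to a plain count of motif instances.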

Spark
Spark is an interactive pattern discovery and visualization tool designed for epigenomic data. It can reveal both known and novel epigenetic signatures.

Nielsen CB, Younesy H, O'Geen H, Xu X, Jackson AR, Milosavljevic A, Wang T, Costello JF, Hirst M, Farnham PJ, Jones SJ. Spark: a navigational paradigm for genomic data exploration. Genome Res. 2012 Nov;22(11):2262-9. PMID: 22960372; PMCID: PMC3483555
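Spark's pattern discovery groups genomic regions by the shape of their signal profiles. A plain k-means clustering is one way to illustrate that grouping. Treat the choice of k-means, the deterministic initialisation, and the `kmeans` helper below as this sketch's assumptions rather than a description of Spark's internals.

```python
def kmeans(profiles: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Cluster equal-length signal profiles; returns one cluster label per profile."""
    if not 0 < k <= len(profiles):
        raise ValueError("k must be between 1 and the number of profiles")
    centroids = [list(p) for p in profiles[:k]]  # deterministic init: first k profiles
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in profiles
        ]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Each profile might hold binned signal (for example, one histone mark across bins around a promoter). Regions with similar shapes share a label, which is the kind of grouping Spark then lets you inspect visually.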

Regulatory Elements Database
Through an intuitive interface, you can 1) identify DNase I hypersensitive sites (DHSs) within a genomic region of interest, 2) predict the target gene of a DHS of interest, 3) predict the DHSs that regulate a gene of interest, 4) identify clusters of similarly regulated DHSs that may have related functions, 5) identify enriched motifs for transcription factors that may bind in these similarly regulated DHSs, and 6) identify DHSs that contain a DNA sequence motif for a transcription factor of interest. The Regulatory Elements Database provides access to roughly 2.8 million DHSs and their signal in 112 human samples, as well as Affymetrix microarray expression data for the same cell types.

Sheffield NC, Thurman RE, Song L, Safi A, Stamatoyannopoulos JA, Lenhard B, Crawford GE, Furey TS. Patterns of regulatory activity across diverse human cell types predict tissue identity, transcription factor binding, and long-range interactions. Genome Res. 2013 May;23(5):777-88. PMID: 23482648; PMCID: PMC3638134
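Because the database pairs DHS signal with expression data for the same cell types, one way to predict a DHS's target gene is to correlate its signal with the expression of nearby genes across samples. The sketch below does exactly that. The Pearson statistic, the 500 kb window, the r ≥ 0.7 cutoff, the same-chromosome simplification, and the helper and gene names are all assumptions for illustration, not the database's published parameters.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def candidate_targets(dhs_signal: list[float], dhs_pos: int,
                      genes: dict[str, tuple[int, list[float]]],
                      max_distance: int = 500_000,
                      min_r: float = 0.7) -> list[tuple[str, float]]:
    """Rank genes near a DHS by correlation of DHS signal with gene expression.

    genes maps a gene name to (TSS position, expression across the same samples
    as dhs_signal); all positions are assumed to be on the DHS's chromosome.
    """
    hits = []
    for name, (tss, expression) in genes.items():
        if abs(tss - dhs_pos) > max_distance:
            continue  # too far away to be considered
        r = pearson(dhs_signal, expression)
        if r >= min_r:
            hits.append((name, round(r, 3)))
    return sorted(hits, key=lambda hit: -hit[1])  # best-correlated gene first
```

With `dhs_signal = [1, 2, 3, 4]` at position 110,000, a gene at 100,000 whose expression rises in step with the DHS is kept. An anti-correlated neighbour or a gene 5 Mb away is not. Running the same correlation in the other direction (one gene, many DHSs) corresponds to predicting the DHSs that regulate a gene.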

Tutorial on functional annotation of non-coding sequences
Briefly describes how to use the UCSC Genome Browser to display the basic ENCODE tracks at SNPs identified in genome-wide association studies (GWAS).

Mortlock DP, Pregizer S. Identifying functional annotation for noncoding genomic sequences. Curr Protoc Hum Genet. 2012 Jan;Chapter 1:Unit1.10.

See also the ENCODE Education & Outreach page.

Tools for analyzing ENCODE data