sources EPDnew Track Settings
 
Source of refTSS: liftOver EPDnew

Track collection: Sources of Reference Transcription Start Sites (refTSS)

Assembly: Human Dec. 2013 (GRCh38/hg38)
Data last updated at UCSC: 2018-12-28 03:30:52

Description

Transcription starts at genomic positions called transcription start sites (TSSs) to produce RNAs, and is mainly regulated by genomic elements and transcription factors binding around these TSSs. This indicates that TSSs may be a better unit to integrate various data sources related to transcriptional events, including regulation and production of RNAs. However, although several TSS datasets and promoter atlases are available, a comprehensive reference set that integrates all known TSSs is lacking. Thus, we constructed a reference dataset of TSSs (refTSS) for the human and mouse genomes by collecting publicly available TSS annotations and promoter resources. The data set consists of genomic coordinates of TSS peaks, their gene annotations, quality check results, and conservation between human and mouse. We also developed a web interface to browse the refTSS (http://reftss.clst.riken.jp/).

Methods

We collected publicly available human and mouse 5'-end sequencing data from public repositories and databases as described in PMID: 31075273. After obtaining the 5'-end sequence data, we applied the following process:

  1. Reprocessing of 5'-end sequence data
  2. Conversion of the genomic coordinates to the latest genome assembly
  3. Reprocessing of raw sequence reads
  4. Reprocessing of the mapped reads in BAM format
  5. Integration of TSS data
  6. Quality evaluation and classification of refTSS peaks
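
As a rough illustration of step 2, converting coordinates between assemblies amounts to mapping each position through blocks of ungapped alignment between the old and new genome. The sketch below is a toy version of that idea only. The block coordinates are made up, and `lift_position` is a hypothetical helper rather than part of any UCSC tool; real conversions rely on chain files and a dedicated utility such as liftOver.

```python
from typing import List, Optional, Tuple

# Hypothetical ungapped alignment blocks: (old_start, new_start, length).
# These numbers are invented for illustration; real data comes from chain files.
Block = Tuple[int, int, int]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position from the old assembly to the new one.

    Returns None when the position falls outside every aligned block,
    which corresponds to a coordinate that cannot be converted.
    """
    for old_start, new_start, length in blocks:
        if old_start <= pos < old_start + length:
            # Inside a block the offset from the block start is preserved.
            return new_start + (pos - old_start)
    return None


blocks = [(1000, 1500, 200), (1300, 1810, 100)]
print(lift_position(1050, blocks))  # 1550
print(lift_position(1250, blocks))  # None (gap between the two blocks)
print(lift_position(1310, blocks))  # 1820
```

Positions that fall in a gap are reported as unmappable rather than guessed, which is why converted data sets can contain fewer peaks than their source.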

Since the largest TSS set is the FANTOM5 promoter atlas, we merged all other data with the FANTOM5 set. Finally, we used publicly available annotation to annotate the refTSS.
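
The merge onto a base set can be pictured with a small sketch. It assumes a simple rule that is not taken from the publication: every peak of the base set is kept, and a peak from another source is added only if it does not overlap an already kept peak on the same chromosome and strand. The `Peak` type, the `integrate` function, and the coordinates are all hypothetical; the actual integration procedure is the one described in PMID: 31075273.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str
    source: str


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks share at least one base on the same strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def integrate(base: List[Peak], others: List[Peak]) -> List[Peak]:
    """Keep all base peaks; add other peaks only where nothing overlaps.

    This overlap rule is an assumption made for illustration.
    """
    merged = list(base)
    for peak in others:
        if not any(overlaps(peak, kept) for kept in merged):
            merged.append(peak)
    return sorted(merged, key=lambda p: (p.chrom, p.start, p.end, p.strand))


base = [Peak("chr1", 100, 120, "+", "FANTOM5")]
others = [
    Peak("chr1", 110, 130, "+", "EPDnew"),  # overlaps the base peak: dropped
    Peak("chr1", 110, 130, "-", "EPDnew"),  # opposite strand: kept
    Peak("chr1", 500, 510, "+", "dbTSS"),   # no overlap: kept
]
print([p.source for p in integrate(base, others)])
# ['FANTOM5', 'EPDnew', 'dbTSS']
```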

The hub is organized as follows. Each track provides genomic coordinates of TSS peaks, either for refTSS itself or for one of the processed source 5'-end data sets:

  • Hub name: refTSS
  • Assemblies: hg38, mm10
    1. hg38 hub contains the following tracks:
      • refTSS
      • FANTOM5
      • RMAPAGE
      • DRA000914
      • ENCODE_CAGE
      • dbTSS
    2. mm10 hub contains the following tracks:
      • refTSS
      • FANTOM5
      • EPDnew
      • DRA000914

Data files

The data files (BED / text) are available for download from http://reftss.clst.riken.jp/datafiles/.
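
For readers who want to work with the downloaded BED files, the following is a minimal reading sketch. It assumes the standard six-column BED layout (chrom, start, end, name, score, strand); the exact columns in the refTSS files should be checked against the files themselves. `BedRecord`, `read_bed`, and the sample lines are hypothetical and contain no real refTSS data.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class BedRecord(NamedTuple):
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # exclusive
    name: str
    strand: str


def read_bed(handle: TextIO) -> Iterator[BedRecord]:
    """Yield records from a tab-separated BED stream with at least 6 columns."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        yield BedRecord(fields[0], int(fields[1]), int(fields[2]),
                        fields[3], fields[5])


# Made-up sample lines standing in for a downloaded file.
sample = "chr1\t100\t120\tpeak_a\t0\t+\nchr1\t500\t510\tpeak_b\t0\t-\n"
records = list(read_bed(io.StringIO(sample)))
print(len(records))                        # 2
print(records[0].end - records[0].start)   # 20
```

To read a real file, pass an open file handle instead of the `io.StringIO` sample, for example `open(path, encoding="utf-8")`.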

Credits

This track hub was prepared by Shuhei Noguchi, Laboratory for Large-Scale Biomedical Data Technology, RIKEN IMS.

Please send us any questions regarding this track hub and the underlying data.

References

Abugessaisa I, Noguchi S, Hasegawa A, Kondo A, Kawaji H, Carninci P, Kasukawa T. (2019). "refTSS: A Reference Data Set for Human and Mouse Transcription Start Sites." J Mol Biol. doi: 10.1016/j.jmb.2019.04.045. PMID: 31075273.