Schema for Cancer Transc Expr - Transcript-level Expression in 33 TCGA Cancer Tissues (GENCODE v23)
  Database: hg38    Primary Table: tcgaTranscExpr    Data last updated: 2017-08-30
Big Bed File: /gbdb/hg38/tcga/tcgaTranscExpr.bb
Item Count: 189,719
Format description: BED6+5 barChart format, with name and name2 identifying the transcript accession and the gene name, respectively
field	example	description
chrom	chr1	Reference sequence chromosome or scaffold
chromStart	166042243	Start position in chromosome
chromEnd	166042350	End position in chromosome
name	ENST00000384611	Accession
score	0	Score from 0-1000, typically derived from the total of the median values across all tissues
strand	+	+ or - for strand. Use . if not applicable
name2	RNA5SP64	HUGO gene name
expCount	32	Number of tissues
expScores	0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0	Comma-separated list of median expression values per tissue
_dataOffset	2350457812	Offset of sample data in data matrix file, for boxplot on details page
_dataLen	44974	Length of sample data row in data matrix file
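To make the field layout concrete, here is a minimal Python sketch that parses one line of this table into a typed record. It assumes tab-separated text such as the output of dumping the bigBed file with the UCSC bigBedToBed utility. The names BarChartRecord and parse_barchart_line are illustrative only and are not part of any UCSC library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BarChartRecord:
    """One item of the tcgaTranscExpr BED6+5 barChart table."""
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str                 # transcript accession, e.g. ENST00000384611
    score: int                # 0-1000
    strand: str               # "+", "-" or "."
    name2: str                # HUGO gene name, e.g. RNA5SP64
    exp_count: int            # number of tissues
    exp_scores: List[float]   # one median expression value per tissue
    data_offset: int          # byte offset of the sample row in the data matrix file
    data_len: int             # byte length of that row


def parse_barchart_line(line: str) -> BarChartRecord:
    """Parse one tab-separated BED6+5 barChart line."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    exp_count = int(f[7])
    # Tolerate an optional trailing comma in the expScores list.
    exp_scores = [float(x) for x in f[8].rstrip(",").split(",")]
    if len(exp_scores) != exp_count:
        raise ValueError("expScores length does not match expCount")
    return BarChartRecord(
        chrom=f[0], chrom_start=int(f[1]), chrom_end=int(f[2]),
        name=f[3], score=int(f[4]), strand=f[5], name2=f[6],
        exp_count=exp_count, exp_scores=exp_scores,
        data_offset=int(f[9]), data_len=int(f[10]),
    )
```

The length check against expCount catches truncated or mis-split lines early, which matters because every item is expected to carry one value per tissue.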

Sample Rows
 
chrom	chromStart	chromEnd	name	score	strand	name2	expCount	expScores	_dataOffset	_dataLen
chr1	166042243	166042350	ENST00000384611	0	+	RNA5SP64	32	0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0	2350457812	44974
chr1	166057425	166166225	ENST00000435676	666	-	FAM78B	32	0.21,0.25,1.67,0.42,0.12,0.17,2.39,0.42,0.45,0.32,0.18,0.20,0.17,0.23,0.43,0.14,0.90,1.03,0.24,0.14,2.33,0.26,0.32,0.20,0.23,0.2 ...	4089995349	99158
chr1	166059688	166166381	ENST00000456900	222	-	FAM78B	32	0.0,0.0,0.10,0.02,0.0,0.0,0.35,0.03,0.04,0.0,0.0,0.0,0.0,0.0,9.05,0.0,0.06,0.05,0.0,0.0,0.24,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0 ...	4981108298	73073
chr1	166059900	166166234	ENST00000441649	444	-	FAM78B	32	0.05,0.12,0.16,0.15,0.06,0.10,0.37,0.08,0.18,0.07,0.11,0.07,0.07,0.11,0.14,0.02,0.20,0.12,0.07,0.08,0.84,0.08,0.31,0.06,0.08,0.0 ...	4340924172	94805
chr1	166070018	166166715	ENST00000354422	555	-	FAM78B	32	0.11,0.22,0.0,0.13,0.12,0.19,0.26,0.11,0.16,0.09,0.31,0.09,0.13,0.42,0.27,0.11,0.30,0.04,0.09,0.16,1.67,0.24,0.41,0.17,0.15,0.20 ...	1309060162	92048
chr1	166070019	166166969	ENST00000338353	111	-	FAM78B	32	0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.10,0.0,0.0,0.0,0.0,0.0,0.20,0.0,0.0,0.0,0.0,0.0,0.0,0.17,0.0,0.0,0 ...	1108261873	56928
chr1	166081182	166087483	ENST00000366136	666	+	RP11-375H19.2	32	0.24,0.39,0.60,0.52,0.48,0.33,0.34,0.29,0.55,0.44,0.52,0.52,0.49,0.30,0.50,0.18,0.54,0.32,0.42,0.31,0.77,0.42,0.50,0.44,0.46,0.5 ...	1617443863	97442
chr1	166147781	166165095	ENST00000451784	666	-	RP11-9L18.3	32	0.33,0.56,0.41,0.49,0.43,0.30,0.39,0.41,0.54,0.45,0.22,0.62,0.64,0.36,0.42,0.55,0.54,0.36,0.38,0.29,0.44,0.78,0.56,0.47,0.39,0.4 ...	4766086658	96965
chr1	166154742	166154798	ENST00000401133	0	-	MIR921	32	0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0	2803986458	45573
chr1	166334883	166335762	ENST00000425271	444	+	RP11-479J7.2	32	0.0,0.09,0.17,0.11,0.0,0.0,1.26,0.0,0.06,0.0,0.0,0.08,0.06,0.08,0.16,0.06,0.69,0.08,0.0,0.0,0.41,0.04,0.08,0.0,0.0,0.06,0.0,0.13 ...	3653515065	76024
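The _dataOffset and _dataLen columns let a reader jump straight to one item's per-sample values in the data matrix file instead of scanning it. The Python sketch below shows that seek-and-read pattern. It assumes each matrix row is tab-separated ASCII text; the matrix file name is not given in this schema, so the demo uses a hypothetical in-memory stand-in, and read_sample_row is an illustrative name.

```python
import io
from typing import BinaryIO, List


def read_sample_row(fh: BinaryIO, offset: int, length: int) -> List[str]:
    """Return the tab-separated fields stored at bytes [offset, offset + length)."""
    fh.seek(offset)
    raw = fh.read(length)
    if len(raw) != length:
        raise ValueError("data matrix file is shorter than _dataOffset + _dataLen")
    return raw.decode("ascii").rstrip("\n").split("\t")


# Hypothetical in-memory stand-in for the sample data matrix file.
matrix = b"#header\nENST00000384611\t0.0\t0.0\t1.5\n"
start = matrix.index(b"ENST")
values = read_sample_row(io.BytesIO(matrix), start, len(matrix) - start)
print(values)  # ['ENST00000384611', '0.0', '0.0', '1.5']
```

With a real file, the same function works on a handle opened with open(path, "rb"); reading only _dataLen bytes keeps the details-page boxplot cheap even for a multi-gigabyte matrix.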

Cancer Transc Expr (tcgaTranscExpr) Track Description
 

Description

The Cancer Genome Atlas (TCGA), a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. The TCGA dataset, 2.5 petabytes of data describing tumor tissue and matched normal tissues from more than 11,000 patients, is publicly available and has been widely used by the research community.

The Cancer Genome Atlas is an NIH-funded project to catalog genetic mutations responsible for cancer. The data shown here are RNA-seq expression data produced by the consortium.

For questions or feedback on the data, please contact TCGA.

TCGA Gene Expression

The gene track shows RNA expression levels for each TCGA tissue for GENCODE canonical genes. Each gene's value is the total of the expression values of all transcripts in that gene.
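The gene-level totals can be illustrated with a small Python sketch that sums transcript TPM values into gene values, sample by sample, before any per-tissue summary is taken. The transcript IDs, gene names, and the in-memory dictionary layout are hypothetical; this shows the summation idea, not the UCSC tooling itself.

```python
from typing import Dict, List


def gene_level_tpm(
    transcript_tpm: Dict[str, List[float]],
    transcript_to_gene: Dict[str, str],
) -> Dict[str, List[float]]:
    """Sum per-sample transcript TPM values into per-sample gene values."""
    genes: Dict[str, List[float]] = {}
    for tx, values in transcript_tpm.items():
        gene = transcript_to_gene[tx]
        # Start each gene with a zero vector as long as the sample list.
        totals = genes.setdefault(gene, [0.0] * len(values))
        for i, v in enumerate(values):
            totals[i] += v
    return genes


tpm = {"tx1": [1.0, 2.0], "tx2": [3.0, 0.5], "tx3": [4.0, 4.0]}
print(gene_level_tpm(tpm, {"tx1": "G1", "tx2": "G1", "tx3": "G2"}))
# {'G1': [4.0, 2.5], 'G2': [4.0, 4.0]}
```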

TCGA Transcript Expression

The transcript track shows RNA expression levels for each TCGA tissue using GENCODE v23 transcripts.

Display Conventions

In Full and Pack display modes, expression for each genomic item (gene/transcript) is represented by a colored bar chart, where the height of each bar represents the median expression level across all samples for a tissue, and the bar color indicates the tissue.

The bar chart display has the same width and tissue order for all genomic items. Hovering the mouse over a bar shows the tissue and its median expression level. The Squish display mode draws a rectangle for each item, colored to indicate the tissue with the highest expression level if that tissue contributes more than 10% of the overall expression (and colored black if no tissue predominates). In Dense mode, the darkness of the grayscale rectangle displayed for the item reflects the total median expression level across all tissues.
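The Squish-mode coloring rule can be written as a short Python function: find the tissue with the highest median, and use its color only if it accounts for more than 10% of the summed medians. This is a sketch of the rule as described here, not the Genome Browser's own code; the hex color strings and the function name squish_color are illustrative assumptions.

```python
from typing import Sequence

BLACK = "#000000"


def squish_color(medians: Sequence[float], tissue_colors: Sequence[str],
                 min_fraction: float = 0.10) -> str:
    """Pick the Squish-mode color for one item from its per-tissue medians."""
    total = sum(medians)
    if total <= 0:
        return BLACK  # nothing expressed, so no tissue predominates
    top = max(range(len(medians)), key=lambda i: medians[i])
    # Use the top tissue's color only if it contributes more than 10% of the total.
    return tissue_colors[top] if medians[top] / total > min_fraction else BLACK
```

For example, with eleven tissues at equal expression each contributes about 9.1%, so no tissue predominates and the item is drawn black.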

This track was designed to be used in conjunction with the GTEx expression tracks, which can act as a control.

The color of each cancer was derived by mapping its tissue of origin to the closest GTEx tissue and taking that GTEx tissue's color. Five cancers had no matching GTEx tissue and were assigned colors from a rainbow color scheme: Cholangiocarcinoma, Esophageal carcinoma, Head and Neck squamous cell carcinoma, Sarcoma, and Uveal Melanoma.

Cancers are ordered alphabetically by the name of their matched GTEx tissue. The five cancers without a matching GTEx tissue are ordered alphabetically by cancer name.
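The ordering rule can be sketched in Python as a sort keyed on the matched GTEx tissue name, with unmatched cancers sorted separately by name. Two things here are assumptions: that the unmatched cancers are placed after the matched ones (the text does not say where they go), and that ties between cancers sharing a GTEx tissue are broken by cancer name. The cancer-to-tissue mapping in the test is hypothetical.

```python
from typing import Dict, List, Optional


def order_cancers(gtex_tissue: Dict[str, Optional[str]]) -> List[str]:
    """Order cancers by matched GTEx tissue name, then append unmatched cancers.

    gtex_tissue maps each cancer name to its closest GTEx tissue, or None.
    """
    # Sort (tissue, cancer) pairs so ties on tissue fall back to cancer name.
    matched = sorted((t, c) for c, t in gtex_tissue.items() if t is not None)
    # Assumption: unmatched cancers go after all matched ones.
    unmatched = sorted(c for c, t in gtex_tissue.items() if t is None)
    return [c for _, c in matched] + unmatched
```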

Methods

TCGA chose cancers for study based on two broad criteria: poor prognosis and overall public health impact, and the availability of human tumor and matched normal tissue samples that meet TCGA standards.

RNA sequencing was performed on polyA-selected libraries using the Illumina HiSeq 2000 platform. All RNA sequencing was performed at the University of North Carolina (UNC).

Sequence reads for this track were quantified with kallisto against the GENCODE v23 transcriptome for the hg38/GRCh38 human genome. Read quantification was performed at UCSC by the Computational Genomics lab using the Toil pipeline. The resulting kallisto files were combined into a transcripts per million (TPM) expression matrix with the UCSC tool kallistoToMatrix. A condensed gene per million (GPM) matrix was then made by totaling the TPM values of all transcripts associated with each canonical transcript/gene. For both matrices, median expression values for each tissue were calculated and used to generate the BED6+5 file that is the basis of each track, using the UCSC tool expMatrixToBarchartBed. The BED file was then converted to a bigBed file with the UCSC tool bedToBigBed.
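The per-tissue summarization step can be sketched in Python: group each item's sample values by tissue, take the median per tissue (the statistic the schema's expScores field reports), and emit one tab-separated BED6+5 line. This is not expMatrixToBarchartBed itself. The score is passed in by the caller because its exact derivation is not specified here, and the two-decimal rounding of expScores is an assumption.

```python
import statistics
from typing import Dict, List, Sequence


def barchart_bed_line(chrom: str, start: int, end: int, name: str, score: int,
                      strand: str, name2: str, tissue_order: Sequence[str],
                      samples_by_tissue: Dict[str, List[float]],
                      data_offset: int, data_len: int) -> str:
    """Build one BED6+5 barChart line from per-sample values grouped by tissue."""
    # One median per tissue, in the fixed tissue order shared by all items.
    medians = [statistics.median(samples_by_tissue[t]) for t in tissue_order]
    # Assumption: round to two decimals for the expScores column.
    exp_scores = ",".join(str(round(m, 2)) for m in medians)
    fields = [chrom, str(start), str(end), name, str(score), strand, name2,
              str(len(medians)), exp_scores, str(data_offset), str(data_len)]
    return "\t".join(fields)
```

Lines produced this way would still need to be sorted by chromosome and start position before bedToBigBed can convert them, typically with -type=bed6+5 and an autoSql (-as) file describing the barChart fields.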

Credits

The data shown here are based in whole upon data generated by the TCGA Research Network. John Vivian, Melissa Cline, and Benedict Paten of the UCSC Computational Genomics lab were responsible for the sequence read quantification used to produce this track. Chris Eisenhart and Kate Rosenbloom of the UCSC Genome Browser group were responsible for data file post-processing, track configuration, and display type.

References

J. Vivian et al., "Rapid and efficient analysis of 20,000 RNA-seq samples with Toil," bioRxiv, vol. 2, p. 62497, 2016.