Schema for CSHL Long RNA-seq - Long RNA-seq from ENCODE/Cold Spring Harbor Lab

Database: mm9 Primary Table: wgEncodeCshlLongRnaSeqWbrainE14halfAlnRep1
BAM File: /gbdb/mm9/bbi/wgEncodeCshlLongRnaSeqWbrainE14halfAlnRep1.bam
Format description: The fields of a SAM short-read alignment (SAM is the text version of BAM).
See the SAM Format Specification for more details.

field	description
`qName`	Query template name - name of a read
`flag`	Flags. 0x10 set for reverse complement. See SAM docs for others.
`rName`	Reference sequence name (often a chromosome)
`pos`	1 based position
`mapQ`	Mapping quality 0-255. The SAM spec reserves 255 for "not available"; some aligners (e.g. STAR) use 255 to mark uniquely mapped reads
`cigar`	CIGAR encoded alignment string.
`rNext`	Ref sequence for next (mate) read. '=' if same as rName, '*' if no mate
`pNext`	Position (1-based) of next (mate) sequence. May be -1 or 0 if no mate
`tLen`	Observed template length for mapped pairs. Positive for one mate and negative (-size) for the other
`seq`	Query template sequence
`qual`	ASCII encoding of Phred-scaled base quality + 33. Just '*' if no quality scores
`tagTypeVals`	Tab-delimited list of tag:type:value optional extra fields
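A minimal sketch of reading one line of the SAM text form into the fields listed above, assuming plain Python with no external library. In practice a dedicated tool such as samtools or pysam would normally read the BAM file directly. The names `SamRecord`, `parse_sam_line` and `phred_scores` are hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns plus any optional tag:type:value fields."""
    qName: str
    flag: int
    rName: str
    pos: int          # 1-based leftmost mapping position
    mapQ: int
    cigar: str
    rNext: str        # '=' if same as rName, '*' if no mate
    pNext: int
    tLen: int         # signed: positive for one mate, negative for the other
    seq: str
    qual: str
    tagTypeVals: list  # remaining columns, e.g. "NH:i:1"


def parse_sam_line(line: str) -> SamRecord:
    """Split one tab-delimited SAM line into typed fields (hypothetical helper)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    return SamRecord(
        qName=cols[0], flag=int(cols[1]), rName=cols[2], pos=int(cols[3]),
        mapQ=int(cols[4]), cigar=cols[5], rNext=cols[6], pNext=int(cols[7]),
        tLen=int(cols[8]), seq=cols[9], qual=cols[10], tagTypeVals=cols[11:],
    )


def phred_scores(qual: str) -> list:
    """Decode a Phred+33 quality string; '*' means no scores are present."""
    if qual == "*":
        return []
    return [ord(c) - 33 for c in qual]


if __name__ == "__main__":
    # Toy line reusing the first sample row's coordinates; seq/qual/cigar shortened.
    toy = "\t".join(["toy_read", "179", "chr1", "3000916", "255", "4M",
                     "=", "3000997", "182", "GGTG", "CCBF"])
    rec = parse_sam_line(toy)
    print(rec.rName, rec.pos, rec.flag, phred_scores(rec.qual))
    # prints: chr1 3000916 179 [34, 34, 33, 37]
```

Under this encoding the '#' characters that fill the quality strings of several sample rows decode to a Phred score of 2, and 'J' decodes to 41.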

Sample Rows

qName	flag	rName	pos	mapQ	cigar	rNext	pNext	tLen	seq	qual
HWI-ST984:60:D09B8ACXX:5:1104:16241:45845	179	chr1	3000916	255	101M	=	3000997	182	GGTGTCTCTGTTCTGAGTATTATGGCCTGCACCTGTTATTCTGGCTGTTGTGGCCTGTACTTAAGCAGGAAAGTCAAGCAGAATTGAGGGTAGGGGCCCAG	CCBFFFFFHHHHHJJHHGHHJJJJJGJJIIGIIJIJJIJJIJJJJJJIIJGIJIJJIEHDEHCGHIIIJEGIJDHCEFEHFCEFFDCEDD;A?BDD:?BBB
HWI-ST984:60:D09B8ACXX:5:2102:5944:143250	179	chr1	3000916	255	101M	=	3000997	182	GGTGTCTCTGTTCTGAGTATTATGGCCTGCACCTGTTATTCTGGCTGTTGTGGCCTGTACTTAAGCAGGAAAGTCAAGCAGATTTGAGGGTAGGGGACCAG	?<@DD?>D?DF>F?E<<2AC?EHEA9A<A;AF9FB:<:??FAFDF?D8:B))8D;/6B###########################################
HWI-ST984:60:D09B8ACXX:5:1104:16241:45845	115	chr1	3000997	255	101M	=	3000916	-182	AATTGAGGGTAGGGGCCCAGCAGTGGGATAGGTGGTGAGATAGGTGGGGAGGTGCTAGAATGGGTCTTTGGGGACTGAGTTTAGGGAGTGGGACAATTCTA	DCC?BDAADDDDDDDBDCDECEDEFFFFFHHHHHJJJJJJJJJJIJJIHFIHIIJJJJJJJJIIJJIJJJJJJIJJJJJJIJJJJJIJHHHHHFFFFFCC@
HWI-ST984:60:D09B8ACXX:5:2102:5944:143250	115	chr1	3000997	255	101M	=	3000916	-182	AATTGAGGGTAGGGGCCCAGCAGTGGGATAGGTGGTGAGATAGGTGGGGAGGTGCTAGAATGGGTCTTTGGGGACTGAGTTTAGGGAGTGGGACAATTCTA	#############?;1>3;;62A;:?7.)>C?3>G@7):EF=/4F@1GEFBAD<D?B:038DC:)1FGFBA+<9ACGHEB?A<++++FDFCDD?A=<@@
HWI-ST984:60:D09B8ACXX:5:1203:1500:183711	179	chr1	3002175	255	1S100M	=	3002266	192	ATGCGGATGACCTGCCTTTGTGTCTTTTTGACTAGCTGGCTCATTAGTGTAGCTGCCTTTGTTCTTTTAGGTCCATGAAGCCCCTCATACAATTCATATTG	@@@FFFFFHHGHHGIIIIIIAGHIGJJJIJJJJFJIJIJIGGIJIIIFHIIIJIJIHIJJIDGHIJJHHHHEHEFDFFD@ECDDBDBDDC>>>@>A5;>C@
HWI-ST984:60:D09B8ACXX:5:1203:1500:183711	115	chr1	3002266	255	101M	=	3002175	-192	TTCATATTGTGAGAAATTATGTATTCTTGAACTCATGTTTTCAGAATTCTTTCATACAGTCTTAAGGGCTGTCGTGAAGACCACAGTGTTCACCACCTTGC	;;>>>5;>3DC@@D@CCDCA>CCEEFDCEC=HHHGGEEHEJJIJJIHG@FFFF@EGJJIGCIJIJIJJJJJJJIGIHHEAIHEIIIIGFHHFHD:FFFCBB
HWI-ST984:60:D09B8ACXX:5:1304:4625:194410	89	chr1	3003077	255	11S32M58S	*	0	0	GTGACCTCTCCCCTCAGCTTTCTTGCTTGTTTTTTTTTTTTTTAATGAGGCGCCCCCCAAGGATATCTACTCTCTCCCCCCCCCGCGCCTCTCTTCCGATC	#####################################################################################################
HWI-ST984:60:D09B8ACXX:5:2308:15229:44636	89	chr1	3003087	250	17S29M55S	*	0	0	CTTCCACCCCACGAGTCCTTGCTTGTTTTTTTTTTTTTTTTTTTAATGACGCGGCCCCCCCCGAGATCCACCCTCTCCCCCCACCCGCCCCTCTCCCGATC	#####################################################################################################
HWI-ST984:60:D09B8ACXX:5:1201:16656:136267	145	chr1	3005828	254	101M	*	0	0	GTCCAATTTTCTGAGGAACCGCCAGACTGATTTCCAGAGTGGTTGTACAAGCTGCCAATCCAACCACCATTGGAGGAGGCTTCCCCTTTCTCAACATCCTC	@??AADD;??<+?C?E<AFF@GHIGE;DFC940?B?BDEDDF97D?8CGE###################################################
HWI-ST984:60:D09B8ACXX:5:1108:18027:6082	179	chr1	3006376	255	101M	=	3006509	234	CCTACTTTCTCCTCTGTAAGTTTCAGTGTCTCTGGTTTTATGTGGAGTTCCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAATGGATCAAT	CCCFFFFFHHHHHIJJJJJJIGIJHIIIIIHIJJIHIJJGJJIJJJJIJJJJJJIJJIJJJJIIJJJJJIJJJJIHIIJJJIIHHHHHEFFFFFEEEEEDD
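The `flag` and `cigar` columns in the sample rows are compact encodings. This minimal sketch decodes the FLAG bit field and computes how many read bases and reference bases a CIGAR string covers. It uses the bit and operator definitions from the SAM specification. The bit names (`paired`, `mate_reverse`, ...) are our own labels, and `decode_flag` and `cigar_lengths` are hypothetical helper names.

```python
import re

# FLAG bits as defined in the SAM specification; names are illustrative labels.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

# One CIGAR operation: a length followed by an operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def decode_flag(flag: int) -> list:
    """Return the names of all bits set in a FLAG value, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def cigar_lengths(cigar: str) -> tuple:
    """Return (query_length, reference_span) for a CIGAR string.

    M, I, S, =, X consume read bases; M, D, N, =, X consume reference bases.
    N (skipped region, e.g. an intron in a spliced read) widens only the span.
    """
    query = ref = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in "MIS=X":
            query += n
        if op in "MDN=X":
            ref += n
    return query, ref


if __name__ == "__main__":
    for flag, cigar in [(179, "101M"), (89, "11S32M58S")]:
        print(flag, decode_flag(flag), cigar_lengths(cigar))
```

For the first sample row (flag 179), the decoder reports paired, proper_pair, reverse, mate_reverse and second_in_pair. For the row with CIGAR 11S32M58S, only 32 of the 101 read bases align to the reference, and the rest are soft-clipped.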

CSHL Long RNA-seq (wgEncodeCshlLongRnaSeq) Track Description


Description

These tracks were generated by the ENCODE Consortium. They contain information about mouse RNAs longer than 200 nucleotides, obtained as short reads on the Illumina platform. Data are available from biological replicates.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually in the browser. Instructions for configuring multi-view tracks are here. To show only selected subtracks, uncheck the boxes next to the tracks that you wish to hide. Color differences among the views are arbitrary; they provide a visual cue for distinguishing between the different cell types and compartments.

Contigs

The Contigs view shows blocks of overlapping mapped reads from the pooled biological replicates.

Raw Signals

The Plus Raw Signal and Minus Raw Signal views show the density of mapped reads on the plus and minus strands, respectively, in wiggle format.

Alignments

The Alignments view shows individual reads from the biological replicates mapped to the genome and indicates where bases may mismatch. Every mapped read is displayed individually (reads are not collapsed). The alignment file follows the standard SAM format of Bowtie output. See the Bowtie Manual for more information about the SAM Bowtie output (including other tags) and the SAM Format Specification for more information on the SAM/BAM file format.

Splice Junctions

This view shows the subset of aligned reads that cross splice junctions. Column specifications can be found in the supplemental directory.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks. Additional views are available on the Downloads page.

Methods

Tissue Samples

Individual tissues were harvested from mouse strain C57BL/6J at different time points according to ENCODE cell culture protocols. Whenever possible, biological replicates were obtained from littermates.
Library Preparation

The published cDNA sequencing protocol was used. This protocol generates directional libraries and reports each transcript's strand of origin. Exogenous RNA spike-ins were added to each endogenous RNA isolate and carried through library construction and sequencing. The spike-in sequences and concentrations are available for download in the supplemental directory.

Sequencing and Mapping

The libraries were sequenced on the Illumina platform (either GAIIx or HiSeq) as paired-end reads (76 or 101 bases per read) to an average depth of 100 million read pairs. The data were mapped against mm9 using Spliced Transcript Alignment and Reconstruction (STAR), written by Alex Dobin (CSHL). More information about STAR, including the parameters used for these data, is available from the Gingeras lab.

For each experiment, additional data files for element views are available for download. These elements were assessed for reproducibility using a nonparametric irreproducible discovery rate (IDR) script. The IDR value for each element is included in the files so that end users can apply it as a threshold. An IDR value of 0.1 means that the probability of detecting that element in a third experiment, equivalent in depth to the sum of the biological replicates, is 90%. In addition, expression values were computed for annotated genes, transcripts and exons. Further explanation of these files is available in the supplemental directory.

Verification

FPKM (fragments per kilobase of exon per million fragments mapped) values were calculated for annotated exons, and Spearman correlation coefficients were computed between replicates. In general, Spearman's rho between biological replicates is greater than 0.90.

Release Notes

This is release 3 (Sept 2012) of this track. It adds data for bladder, cerebellum, CNS, cortex, frontal lobe, limb, liver, placenta, and whole brain. The CNS, liver, limb and whole brain samples span several ages (developmental stages).
This release also contains replacement BAM files, because the previous files had the second read reverse complemented.

Credits

These data were generated and analyzed by the transcriptome group at Cold Spring Harbor Laboratory and the Center for Genomic Regulation (CRG) in Barcelona, who are participants in the ENCODE Transcriptome Group.

Contacts: Carrie Davis (experimental), Roderic Guigo and lab (data processing), Tom Gingeras (principal investigator).

References

Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B. Synthetic spike-in standards for RNA-seq experiments. Genome Res. 2011 Sep;21(9):1543-51. PMID: 21816910; PMC: PMC3166838

Parkhomchuk D, Borodina T, Amstislavskiy V, Banaru M, Hallen L, Krobitsch S, Lehrach H, Soldatov A. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 2009 Oct;37(18):e123. PMID: 19620212; PMC: PMC2764448

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column above. The full data release policy for ENCODE is available here.
