Schema for Yale Pseudo60 - Yale Pseudogenes based on Ensembl Release 60
  Database: mm9    Primary Table: pseudoYale60    Row Count: 19,082   Data last updated: 2010-12-23
Format description: A gene prediction.
On download server: MariaDB table dump directory
field       example              SQL type               info     description
bin         608                  smallint(5) unsigned   range    Indexing field to speed chromosome range queries.
name        PGOMOU00000255058    varchar(255)           values   Name of gene
chrom       chr1                 varchar(255)           values   Reference sequence chromosome or scaffold
strand      +                    char(1)                values   + or - for strand
txStart     3044196              int(10) unsigned       range    Transcription start position (or end position for minus strand item)
txEnd       3045304              int(10) unsigned       range    Transcription end position (or start position for minus strand item)
cdsStart    0                    int(10) unsigned       range    Coding region start (or end position for minus strand item)
cdsEnd      0                    int(10) unsigned       range    Coding region end (or start position for minus strand item)
exonCount   1                    int(10) unsigned       range    Number of exons
exonStarts  3044196,             longblob                        Exon start positions (or end positions for minus strand item)
exonEnds    3045304,             longblob                        Exon end positions (or start positions for minus strand item)
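
The bin field can be recomputed from txStart and txEnd. The sketch below assumes the standard UCSC binning scheme (five levels, with the finest bins covering 128 kb); this page does not state which scheme the table uses, but the values agree with the sample rows. The function and constant names are illustrative, not part of any UCSC API.

```python
# Standard UCSC binning constants (assumed here, not stated on this page).
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb per finest-level bin
BIN_NEXT_SHIFT = 3    # each coarser level covers 8 times more sequence


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin containing the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # The range spans two bins at this level, so try the next coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the 512 Mb limit of this scheme")


# txStart and txEnd from the example column of the schema.
print(bin_from_range(3044196, 3045304))  # 608
```

A range query can then restrict the bin column to the bins that could overlap the region of interest before comparing txStart and txEnd, which is what makes the field useful as an index.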

Sample Rows
 
bin  name               chrom  strand  txStart  txEnd    cdsStart  cdsEnd  exonCount  exonStarts        exonEnds
608  PGOMOU00000255058  chr1   +       3044196  3045304  0         0       1          3044196,          3045304,
609  PGOMOU00000125763  chr1   -       3224841  3225235  0         0       1          3224841,          3225235,
609  PGOMOU00000126381  chr1   +       3242814  3243317  0         0       1          3242814,          3243317,
611  PGOMOU00000126382  chr1   +       3521875  3522817  0         0       1          3521875,          3522817,
617  PGOMOU00000125764  chr1   -       4246318  4250595  0         0       2          4246318,4248919,  4246525,4250595,
619  PGOMOU00000126383  chr1   +       4512987  4516815  0         0       2          4512987,4514511,  4513698,4516815,
619  PGOMOU00000125765  chr1   -       4524920  4525367  0         0       1          4524920,          4525367,
620  PGOMOU00000126384  chr1   +       4600551  4601484  0         0       1          4600551,          4601484,
620  PGOMOU00000125766  chr1   -       4678011  4679485  0         0       1          4678011,          4679485,
620  PGOMOU00000125767  chr1   -       4682299  4683505  0         0       1          4682299,          4683505,
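
As an illustration of how such rows can be read, the sketch below parses one row into a typed record and pairs each exon start with its end. It assumes the row arrives as a tab-separated line with the eleven fields in schema order, as in a table dump; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GenePredRow:
    bin: int
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exon_count: int
    exons: List[Tuple[int, int]]  # (start, end) pairs, 0-based start


def parse_int_list(field: str) -> List[int]:
    """Parse a comma-terminated list such as '4246318,4248919,'."""
    return [int(item) for item in field.split(",") if item]


def parse_row(line: str) -> GenePredRow:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    starts = parse_int_list(fields[9])
    ends = parse_int_list(fields[10])
    exon_count = int(fields[8])
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return GenePredRow(
        int(fields[0]), fields[1], fields[2], fields[3],
        int(fields[4]), int(fields[5]), int(fields[6]), int(fields[7]),
        exon_count, list(zip(starts, ends)),
    )


# The two-exon sample row for PGOMOU00000125764.
row = parse_row(
    "617\tPGOMOU00000125764\tchr1\t-\t4246318\t4250595\t0\t0\t2\t"
    "4246318,4248919,\t4246525,4250595,"
)
exon_lengths = [end - start for start, end in row.exons]
print(row.exons)
print(exon_lengths)
```

Because starts are 0-based and ends are exclusive, each exon length is simply end minus start.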

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
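
A short sketch of what this means in practice: under the UCSC convention, table coordinates are 0-based with an exclusive end (half-open), while positions typed into the browser are 1-based and inclusive. Only the start needs adjusting when converting. The helper names below are illustrative.

```python
def to_browser_position(chrom: str, start0: int, end: int) -> str:
    """Convert a 0-based, half-open table range to a 1-based, inclusive position string."""
    return f"{chrom}:{start0 + 1}-{end}"


def feature_length(start0: int, end: int) -> int:
    """Length in bases of a 0-based, half-open range."""
    return end - start0


# txStart and txEnd of PGOMOU00000255058 from the sample rows.
print(to_browser_position("chr1", 3044196, 3045304))  # chr1:3044197-3045304
print(feature_length(3044196, 3045304))               # 1108
```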

Yale Pseudo60 (pseudoYale60) Track Description
 

Description

This track shows pseudogenes identified by the Yale Pseudogene Pipeline. In this analysis, pseudogenes are defined as genomic sequences that are similar to known genes but carry inactivating disablements (e.g., premature stop codons or frameshifts) in their putative protein-coding regions. Pseudogenes are flagged as recently processed, recently duplicated, or of uncertain origin (either ancient fragments or derived from a single-exon parent). NOTE: Four pseudogenes are missing from this track because the blocks representing their exons had overlapping coordinates. Their identifiers are:

  • PGOMOU00000130313
  • PGOMOU00000139101
  • PGOMOU00000136201
  • PGOMOU00000128816
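
A check of this kind can be sketched as follows. This is only an illustration of what "overlapping coordinates in the blocks representing exons" means; how the exclusion was actually performed is not described here, and the function name and the second set of coordinates are hypothetical.

```python
from typing import List


def has_overlapping_exons(exon_starts: List[int], exon_ends: List[int]) -> bool:
    """Return True if any two exon blocks overlap (0-based, half-open coordinates)."""
    blocks = sorted(zip(exon_starts, exon_ends))
    # After sorting by start, an overlap exists if a block begins before the previous one ends.
    return any(
        next_start < prev_end
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
    )


# Exon blocks of PGOMOU00000125764 from the sample rows: no overlap.
print(has_overlapping_exons([4246318, 4248919], [4246525, 4250595]))  # False

# Made-up coordinates where the second block starts inside the first.
print(has_overlapping_exons([100, 150], [200, 250]))  # True
```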

Methods

Briefly, the protein sequences of known human genes (as annotated by Ensembl Release 60) were used to search the genome for similar sequences that do not overlap known genes. Matching sequences were judged to be disabled copies of genes based on the occurrence of premature stop codons or frameshifts. The intron-exon structure of the functional gene was then used to infer whether a pseudogene was recently duplicated or processed. A duplicated pseudogene retains the intron-exon structure of its parent functional gene, whereas a processed pseudogene shows evidence that the introns have been spliced out. Small pseudogene sequences that cannot be confidently assigned to either the processed or the duplicated category may be ancient fragments. Further details are in the references below.
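
To make one of the disablement criteria concrete, the toy sketch below scans a nucleotide sequence in a fixed reading frame and reports stop codons that occur before the final codon. It is not the Yale pipeline, which works from alignments to parent proteins as described in the references; the function name and the example sequences are invented for illustration.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stop_codons(sequence: str) -> List[int]:
    """Return 0-based codon indices of in-frame stop codons before the last codon."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop in the final position is a normal terminator, so it is skipped.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


# Invented sequence: ATG AAA TAG CCC TGA has a stop at codon index 2.
print(premature_stop_codons("ATGAAATAGCCCTGA"))  # [2]

# Invented sequence with only a terminal stop codon.
print(premature_stop_codons("ATGAAACCCTGA"))  # []
```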

Credits

These data were generated by the pseudogene annotation group in the Gerstein Lab at Yale University.

References

More information is available from Pseudogene.org.

Zhang Z, Harrison PM, Liu Y, Gerstein M. Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome. Genome Res. 2003 Dec;13(12):2541-58. PMID: 14656962; PMC: PMC403796

Zheng D, Zhang Z, Harrison PM, Karro J, Carriero N, Gerstein M. Integrated pseudogene annotation for human chromosome 22: evidence for transcription. J Mol Biol. 2005 May 27;349(1):27-45. PMID: 15876366