CESAR uses existing whole-genome alignments to detect conserved coding exons and then maps gene annotations from one (reference) genome to many aligned (query) genomes. Because genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved, CESAR realigns each exon, taking the reading frame and splice site positions of the exon into account. If the query sequence contains an intact exon, the resulting alignment preserves the reading frame and splice sites.
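To make the notion of an "intact" exon concrete, the following Python sketch checks a pairwise exon alignment for three kinds of inactivating mutations: frameshifting gaps, in-frame stop codons, and non-canonical splice site dinucleotides. It is not CESAR's realignment method, only a simplified after-the-fact check. All function names are hypothetical, and the sketch assumes an internal exon on the plus strand with canonical AG/GT splice sites.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def has_frameshift(ref_aln, qry_aln):
    """Return True if any gap run in either aligned row has a length
    that is not a multiple of 3. Simplification: compensating
    frameshifts are not recognized."""
    for row in (ref_aln, qry_aln):
        for run in re.findall(r"-+", row):
            if len(run) % 3 != 0:
                return True
    return False

def has_inframe_stop(qry_aln, phase):
    """Return True if the ungapped query contains a stop codon in frame.
    `phase` is the number of leading bases that complete a codon
    started in the previous exon (0, 1 or 2)."""
    seq = qry_aln.replace("-", "").upper()
    coding = seq[phase:]
    for i in range(0, len(coding) - 2, 3):
        if coding[i:i + 3] in STOP_CODONS:
            return True
    return False

def is_intact(ref_aln, qry_aln, phase, acceptor, donor):
    """Assumed definition of 'intact' for an internal exon: no frameshift,
    no in-frame stop codon, and canonical AG/GT flanking dinucleotides."""
    return (not has_frameshift(ref_aln, qry_aln)
            and not has_inframe_stop(qry_aln, phase)
            and acceptor.upper() == "AG"
            and donor.upper() == "GT")
```

For example, is_intact("ATGGCCAAATTT", "ATGGCC---TTT", 0, "AG", "GT") treats the 3-bp deletion as frame-preserving, whereas a 1-bp gap in the query would be flagged as a frameshift.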
CESAR detects 91% of shifted splice sites and aligns the shifted splice site to the reference splice site. These exon mappings are highly specific: of the human exons that lack inactivating mutations in mouse after realignment, 99% match annotated mouse exons.
CESAR was applied to the UCSC 100-way alignment that aligns 99 vertebrates to the human hg19 genome. All 188,758 human coding exons of 19,865 UCSC knownGenes (longest isoform) were realigned with CESAR. All intact exons from the same gene were grouped into a gene model.
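The grouping step can be pictured with a short Python sketch. The record layout (gene identifier, chromosome, start, end, intact flag) and the function name are assumptions made for illustration. They do not describe the data structures used to build this track.

```python
from collections import defaultdict

def group_intact_exons(exons):
    """Group intact exons by gene and sort them by position.

    `exons` is an iterable of (gene_id, chrom, start, end, intact)
    tuples (hypothetical layout). Exons flagged as not intact are
    dropped, so a gene model may contain fewer exons than the
    reference gene.
    """
    models = defaultdict(list)
    for gene_id, chrom, start, end, intact in exons:
        if intact:
            models[gene_id].append((chrom, start, end))
    # Sort each gene's exons by chromosome, then start coordinate.
    return {gene: sorted(parts, key=lambda e: (e[0], e[1]))
            for gene, parts in models.items()}
```

A gene with no intact exon in a query species simply does not appear in the returned dictionary.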
The coordinates of the intact exons and annotated genes (genePred format) for the 99 vertebrates, together with the 100-way realignment (maf format, 7.9 GB), are available for download at http://bds.mpi-cbg.de/hillerlab/CESAR/.
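As a starting point for working with the downloaded annotations, here is a minimal Python sketch that parses one line of the standard 10-column, tab-separated genePred format (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). It assumes the files use this plain layout without a leading bin column or genePredExt fields, which should be checked against the actual downloads. The function name is hypothetical.

```python
def parse_genepred_line(line):
    """Parse one 10-column genePred line into a dictionary.

    Coordinates in genePred are 0-based, half-open. The exonStarts and
    exonEnds columns are comma-separated lists, usually with a trailing
    comma, so empty items are skipped.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated columns")
    exon_count = int(fields[7])
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    if not (len(starts) == len(ends) == exon_count):
        raise ValueError("exonCount does not match exonStarts/exonEnds")
    return {
        "name": fields[0],
        "chrom": fields[1],
        "strand": fields[2],
        "tx_start": int(fields[3]),
        "tx_end": int(fields[4]),
        "cds_start": int(fields[5]),
        "cds_end": int(fields[6]),
        "exons": list(zip(starts, ends)),
    }
```

Applying the function to each line of a genePred file yields one record per gene model, with the exons as a list of (start, end) pairs.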
This track was produced by Virag Sharma and the Hiller Lab at the Max Planck Institute of Molecular Cell Biology and Genetics. For questions regarding this hub, please contact Michael Hiller.
Virag Sharma, Anas Elghafari, and Michael Hiller. Coding Exon-Structure Aware Realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation. Nucleic Acids Research, doi: 10.1093/nar/gkw210, 2016