Note: Released Jul. 28, 2020
These tracks show curated SARS-CoV-2 protein-coding genes conserved within the Sarbecovirus subgenus, as determined using PhyloCSF, FRESCo, and other comparative genomics methods, and consistent with experimental evidence in SARS-CoV-2. For a complete description of the evidence, see Jungreis et al. 2020.
The PhyloCSF Genes track shows the conserved protein-coding genes, namely ORF1a, ORF1ab, S, ORF3a, ORF3c (a.k.a. ORF3h, ORF3a*, and 3a.iORF1), E, M, ORF6, ORF7a, ORF7b, ORF8, N, and ORF9b (a.k.a. ORF9a).
The PhyloCSF Rejected Genes track shows other proposed genes that have neither the signature of conserved protein-coding genes nor persuasive experimental evidence, and are thus unlikely to be actual protein-coding genes, namely ORF3d, ORF3b, ORF14 (a.k.a. ORF9b, ORF9c), and ORF10. (Two different ORFs have been referred to as ORF3b. We refer to the one with coordinates 25524-25697 as ORF3d. The other, with coordinates 25814-25882, is the 23-codon ortholog of the 5' end of SARS-CoV-1 ORF3b, ending at an in-frame stop codon that is not present in SARS-CoV-1.)
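The coordinate ranges quoted above can be sanity-checked with a little arithmetic. The sketch below assumes the ranges are 1-based and inclusive (the text does not say so explicitly, but 25814-25882 then spans 69 nucleotides, which agrees with the stated 23 codons). The helper names are hypothetical and chosen only for illustration. The 0-based, half-open form is the convention used by BED files.

```python
def to_bed_interval(start_1based, end_1based):
    """Convert a 1-based inclusive range to a 0-based half-open (BED-style) range."""
    return start_1based - 1, end_1based


def codon_count(start_1based, end_1based):
    """Number of whole codons in a 1-based inclusive range."""
    length = end_1based - start_1based + 1
    if length % 3 != 0:
        raise ValueError(f"range length {length} is not a multiple of 3")
    return length // 3


# Ranges as quoted in the track description (assumed 1-based inclusive).
ranges = {
    "ORF3d": (25524, 25697),
    "ORF3b": (25814, 25882),
}

for name, (start, end) in ranges.items():
    bed_start, bed_end = to_bed_interval(start, end)
    print(name, "codons:", codon_count(start, end), "BED interval:", bed_start, bed_end)
```

Under the same assumption, the BED-style values are the ones that would be passed as -start and -end when querying a sub-range of the bigBed file.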
The raw data can be explored interactively with the
Table Browser or combined with other datasets in the
Data Integrator tool.
For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server.
The file can be converted from binary to ASCII text by our command-line tool bigBedToBed.
Instructions for downloading this command can be found on our utilities page.
The tool can also be used to obtain features within a given range without downloading the file, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/phyloGenes/PhyloCSFgenes.bb -chrom=NC_045512v2 -start=0 -end=29902 stdout
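For scripted use, the same command can be wrapped in a short Python script. This is a minimal sketch, assuming bigBedToBed is installed and on the PATH and that the output is tab-separated BED with at least three columns. The function names `parse_bed` and `fetch_features` are hypothetical helpers, not part of any UCSC tool.

```python
import subprocess

# URL and chromosome name are taken from the example command above.
BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/phyloGenes/PhyloCSFgenes.bb"


def parse_bed(text):
    """Parse tab-separated BED text into a list of dicts (first four columns only)."""
    features = []
    for line in text.splitlines():
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        feature = {"chrom": fields[0], "start": int(fields[1]), "end": int(fields[2])}
        if len(fields) > 3:
            feature["name"] = fields[3]
        features.append(feature)
    return features


def fetch_features(start, end, chrom="NC_045512v2", url=BB_URL):
    """Run bigBedToBed for one range and return the parsed features.

    Assumes the bigBedToBed binary is available on the PATH.
    """
    cmd = [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_bed(result.stdout)
```

Calling `fetch_features(0, 29902)` would reproduce the whole-genome query shown above. Keeping the parsing in `parse_bed` separate from the subprocess call means it can also be applied to a BED file that was converted earlier.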
Please refer to our
mailing list archives
for questions, or our
Data Access FAQ
for more information.
Questions should be directed to Irwin Jungreis.
If you use the SARS-CoV-2 PhyloCSF Genes Track Hub, please cite Jungreis et al. 2020.
Lin MF, Jungreis I, Kellis M (2011). PhyloCSF: a comparative genomics method to distinguish protein-coding and non-coding regions. Bioinformatics 27:i275-i282 (ISMB/ECCB 2011).
Sealfon RS, Lin MF, Jungreis I, Wolf MY, Kellis M, Sabeti PC (2015). FRESCo: finding regions of excess synonymous constraint in diverse viruses. Genome Biol. doi:10.1186/s13059-015-0603-7.
Jungreis I, Sealfon R, Kellis M (2020). Sarbecovirus comparative genomics elucidates gene content of SARS-CoV-2 and functional impact of COVID-19 pandemic mutations. bioRxiv doi:10.1101/2020.06.02.130955.