Schema for Nextstrain Parsimony - Parsimony Scores for Nextstrain Variants and Phylogenetic Tree
  Database: wuhCor1
The data for this track is provided by a file in BigWig format.
Data URL: /gbdb/wuhCor1/nextstrain/nextstrainParsimony.bw

Nextstrain Parsimony (nextstrainParsimony) Track Description
 

Description

Nextstrain.org displays data about single-nucleotide variant alleles in the SARS-CoV-2 RNA and protein sequences that have occurred in different samples of the virus during the current 2019/2020 outbreak. Nextstrain has a powerful user interface for viewing the time-stamped phylogenetic tree that it infers from the patterns of variants in sequences worldwide. Nextstrain maintains an ongoing pipeline that continuously obtains SARS-CoV-2 genome sequences and metadata from GISAID, aligns them against the reference genome (NC_045512.2), collects single-nucleotide variants (SNVs), and infers a phylogenetic tree.

A parsimony score can be computed for each mutation as the minimum number of nucleotide changes along branches of the tree that would lead to the observed sample genotypes at the leaves of the tree. For example, if there is a branch for which all leaves have a mutation, and no other leaves of the tree have the mutation, then the mutation presumably occurred once on that branch and the parsimony score would be one. However, when a mutation appears on leaves belonging to several branches whose other leaves do not have the mutation, then the mutation would need to occur on multiple branches in the tree, increasing the parsimony score. Mutations with a parsimony score that is relatively high, especially when compared to alternate allele count (the number of samples/leaves with the mutation), may be of interest when identifying systematic errors and/or sites of recurrent mutations.
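The two cases described above can be illustrated with a small sketch. It computes the parsimony score at one site by dynamic programming over the tree with a unit cost per nucleotide change (a Sankoff-style recursion). This is only an illustration under stated assumptions: the nested-tuple tree encoding, the sample names, and the function name are hypothetical, and it is not necessarily the method UCSC uses to generate this track.

```python
from math import inf

BASES = "ACGT"

def parsimony_score(tree, leaf_alleles):
    """Minimum number of nucleotide changes on `tree` that explains
    the alleles observed at the leaves for a single site.

    tree: nested tuples for internal nodes, strings for leaf names
          (a hypothetical encoding chosen for this sketch).
    leaf_alleles: dict mapping leaf name -> observed base.
    """
    def costs(node):
        # Leaf: zero cost for the observed base, impossible otherwise.
        if isinstance(node, str):
            return {b: (0 if b == leaf_alleles[node] else inf) for b in BASES}
        # Internal node: for each candidate base, add up the cheapest
        # way to reach each child, paying 1 whenever the base changes.
        child_costs = [costs(child) for child in node]
        return {
            b: sum(min(c[x] + (x != b) for x in BASES) for c in child_costs)
            for b in BASES
        }

    return min(costs(tree).values())

# Toy tree with six samples (names are made up).
tree = ((("s1", "s2"), ("s3", "s4")), ("s5", "s6"))

# Mutation C>T confined to one branch: a single change suffices.
one_branch = {"s1": "T", "s2": "T", "s3": "C", "s4": "C", "s5": "C", "s6": "C"}

# Same alternate allele count (2), but scattered across the tree.
scattered = {"s1": "T", "s2": "C", "s3": "C", "s4": "C", "s5": "T", "s6": "C"}

print(parsimony_score(tree, one_branch))  # 1
print(parsimony_score(tree, scattered))   # 2
```

Both toy sites have an alternate allele count of 2, but the second needs two changes on the tree, so its parsimony score is high relative to its allele count. That is the pattern the paragraph above flags as potentially interesting.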

This track shows the parsimony score of each SNV reported by Nextstrain as a bar graph with the height indicating the score. (The Nextstrain Variants track displays the phylogenetic tree and sample genotypes from which the parsimony scores were generated.)

Methods

Nextstrain downloads SARS-CoV-2 genomes from GISAID as they are submitted by labs worldwide. The sequences are processed by an automated pipeline, and the resulting annotations are written to a data file. UCSC downloads this file and extracts the annotations for display. UCSC then computes parsimony scores using the phylogenetic tree and variants extracted from Nextstrain.

Data Access

You can download the bigWig file underlying this track (nextstrainParsimony.bw) from our Download Server. The data can be explored interactively with the Table Browser or the Data Integrator. The data can be accessed from scripts through our API.
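As a rough sketch of script access, the snippet below builds a request URL for the UCSC REST API's getData/track endpoint and fetches the JSON response. The genome and track names come from this page; the sequence name NC_045512v2, the coordinate range, and the helper function names are assumptions made for illustration, so check them against the API documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"

def build_track_url(genome, track, chrom=None, start=None, end=None):
    """Build a getData/track URL; parameters left as None are omitted."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    query = ";".join(
        f"{name}={quote(str(value))}"
        for name, value in params.items()
        if value is not None
    )
    return f"{API_BASE}?{query}"

def fetch_track(url):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)

# Sequence name and range are assumptions for this example.
url = build_track_url("wuhCor1", "nextstrainParsimony",
                      chrom="NC_045512v2", start=0, end=1000)
print(url)
# data = fetch_track(url)  # uncomment to perform the request
```

The network call is left commented out so the sketch can be run offline. The structure of the returned JSON is not described on this page and should be inspected rather than assumed.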

Nextstrain.org offers phylogenetic trees and metadata files: scroll to the bottom of the page and click "DOWNLOAD DATA", and a dialog with download options appears.

Credits

This work is made possible by the open sharing of genetic data by research groups from all over the world. We gratefully acknowledge their contributions. Special thanks to nextstrain.org for sharing its analysis of genomes collected by GISAID.

References

Hadfield J, Megill C, Bell SM, Huddleston J, Potter B, Callender C, Sagulenko P, Bedford T, Neher RA. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics. 2018 Dec 1;34(23):4121-4123. PMID: 29790939; PMC: PMC6247931