Description
This track displays human-centric multiple sequence alignments in the
ENCODE regions for the 28 vertebrates included in the
September 2005 ENCODE MSA freeze.
The alignments are based on comparative sequence data generated for the ENCODE project
as well as whole-genome assemblies residing at UCSC. The species and their sequence sources are:
- human (May 2004, hg17)
- armadillo (NISC and May 2005 Broad Assisted Assembly v 1.0)
- baboon (NISC)
- chicken (Feb 2004, galGal2)
- chimp (Nov 2003, panTro1)
- colobus_monkey (NISC)
- cow (BCM)
- dog (July 2004, canFam1)
- dusky_titi (NISC)
- elephant (NISC and May 2005 Broad Assisted Assembly v 1.0)
- fugu (Aug 2002, fr1)
- galago (NISC)
- hedgehog (NISC)
- macaque (Jan 2005, rheMac1)
- marmoset (NISC)
- monodelphis (Oct 2004, monDom1)
- mouse (Mar 2005, mm6)
- mouse_lemur (NISC)
- owl_monkey (NISC)
- platypus (NISC and Aug 2005 Mullikin Phusion Assembly of WUGSC Traces)
- rabbit (NISC and May 2005 Broad Assisted Assembly v 1.0)
- rat (June 2003, rn3)
- rfbat (NISC)
- shrew (NISC and Sep 2005 Mullikin Phusion Assembly of Broad Traces)
- tenrec (Apr 2005 Mullikin Phusion Assembly of Broad Traces)
- tetraodon (Feb 2004, tetNig1)
- xenopus (Oct 2004, xenTro1)
- zebrafish (June 2004, danRer2)
The alignments in this track were generated using the
Mercator
orthology mapping program and the
MAVID
multiple global alignment program.
The Genome Browser companion tracks, MAVID Cons and MAVID Elements,
display conservation scoring and conserved elements for these alignments based
on various conservation methods.
Display Conventions and Configuration
In full display mode, this track shows the pairwise alignment
of each species to the human genome.
In dense mode, the alignments are depicted using a gray-scale
density gradient. The checkboxes in the track configuration section allow
the exclusion of species from the pairwise display.
When zoomed in to the base-display level, the track shows the base
composition of each alignment. The numbers and symbols on the
"Gaps" line indicate the lengths of gaps in the human sequence at
those alignment positions relative to the longest non-human sequence. If there is
sufficient space in the display, the size of the gap is shown; if not, and if
the gap size is a multiple of 3, a "*" is displayed,
otherwise "+" is shown.
To view detailed information about the
alignments at a specific position, zoom the display in to 30,000 or fewer
bases, then click on the alignment.
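The rule for annotating the "Gaps" line can be sketched in a few lines of Python. This is only an illustration of the convention described above, not the Genome Browser's own code: the function name `gap_label` and the `available_chars` parameter (how many characters fit at that position) are hypothetical.

```python
def gap_label(gap_size: int, available_chars: int) -> str:
    """Return the text shown on the "Gaps" line for one alignment position.

    gap_size        -- length of the gap in the human sequence
    available_chars -- hypothetical measure of how much room the display has
    """
    if gap_size <= 0:
        return ""  # no gap at this position, nothing to show
    size_text = str(gap_size)
    if len(size_text) <= available_chars:
        return size_text  # enough space: show the gap size itself
    # Not enough space: "*" for a multiple of 3, "+" otherwise
    return "*" if gap_size % 3 == 0 else "+"


for size, room in [(6, 1), (12, 1), (10, 1), (12, 2)]:
    print(size, room, repr(gap_label(size, room)))
```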
Methods
Mercator was first used to identify the colinear and orthologous
segments in the sequences given for each ENCODE region.
Input to Mercator was generated by using Genscan to predict genes in all
sequences, Blat to compare predicted coding exons, and MUMmer to identify
non-coding exact matches between all
pairs of sequences. The output of Mercator was a small-scale one-to-one
orthology map for each ENCODE region, as well as a
set of alignment constraints based on matched landmarks (e.g., exons
and long non-coding exact matches).
MAVID was then used to construct a global
multiple alignment of each colinear orthologous segment set
specified in the orthology map. As part of its input, MAVID used a
phylogenetic tree determined from alignments of four-fold degenerate
sites in the ENCODE regions.
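As background on the four-fold degenerate sites mentioned above: a third codon position is four-fold degenerate when any of the four bases there encodes the same amino acid. The following minimal sketch shows how such positions could be located in a single coding sequence. It assumes the standard genetic code and an in-frame sequence that starts at a codon boundary; the function names are hypothetical, and this is not the procedure used to build the tree for this track.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fourfold_prefixes() -> set:
    """Two-base codon prefixes whose third position is four-fold degenerate."""
    prefixes = ("".join(p) for p in product(BASES, repeat=2))
    return {
        p for p in prefixes
        if len({CODON_TABLE[p + b] for b in BASES}) == 1
    }


def fourfold_site_indices(cds: str) -> list:
    """0-based indices of four-fold degenerate third positions in an in-frame CDS."""
    cds = cds.upper()
    prefixes = fourfold_prefixes()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return [i + 2 for i in range(0, usable, 3) if cds[i:i + 2] in prefixes]


print(sorted(fourfold_prefixes()))
print(fourfold_site_indices("ATGGCTTTTGGA"))  # codons ATG GCT TTT GGA
```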
Credits
The MAVID alignments were generated by Colin Dewey of the
Pachter Lab Comparative Genomics Group at UC Berkeley.
Mercator was written by Colin Dewey and Lior Pachter.
MAVID was authored by Nicholas Bray and Lior Pachter.
The phylogenetic tree is based on Murphy et al. (2001).
References
Bray, N. and Pachter, L.
MAVID: Constrained Ancestral Alignment of Multiple Sequences.
Genome Res 14(4), 693-699 (2004).
Burge, C. and Karlin, S.
Prediction of complete gene structures in human genomic DNA.
J Mol Biol 268(1), 78-94 (1997).
Dewey, C.N. and Pachter, L.
Mercator: multiple whole-genome orthology map construction.
In preparation.
Kent, W.J.
BLAT - the BLAST-like alignment tool.
Genome Res 12(4), 656-664 (2002).
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C.
and Salzberg, S.L.
Versatile and open software for comparing large genomes.
Genome Biol 5(2), R12 (2004).
Murphy, W.J., et al.
Resolution of the early placental mammal radiation using Bayesian phylogenetics.
Science 294(5550), 2348-51 (2001).