Schema for PeptideAtlas - Peptide sequences identified from MS spectra of 971 samples by PeptideAtlas
  Database: hg19    Primary Table: peptideAtlas2014    Row Count: 1,189,306   Data last updated: 2014-11-21
Format description: Browser extensible data
On download server: MariaDB table dump directory
field        example      SQL type              description
bin          586          smallint(5) unsigned  Indexing field to speed chromosome range queries.
chrom        chr1         varchar(255)          Reference sequence chromosome or scaffold
chromStart   138946       int(10) unsigned      Start position in chromosome
chromEnd     138967       int(10) unsigned      End position in chromosome
name         PAp01816162  varchar(255)          Name of item
score        3            int(10) unsigned      Optional score, nominal range 0-1000
strand       -            char(1)               + or -
thickStart   138946       int(10) unsigned      Start of where display should be thick (start codon)
thickEnd     138967       int(10) unsigned      End of where display should be thick (stop codon)
reserved     0            int(10) unsigned      Used as itemRgb as of 2004-11-22
blockCount   1            int(10) unsigned      Number of blocks
blockSizes   21           longblob              Comma-separated list of block sizes
chromStarts  0            longblob              Start positions relative to chromStart
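The bin column is not part of the BED format itself; it is the Genome Browser's index for range queries. As a minimal sketch, assuming this table uses the UCSC "standard" binning scheme (bins at five levels, from 128 kb up to 512 Mb), a feature's bin is the smallest bin that fully contains it. The function name bin_from_range is illustrative, not a UCSC API; with these assumptions it reproduces the bin values shown in the sample rows.

```python
# Assumption: UCSC "standard" binning scheme, covering coordinates up to 512 Mb.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]  # finest level first
FIRST_SHIFT = 17  # finest bins span 2**17 = 128 kb
NEXT_SHIFT = 3    # each coarser level is 8x wider


def bin_from_range(start: int, end: int) -> int:
    """Smallest bin fully containing the 0-based, half-open range [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        # Move up one level: bins become 8x wider.
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is outside the standard binning scheme")


print(bin_from_range(138946, 138967))  # 586
print(bin_from_range(874493, 874674))  # 591
```

A range query then only needs to scan the bins that can overlap the query window, rather than the whole chromosome.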

Connected Tables and Joining Fields
        hgFixed.peptideAtlas2014Peptides.accession (via peptideAtlas2014.name)

Sample Rows
 
bin  chrom  chromStart  chromEnd  name         score  strand  thickStart  thickEnd  reserved  blockCount  blockSizes  chromStarts
586  chr1   138946      138967    PAp01816162  3      -       138946      138967    0         1           21          0
591  chr1   874493      874674    PAp02970518  3      +       874493      874674    0         2           16,20       0,161
591  chr1   874716      874764    PAp01571073  1      +       874716      874764    0         1           48          0
591  chr1   876567      876618    PAp01557712  1      +       876567      876618    0         1           51          0
591  chr1   876645      877516    PAp01557575  1      +       876645      877516    0         2           41,1        0,870
591  chr1   877854      877972    PAp04921531  1      +       877854      877972    0         2           14,34       0,84
591  chr1   877939      877966    PAp01647177  4      +       877939      877966    0         1           27          0
591  chr1   877966      878053    PAp04929821  1      +       877966      878053    0         1           87          0
591  chr1   877972      878053    PAp04579159  3      +       877972      878053    0         1           81          0
591  chr1   877990      878053    PAp04753407  1      +       877990      878053    0         1           63          0
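Peptides that span an intron are stored as multiple blocks, with chromStarts given relative to chromStart. The sketch below, assuming tab-separated rows in the column order of the schema above (as in a table dump), expands a row into absolute genomic blocks and checks the PAp[8 digit number] identifier format described under Methods. The helper names parse_row and genomic_blocks are hypothetical.

```python
import re

# Column order as listed in the schema above: bin, then the 12 BED columns.
COLUMNS = ["bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand",
           "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes", "chromStarts"]

# PeptideAtlas peptide identifiers: "PAp" followed by 8 digits, e.g. PAp00000001.
PEPTIDE_ID = re.compile(r"PAp\d{8}")


def parse_row(fields: list[str]) -> dict[str, str]:
    """Zip one row's fields into a column-name -> value dict and check the peptide ID."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    if not PEPTIDE_ID.fullmatch(row["name"]):
        raise ValueError(f"unexpected peptide name: {row['name']}")
    return row


def genomic_blocks(row: dict[str, str]) -> list[tuple[int, int]]:
    """Turn blockSizes/chromStarts (relative to chromStart) into absolute 0-based, half-open blocks."""
    chrom_start = int(row["chromStart"])
    # Skip empty strings so a trailing comma ("16,20,") is also accepted.
    sizes = [int(s) for s in row["blockSizes"].split(",") if s]
    starts = [int(s) for s in row["chromStarts"].split(",") if s]
    if not (len(sizes) == len(starts) == int(row["blockCount"])):
        raise ValueError("blockCount does not match blockSizes/chromStarts")
    blocks = [(chrom_start + rel, chrom_start + rel + size) for rel, size in zip(starts, sizes)]
    # In BED12 the last block must end exactly at chromEnd.
    if blocks[-1][1] != int(row["chromEnd"]):
        raise ValueError("last block does not end at chromEnd")
    return blocks


# Second sample row: a peptide split across two exons.
line = "591\tchr1\t874493\t874674\tPAp02970518\t3\t+\t874493\t874674\t0\t2\t16,20\t0,161"
blocks = genomic_blocks(parse_row(line.split("\t")))
print(blocks)                                          # [(874493, 874509), (874654, 874674)]
print(sum(end - start for start, end in blocks) // 3)  # 12 (residues covered)
```

The block sizes sum to 36 nucleotides, i.e. a 12-residue peptide whose codons are split by an intron of about 145 bases.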

Note: all start coordinates in our database are 0-based, not 1-based. End coordinates are exclusive, so chromEnd - chromStart is the feature length.
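Converting between the stored 0-based, half-open coordinates and the 1-based, fully closed coordinates shown in the browser's position box only shifts the start by one. A minimal sketch; the function names are illustrative.

```python
def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a 0-based, half-open [chromStart, chromEnd) interval as a 1-based, fully closed position."""
    # Only the start moves: the half-open 0-based end equals the inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def from_one_based(start_1: int, end_1: int) -> tuple[int, int]:
    """Inverse conversion: 1-based, fully closed coordinates back to 0-based, half-open."""
    return start_1 - 1, end_1


print(to_one_based("chr1", 138946, 138967))  # chr1:138947-138967
print(from_one_based(138947, 138967))        # (138946, 138967)
```

Both forms describe the same 21 bases: 138967 - 138946 in the half-open form, 138967 - 138947 + 1 in the closed form.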

PeptideAtlas (peptideAtlas2014) Track Description
 

Description

PeptideAtlas collects raw mass spectrometry proteomics datasets from laboratories around the world and reprocesses them in a uniform bioinformatics workflow using the Trans-Proteomic Pipeline. This track displays peptide identifications from the PeptideAtlas August 2014 (Build 433) human build. This build, based on 971 samples containing 420,607,360 spectra, identified 1,021,823 distinct peptides covering 15,136 canonical proteins.

Each PeptideAtlas build comprises a set of reprocessed experiments from a single species, or from a subset of samples (such as human plasma) from a species. Processed results are filtered to a quality level that gives a 1% false discovery rate at the protein level. All peptide identifications of sufficient quality to enter a build are mapped to the Ensembl genome (v75) using the Ensembl toolkit. For every identified peptide, the genomic coordinates of each of its Ensembl protein, transcript, and gene mappings, including intron spans, are calculated with the Ensembl toolkit and stored in the PeptideAtlas database.
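This description does not say how the 1% protein-level false discovery rate is computed, so the following is only a generic sketch of one common approach, not the PeptideAtlas build procedure: if each identification carries a calibrated probability p of being correct (the kind of value PeptideProphet and ProteinProphet report), the expected FDR of a set is the mean of 1 - p, and one keeps the largest top-ranked set whose estimate stays at or below the target. The function name and the input probabilities are hypothetical.

```python
def count_at_fdr(probabilities: list[float], target_fdr: float = 0.01) -> int:
    """Largest number of top-ranked identifications whose estimated FDR is <= target_fdr.

    The estimated FDR of a set is the mean of (1 - p) over its members, i.e. the
    expected fraction of incorrect identifications if each p is a calibrated
    probability of being correct.
    """
    ranked = sorted(probabilities, reverse=True)
    expected_false = 0.0
    keep = 0
    for n, p in enumerate(ranked, start=1):
        expected_false += 1.0 - p
        if expected_false / n <= target_fdr:
            keep = n
    return keep


# Hypothetical protein probabilities, not PeptideAtlas data.
probs = [1.0, 0.999, 0.995, 0.98, 0.6]
print(count_at_fdr(probs))  # 4
```

Because the list is ranked by decreasing probability, each added item has an error probability at least as large as every item before it, so the running estimate never decreases and the cutoff is well defined.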

All peptide sequences in the August 2014 human build (including unmapped sequences) are available for download in FASTA format.

Methods

Mass spectra are compared either to theoretical spectra (SEQUEST, X!Tandem) or to previously observed spectra from a spectral library (SpectraST) to identify candidate peptides. These peptide identifications are scored and filtered (using PeptideProphet) to retain only the highest-scoring identifications. The filtered sequences are compared to protein sequence databases (for human: Ensembl, IPI, and Swiss-Prot). The CDS coordinates of each matched sequence, relative to the protein start, are then used to calculate its genomic coordinates. The protein identifications are then clustered and annotated using ProteinProphet and stored in the SBEAMS database, where each peptide is assigned a unique identifier of the form PAp[8-digit number], e.g. PAp00000001. The processing pipeline is summarized in the figure below.
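The step from protein-relative positions to genomic coordinates can be sketched as follows. This is not the Ensembl toolkit's API: it is a hypothetical helper assuming you already have the coding portion of each exon as genomic intervals, and it ignores complications such as incomplete codons at CDS boundaries. Each residue covers three CDS nucleotides; the peptide's nucleotide span is walked through the exons in transcript order, and any overlap with an exon becomes one block (so an intron-spanning peptide yields several blocks, as in the sample rows).

```python
def protein_to_genomic_blocks(cds_exons: list[tuple[int, int]], strand: str,
                              pep_start_aa: int, pep_len_aa: int) -> list[tuple[int, int]]:
    """Map a peptide's position in a protein to genomic blocks (0-based, half-open).

    cds_exons: coding part of each exon as (start, end) genomic intervals, sorted
               in ascending genomic order (hypothetical input format).
    strand: "+" or "-".
    pep_start_aa: 0-based index of the peptide's first residue in the protein.
    """
    # Peptide span in CDS nucleotide coordinates (transcript orientation), half-open.
    nt_start = pep_start_aa * 3
    nt_end = nt_start + pep_len_aa * 3
    # Walk exons 5' -> 3' along the transcript.
    ordered = cds_exons if strand == "+" else list(reversed(cds_exons))
    blocks = []
    consumed = 0  # CDS nucleotides in exons already visited
    for start, end in ordered:
        length = end - start
        lo = max(nt_start, consumed)
        hi = min(nt_end, consumed + length)
        if lo < hi:
            # Overlap as offsets into this exon, measured in transcript orientation.
            a, b = lo - consumed, hi - consumed
            if strand == "+":
                blocks.append((start + a, start + b))
            else:
                blocks.append((end - b, end - a))
        consumed += length
    if nt_end > consumed:
        raise ValueError("peptide extends past the end of the CDS")
    return sorted(blocks)


# Toy gene with two CDS exons (30 nt + 60 nt); peptide = residues 8-11 (0-based).
exons = [(100, 130), (200, 260)]
print(protein_to_genomic_blocks(exons, "+", 8, 4))  # [(124, 130), (200, 206)]
print(protein_to_genomic_blocks(exons, "-", 8, 4))  # [(224, 236)]
```

The sorted blocks translate directly into a BED row: chromStart is the first block's start, chromEnd the last block's end, and blockSizes and chromStarts follow from the block lengths and their offsets from chromStart.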

[Figure: PeptideAtlas Methods]

Credits

Eric Deutsch, Zhi Sun, and the PeptideAtlas team at the Institute for Systems Biology, Seattle.

References

Desiere F, Deutsch EW, King NL, Nesvizhskii AI, Mallick P, Eng J, Chen S, Eddes J, Loevenich SN, Aebersold R. The PeptideAtlas project. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D655-8. PMID: 16381952; PMC: PMC1347403

Farrah T, Deutsch EW, Omenn GS, Sun Z, Watts JD, Yamamoto T, Shteynberg D, Harris MM, Moritz RL. State of the human proteome in 2013 as viewed through PeptideAtlas: comparing the kidney, urine, and plasma proteomes for the biology- and disease-driven Human Proteome Project. J Proteome Res. 2014 Jan 3;13(1):60-75. PMID: 24261998; PMC: PMC3951210

Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002 Oct 15;74(20):5383-92. PMID: 12403597

Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003 Sep 1;75(17):4646-58. PMID: 14632076