Title: Refining Transcriptome Gene Catalogs by MS-validation of expressed proteins
Authors: Tse, SPK; Beauchemin, M; Morse, D; Lo, SCL
Keywords: Dinoflagellate
Issue Date: 2018
Publisher: Wiley-VCH
Source: Proteomics, 2018, v. 18, no. 1, 1700271
Journal: Proteomics 
Abstract: Liquid chromatography-tandem mass spectrometry (LC-MS/MS) can identify thousands of protein sequences even in complex mixtures, providing valuable insight into the biological functions of different cells. For non-model organisms, transcriptomes are generally used to enable peptide identification, an important addition to their use as gene catalogs from which the potential metabolic activities of cells can be inferred. We used LC-MS/MS data to determine which of the six possible reading frames of each transcriptome sequence was actually used by the cell to make protein, and asked whether this choice affects downstream analyses of the dataset. We combined results from several LC-MS/MS experiments designed to identify peptide sequences in extracts from the dinoflagellate Lingulodinium polyedra, using a transcriptome of 74,655 sequences. We compiled a list of 6628 translated nucleic acid sequences that together contained all peptide matches (termed MS-validated sequences) and compared downstream analyses of this dataset with those of the 6628 nucleic acid sequences from which they were derived. Compared with BLASTx analyses of the DNA sequences, BLASTp analyses of the MS-validated protein sequences showed differences in gene ontology, yielded more BLAST hits, and contained more KEGG pathway enzymes. The MS-validated protein sequences also differ from datasets containing the protein sequences of the longest open reading frames (ORFs). We also note a poor correlation between protein and mRNA abundance levels, a comparison not previously performed for dinoflagellates. The differences observed between analyses of the MS-validated protein sequence and nucleic acid sequence datasets suggest that the former may provide a more accurate representation of cellular capacity than the latter. Developing MS-validated protein sequence datasets may also speed the interpretation of MS/MS spectra in bottom-up proteomics experiments.
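The core idea in the abstract, choosing the reading frame of a transcript that is supported by observed peptides rather than defaulting to the longest ORF, can be illustrated with a small sketch. This is a hypothetical illustration, not the authors' pipeline. It assumes the standard genetic code (NCBI translation table 1), exact substring matching of peptides against each translated frame, and a simplified "longest ORF" defined as the longest stop-free stretch without requiring a start codon. All function names are invented for this example.

```python
"""Hypothetical sketch: six-frame translation, peptide-supported frames, longest ORF."""

from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases ordered T, C, A, G,
# first base varying slowest, which is the order itertools.product produces.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; characters other than A/C/G/T are kept as-is."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon: '*' marks stop codons, 'X' marks unrecognized codons."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate the three forward (+1..+3) and three reverse-strand (-1..-3) frames."""
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq[k:]) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc[k:]) for k in range(3)})
    return frames


def supported_frames(seq: str, peptides: list[str]) -> list[str]:
    """Frame(s) whose translation contains the most peptides; empty if none match."""
    counts = {
        name: sum(pep in prot for pep in peptides)
        for name, prot in six_frames(seq).items()
    }
    best = max(counts.values())
    return [name for name, n in counts.items() if n == best and n > 0]


def longest_orf(seq: str) -> tuple[str, str]:
    """(frame, peptide) for the longest stop-free stretch in any frame.

    Simplification: no start codon (M) is required; ties go to the first frame seen.
    """
    candidates = (
        (name, segment)
        for name, prot in six_frames(seq).items()
        for segment in prot.split("*")
    )
    return max(candidates, key=lambda item: len(item[1]))


if __name__ == "__main__":
    transcript = "ATGAAATGGGTAACCTTT"  # toy sequence, not real data
    print(six_frames(transcript))
    print(supported_frames(transcript, ["KWVT"]))  # prints ['+1']
```

In this toy case the peptide-supported frame and the longest stop-free stretch coincide; on real transcripts they can disagree, which is the kind of difference the abstract reports between MS-validated and longest-ORF datasets. The sketch ignores complications such as peptides shared between several transcripts.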
ISSN: 1615-9853
EISSN: 1615-9861
DOI: 10.1002/pmic.201700271
Appears in Collections: Journal/Magazine Article