
Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners

Author

Listed:
  • Himangi Srivastava
  • Michael J Lippincott
  • Jordan Currie
  • Robert Canfield
  • Maggie P Y Lam
  • Edward Lau

Abstract

Protein and mRNA levels correlate only moderately. The availability of proteogenomics data sets with protein and transcript measurements from matching samples is providing new opportunities to assess the degree to which protein levels in a system can be predicted from mRNA information. Here we examined the contributions of input features in protein abundance prediction models. Using large proteogenomics data from 8 cancer types within the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data set, we trained models to predict the abundance of over 13,000 proteins using matching transcriptome data from up to 958 tumor or normal adjacent tissue samples each, and compared predictive performances across algorithms, data set sizes, and input features. Over one-third of proteins (4,648) showed relatively poor predictability (elastic net r ≤ 0.3) from their cognate transcripts. Moreover, we found widespread occurrences where the abundance of a protein is considerably less well explained by its own cognate transcript level than by that of one or more trans-locus transcripts. The incorporation of additional trans-locus transcript abundance data as input features increasingly improved the ability to predict sample protein abundance. Transcripts that contribute to non-cognate protein abundance primarily involve those encoding known or predicted interaction partners of the protein of interest, including not only large multi-protein complexes as previously shown, but also small stable complexes in the proteome with only one or a few stable interacting partners. Network analysis further shows a complex proteome-wide interdependency of protein abundance on the transcript levels of multiple interacting partners. The predictive model analysis here therefore supports the view that protein-protein interactions, including those in small protein complexes, exert post-transcriptional influence on proteome compositions more broadly than previously recognized. Moreover, the results suggest mRNA and protein co-expression analysis may have utility for finding gene interactions and predicting expression changes in biological systems.

Author summary

The abundance of mRNA is often measured as a surrogate variable of protein levels, but how well the mRNA levels of different genes correlate with their protein levels across samples remains incompletely understood. Here we trained machine learning models over large RNA sequencing and mass spectrometry data from up to 8 cancer types in the CPTAC data sets to evaluate how well protein level variances across samples can be predicted from their transcripts. Despite voluminous data, up to one-third of genes show poor mRNA-protein correlation, suggesting their protein abundance is not primarily regulated from cognate transcripts. The inclusion of mRNA level information from protein interaction partners into the prediction models substantially improved prediction performance for a subset of genes, suggesting their protein abundance may be primarily regulated post-transcriptionally through protein-protein interactions. Notably, these proteins involve not only subunits of large multi-protein complexes such as the ribosome as previously suspected, but also many proteins that form stable interactions with one or a few other partners, including the propionyl-CoA carboxylase, mitochondrial calcium uniporter, calcineurin, and others. The results add to emerging evidence of independent regulation of protein levels from their cognate transcripts and suggest avenues to improve the interpretation of transcriptomics data.
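
The comparison described above (predicting a protein from its cognate transcript alone versus adding a trans-locus partner transcript as a feature) can be illustrated with a toy sketch. This is not the authors' pipeline and uses no CPTAC data: the synthetic data, the coefficients (0.9, 0.1, noise 0.3), and all function names are assumptions made for illustration. It fits a small elastic net by coordinate descent on simulated samples and scores each model by the Pearson correlation between held-out predictions and measured protein levels, using the r ≤ 0.3 cutoff quoted in the abstract.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_elastic_net(rows, target, alpha=0.01, l1_ratio=0.5, n_iter=50):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*|w|_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    No intercept is fitted; the synthetic data below are drawn with zero mean."""
    n, p = len(rows), len(rows[0])
    weights = [0.0] * p
    residual = list(target)  # y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in rows]
            # Covariance of feature j with the residual, adding back its own term
            rho = sum(c * (r + weights[j] * c) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            delta = new_w - weights[j]
            residual = [r - delta * c for r, c in zip(residual, col)]
            weights[j] = new_w
    return weights


def predict(rows, weights):
    return [sum(w * v for w, v in zip(weights, row)) for row in rows]


# Hypothetical toy model: the protein tracks its partner's mRNA strongly
# and its own (cognate) mRNA only weakly.
rng = random.Random(0)
n_samples = 1000
mrna_partner = [rng.gauss(0, 1) for _ in range(n_samples)]
mrna_cognate = [rng.gauss(0, 1) for _ in range(n_samples)]
protein = [0.9 * a + 0.1 * b + rng.gauss(0, 0.3)
           for a, b in zip(mrna_partner, mrna_cognate)]

split = n_samples // 2  # first half trains, second half is held out
cognate_only = [[b] for b in mrna_cognate]
with_partner = [[b, a] for a, b in zip(mrna_partner, mrna_cognate)]


def held_out_r(rows):
    """Train on the first half, correlate predictions with truth on the second."""
    weights = fit_elastic_net(rows[:split], protein[:split])
    return pearson_r(predict(rows[split:], weights), protein[split:])


r_cognate = held_out_r(cognate_only)
r_with_partner = held_out_r(with_partner)
poorly_predicted = r_cognate <= 0.3  # cutoff quoted in the abstract

print(f"cognate only: r = {r_cognate:.2f}, poorly predicted: {poorly_predicted}")
print(f"cognate + partner: r = {r_with_partner:.2f}")
```

In this simulated setting the cognate-only model falls under the cutoff while the model with the partner transcript does not, which mirrors the pattern the abstract reports for proteins in stable complexes. Real data would call for proper cross-validation and many more candidate features than the two used here.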

Suggested Citation

  • Himangi Srivastava & Michael J Lippincott & Jordan Currie & Robert Canfield & Maggie P Y Lam & Edward Lau, 2022. "Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners," PLOS Computational Biology, Public Library of Science, vol. 18(11), pages 1-27, November.
  • Handle: RePEc:plo:pcbi00:1010702
    DOI: 10.1371/journal.pcbi.1010702

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010702
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1010702&type=printable
    Download Restriction: no

