IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1006516.html

A Bayesian mixture modelling approach for spatial proteomics

Author

Listed:
  • Oliver M Crook
  • Claire M Mulvey
  • Paul D W Kirk
  • Kathryn S Lilley
  • Laurent Gatto

Abstract

Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context specific protein function. Some proteins can be found with a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein to a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, thus proteins have a probability distribution over sub-cellular locations, with Bayesian computation performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to make any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data.

Author summary: Sub-cellular localisation of proteins provides insights into sub-cellular biological processes. For a protein to carry out its intended function it must be localised to the correct sub-cellular environment, whether that be organelles, vesicles or any sub-cellular niche. Correct sub-cellular localisation ensures the biochemical conditions for the protein to carry out its molecular function are met, as well as being near its intended interaction partners. Therefore, mis-localisation of proteins alters cell biochemistry and can disrupt, for example, signalling pathways or inhibit the trafficking of material around the cell. The sub-cellular distribution of proteins is complicated by proteins that can reside in multiple micro-environments, or those that move dynamically within the cell. Methods that predict protein sub-cellular localisation often fail to quantify the uncertainty that arises from the complex and dynamic nature of the sub-cellular environment. Here we present a Bayesian methodology to analyse protein sub-cellular localisation. We explicitly model our data and use Bayesian inference to quantify uncertainty in our predictions. We find our method is competitive with state-of-the-art machine learning methods and additionally provides uncertainty quantification. We show that, with this additional information, we can gain deeper insights into the fundamental biochemistry of the cell.
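
The idea of giving each protein a probability distribution over sub-cellular locations can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the mixture parameters (weights, means, covariances) have already been estimated, and every name and number below is hypothetical. For one abundance profile it computes the membership probability of each Gaussian component by Bayes' rule, then summarises the uncertainty of that allocation with the Shannon entropy (one possible summary, chosen here for illustration).

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) evaluated at x."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)              # cov = chol @ chol.T
    z = np.linalg.solve(chol, x - mean)         # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def membership_probabilities(x, weights, means, covs):
    """Posterior probability of each component given x (Bayes' rule)."""
    log_post = np.array([
        np.log(w) + log_gaussian(x, m, c)
        for w, m, c in zip(weights, means, covs)
    ])
    log_post -= log_post.max()                  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


def shannon_entropy(p):
    """Entropy of an allocation; larger values mean a less certain assignment."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Hypothetical two-compartment model in a two-dimensional profile space.
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), np.eye(2)]

# A profile at one component's mean versus a profile midway between both.
probs_near = membership_probabilities(np.array([0.0, 0.0]), weights, means, covs)
probs_mid = membership_probabilities(np.array([2.0, 2.0]), weights, means, covs)

entropy_near = shannon_entropy(probs_near)
entropy_mid = shannon_entropy(probs_mid)
```

In this toy setting the profile at a component mean is allocated almost entirely to that component, while the midway profile is split evenly between the two. A hard classifier would have to pick one label for the midway profile, whereas the probabilities and entropy make its ambiguity explicit.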

Suggested Citation

  • Oliver M Crook & Claire M Mulvey & Paul D W Kirk & Kathryn S Lilley & Laurent Gatto, 2018. "A Bayesian mixture modelling approach for spatial proteomics," PLOS Computational Biology, Public Library of Science, vol. 14(11), pages 1-29, November.
  • Handle: RePEc:plo:pcbi00:1006516
    DOI: 10.1371/journal.pcbi.1006516

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006516
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006516&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1006516?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Pietro Coretto & Christian Hennig, 2016. "Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1648-1659, October.
    2. Lisa M Breckels & Sean B Holden & David Wojnar & Claire M Mulvey & Andy Christoforou & Arnoud Groen & Matthew W B Trotter & Oliver Kohlbacher & Kathryn S Lilley & Laurent Gatto, 2016. "Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics," PLOS Computational Biology, Public Library of Science, vol. 12(5), pages 1-26, May.
    3. Gneiting, Tilmann & Raftery, Adrian E., 2007. "Strictly Proper Scoring Rules, Prediction, and Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 359-378, March.
    4. Chris Fraley & Adrian E. Raftery, 2007. "Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 24(2), pages 155-181, September.
    5. Andy Christoforou & Claire M. Mulvey & Lisa M. Breckels & Aikaterini Geladaki & Tracey Hurrell & Penelope C. Hayward & Thomas Naake & Laurent Gatto & Rosa Viner & Alfonso Martinez Arias & Kathryn S. L, 2016. "A draft map of the mouse pluripotent stem cell spatial proteome," Nature Communications, Nature, vol. 7(1), pages 1-12, April.

    Citations


    Cited by:

    1. Oliver M Crook & Aikaterini Geladaki & Daniel J H Nightingale & Owen L Vennard & Kathryn S Lilley & Laurent Gatto & Paul D W Kirk, 2020. "A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-21, November.
    2. Oliver M. Crook & Colin T. R. Davies & Lisa M. Breckels & Josie A. Christopher & Laurent Gatto & Paul D. W. Kirk & Kathryn S. Lilley, 2022. "Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    3. Nicola M. Moloney & Konstantin Barylyuk & Eelco Tromer & Oliver M. Crook & Lisa M. Breckels & Kathryn S. Lilley & Ross F. Waller & Paula MacGregor, 2023. "Mapping diversity in African trypanosomes using high resolution spatial proteomics," Nature Communications, Nature, vol. 14(1), pages 1-16, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Oliver M Crook & Aikaterini Geladaki & Daniel J H Nightingale & Owen L Vennard & Kathryn S Lilley & Laurent Gatto & Paul D W Kirk, 2020. "A semi-supervised Bayesian approach for simultaneous protein sub-cellular localisation assignment and novelty detection," PLOS Computational Biology, Public Library of Science, vol. 16(11), pages 1-21, November.
    2. Oliver M. Crook & Colin T. R. Davies & Lisa M. Breckels & Josie A. Christopher & Laurent Gatto & Paul D. W. Kirk & Kathryn S. Lilley, 2022. "Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    3. Azar, Pablo D. & Micali, Silvio, 2018. "Computational principal agent problems," Theoretical Economics, Econometric Society, vol. 13(2), May.
    4. Angelica Gianfreda & Francesco Ravazzolo & Luca Rossini, 2023. "Large Time‐Varying Volatility Models for Hourly Electricity Prices," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 85(3), pages 545-573, June.
    5. Davide Pettenuzzo & Francesco Ravazzolo, 2016. "Optimal Portfolio Choice Under Decision‐Based Model Combinations," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 31(7), pages 1312-1332, November.
    6. Rubio, F.J. & Steel, M.F.J., 2011. "Inference for grouped data with a truncated skew-Laplace distribution," Computational Statistics & Data Analysis, Elsevier, vol. 55(12), pages 3218-3231, December.
    7. Hwang, Eunju, 2022. "Prediction intervals of the COVID-19 cases by HAR models with growth rates and vaccination rates in top eight affected countries: Bootstrap improvement," Chaos, Solitons & Fractals, Elsevier, vol. 155(C).
    8. R de Fondeville & A C Davison, 2018. "High-dimensional peaks-over-threshold inference," Biometrika, Biometrika Trust, vol. 105(3), pages 575-592.
    9. Armantier, Olivier & Treich, Nicolas, 2013. "Eliciting beliefs: Proper scoring rules, incentives, stakes and hedging," European Economic Review, Elsevier, vol. 62(C), pages 17-40.
    10. Domenico Piccolo & Rosaria Simone, 2019. "The class of cub models: statistical foundations, inferential issues and empirical evidence," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 28(3), pages 389-435, September.
    11. Finn Lindgren, 2015. "Comments on: Comparing and selecting spatial predictors using local criteria," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 24(1), pages 35-44, March.
    12. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    13. Laura Liu & Hyungsik Roger Moon & Frank Schorfheide, 2023. "Forecasting with a panel Tobit model," Quantitative Economics, Econometric Society, vol. 14(1), pages 117-159, January.
    14. Warne, Anders, 2023. "DSGE model forecasting: rational expectations vs. adaptive learning," Working Paper Series 2768, European Central Bank.
    15. James Mitchell & Aubrey Poon & Dan Zhu, 2022. "Constructing Density Forecasts from Quantile Regressions: Multimodality in Macro-Financial Dynamics," Working Papers 22-12R, Federal Reserve Bank of Cleveland, revised 11 Apr 2023.
    16. Rafael Frongillo, 2022. "Quantum Information Elicitation," Papers 2203.07469, arXiv.org.
    17. Karimi, Majid & Zaerpour, Nima, 2022. "Put your money where your forecast is: Supply chain collaborative forecasting with cost-function-based prediction markets," European Journal of Operational Research, Elsevier, vol. 300(3), pages 1035-1049.
    18. Peysakhovich, Alexander & Plagborg-Møller, Mikkel, 2012. "A note on proper scoring rules and risk aversion," Economics Letters, Elsevier, vol. 117(1), pages 357-361.
    19. Sucharitha, Rahul Srinivas & Lee, Seokcheon, 2022. "GMM clustering for in-depth food accessibility pattern exploration and prediction model of food demand behavior," Socio-Economic Planning Sciences, Elsevier, vol. 83(C).
    20. Ranadeep Daw & Christopher K. Wikle, 2023. "REDS: Random ensemble deep spatial prediction," Environmetrics, John Wiley & Sons, Ltd., vol. 34(1), February.
