Author
Listed:
- Stephen Coleman
- Lisa Breckels
- Ross F Waller
- Kathryn S Lilley
- Chris Wallace
- Oliver M Crook
- Paul D W Kirk
Abstract
The subcellular localisation of proteins is a key determinant of their function. High-throughput analyses of these localisations can be performed using mass spectrometry-based spatial proteomics, which enables us to examine the localisation and relocalisation of proteins. Furthermore, complementary data sources can provide additional sources of functional or localisation information. Examples include protein annotations and other high-throughput ‘omic assays. Integrating these modalities can provide new insights as well as additional confidence in results, but existing approaches for integrative analyses of spatial proteomics datasets, such as concatenation-based methods and transfer learning approaches like KNN-TL, are limited in the types of data they can integrate and do not quantify uncertainty in their predictions. Here we propose a semi-supervised Bayesian approach (wherein model parameters are inferred from both labeled marker proteins and unlabeled data while quantifying prediction uncertainty) to integrate spatial proteomics datasets with other data sources, to improve the inference of protein sub-cellular localisation. We demonstrate our approach outperforms other transfer-learning methods and has greater flexibility in the data it can model - including categorical annotations (e.g., Gene Ontology terms), continuous measurements (e.g., protein abundance), and temporal profiles (e.g., time-series expression data). To demonstrate the flexibility of our approach, we apply our method to integrate spatial proteomics data generated for the parasite Toxoplasma gondii with time-series gene expression data generated over its cell cycle. Our findings suggest that proteins linked to invasion organelles are associated with expression programs that peak at the end of the first cell-cycle. Furthermore, this integrative analysis divides the dense granule proteins into heterogeneous populations suggestive of potentially different functions. Our method is disseminated via the mdir R package available on the lead author’s Github.Author summary: Proteins are located in subcellular environments to ensure that they are near their interaction partners and occur in the correct biochemical environment to function. Where a protein is located can be determined from a number of data sources. To integrate diverse datasets together we develop an integrative Bayesian model to combine the information from several datasets in a principled manner. We learn how similar the dataset are as part of the modelling process and demonstrate the benefits of integrating mass-spectrometry based spatial proteomics data with timecourse gene-expression datasets.
Suggested Citation
Stephen Coleman & Lisa Breckels & Ross F Waller & Kathryn S Lilley & Chris Wallace & Oliver M Crook & Paul D W Kirk, 2025.
"Semi-supervised Bayesian integration of multiple spatial proteomics datasets,"
PLOS Computational Biology, Public Library of Science, vol. 21(12), pages 1-26, December.
Handle:
RePEc:plo:pcbi00:1013799
DOI: 10.1371/journal.pcbi.1013799
Download full text from publisher
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pcbi00:1013799. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ploscompbiol (email available below). General contact details of provider: https://journals.plos.org/ploscompbiol/ .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.