
Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction

Author

Listed:
  • Vijaykumar Yogesh Muley
  • Akash Ranjan

Abstract

Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs across multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. A further method, mirrortree, is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.

Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria in accordance with the phylogenetic diversity of, and relationships between, the organisms. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.

Conclusions: Higher performance for predicting protein-protein interactions was achievable even with 100–150 of the 565 bacterial genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50–100 genomes to be selected with comparable prediction accuracy when computational resources are limited.
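
As a rough illustration of phylogenetic profiling, one of the genomic context methods named in the abstract, the sketch below scores protein pairs by how similarly their homologs are present or absent across a set of reference genomes, and shows how restricting the reference set changes the score. The protein names, the toy presence/absence profiles, the helper names and the use of Pearson correlation as the similarity measure are all assumptions made for illustration; they are not taken from the paper, whose scoring variants are not described in this abstract.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def score_pairs(profiles, genome_indices=None):
    """Score every protein pair by profile correlation.

    profiles: dict mapping protein name -> list of 0/1 flags, one per reference genome.
    genome_indices: optional list of column indices, i.e. the chosen reference genome subset.
    Returns a list of ((name_a, name_b), score) sorted by descending score.
    """
    def restrict(profile):
        if genome_indices is None:
            return profile
        return [profile[i] for i in genome_indices]

    scored = []
    for name_a, name_b in combinations(sorted(profiles), 2):
        score = pearson(restrict(profiles[name_a]), restrict(profiles[name_b]))
        scored.append(((name_a, name_b), score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 1 = a homolog is present in that reference genome, 0 = absent.
profiles = {
    "protA": [1, 1, 0, 0, 1, 0, 1, 1],
    "protB": [1, 1, 0, 0, 1, 0, 1, 0],
    "protC": [0, 0, 1, 1, 0, 1, 0, 0],
}

all_genomes = score_pairs(profiles)
first_four = score_pairs(profiles, genome_indices=[0, 1, 2, 3])

for pair, score in all_genomes:
    print(pair, round(score, 3))
```

Passing a different `genome_indices` list stands in for choosing a different reference genome set, which is the variable the study examines; a real analysis would build the profiles from homology searches against the selected genomes and evaluate the ranked pairs against a gold standard.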

Suggested Citation

  • Vijaykumar Yogesh Muley & Akash Ranjan, 2012. "Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-13, July.
  • Handle: RePEc:plo:pone00:0042057
    DOI: 10.1371/journal.pone.0042057

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0042057
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0042057&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chuanhua Xing & David B Dunson, 2011. "Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions," PLOS Computational Biology, Public Library of Science, vol. 7(7), pages 1-10, July.
    2. Saeid Rasti & Chrysafis Vogiatzis, 2019. "A survey of computational methods in protein–protein interaction networks," Annals of Operations Research, Springer, vol. 276(1), pages 35-87, May.
    3. Xinyi Liu & Bin Liu & Zhimin Huang & Ting Shi & Yingyi Chen & Jian Zhang, 2012. "SPPS: A Sequence-Based Method for Predicting Probability of Protein-Protein Interaction Partners," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-6, January.
    4. Colizza, Vittoria & Flammini, Alessandro & Maritan, Amos & Vespignani, Alessandro, 2005. "Characterization and modeling of protein–protein interaction networks," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 352(1), pages 1-27.
    5. Sayed Mohammad Ebrahim Sahraeian & Byung-Jun Yoon, 2012. "A Network Synthesis Model for Generating Protein Interaction Network Families," PLOS ONE, Public Library of Science, vol. 7(8), pages 1-14, August.
    6. Saket Navlakha & Anthony Gitter & Ziv Bar-Joseph, 2012. "A Network-based Approach for Predicting Missing Pathway Interactions," PLOS Computational Biology, Public Library of Science, vol. 8(8), pages 1-13, August.
    7. Beatriz García-Jiménez & David Juan & Iakes Ezkurdia & Eduardo Andrés-León & Alfonso Valencia, 2010. "Inference of Functional Relations in Predicted Protein Networks with a Machine Learning Approach," PLOS ONE, Public Library of Science, vol. 5(4), pages 1-10, April.
    8. Guilherme T Valente & Marcio L Acencio & Cesar Martins & Ney Lemke, 2013. "The Development of a Universal In Silico Predictor of Protein-Protein Interactions," PLOS ONE, Public Library of Science, vol. 8(5), pages 1-11, May.
    9. Wei Zhang & Jia Xu & Yuanyuan Li & Xiufen Zou, 2017. "A new two-stage method for revealing missing parts of edges in protein-protein interaction networks," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-22, May.
    10. Jana Kludas & Mikko Arvas & Sandra Castillo & Tiina Pakula & Merja Oja & Céline Brouard & Jussi Jäntti & Merja Penttilä & Juho Rousu, 2016. "Machine Learning of Protein Interactions in Fungal Secretory Pathways," PLOS ONE, Public Library of Science, vol. 11(7), pages 1-20, July.
    11. Chittibabu Guda & Brian R King & Lipika R Pal & Purnima Guda, 2009. "A Top-Down Approach to Infer and Compare Domain-Domain Interactions across Eight Model Organisms," PLOS ONE, Public Library of Science, vol. 4(3), pages 1-15, March.
    12. Hai-Bo Zhang & Xiao-Bao Ding & Jie Jin & Wen-Ping Guo & Qiao-Lei Yang & Peng-Cheng Chen & Heng Yao & Li Ruan & Yu-Tian Tao & Xin Chen, 2022. "Predicted mouse interactome and network-based interpretation of differentially expressed genes," PLOS ONE, Public Library of Science, vol. 17(4), pages 1-16, April.
    13. Zhu-Hong You & Keith C C Chan & Pengwei Hu, 2015. "Predicting Protein-Protein Interactions from Primary Protein Sequences Using a Novel Multi-Scale Local Feature Representation Scheme and the Random Forest," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-19, May.
    14. Benjamin A Shoemaker & Anna R Panchenko, 2007. "Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners," PLOS Computational Biology, Public Library of Science, vol. 3(4), pages 1-7, April.
    15. Xue Wang & Yuejin Wu & Rujing Wang & Yuanyuan Wei & Yuanmiao Gui, 2019. "A novel matrix of sequence descriptors for predicting protein-protein interactions from amino acid sequences," PLOS ONE, Public Library of Science, vol. 14(6), pages 1-12, June.
