Likelihood-Based Gene Annotations for Gap Filling and Quality Assessment in Genome-Scale Metabolic Models

Author

Listed:
  • Matthew N Benedict
  • Michael B Mundy
  • Christopher S Henry
  • Nicholas Chia
  • Nathan D Price

Abstract

Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.

Author Summary: Genome-scale metabolic modeling is a powerful approach that allows one to computationally simulate a variety of metabolic phenotypes. However, manually constructing accurate metabolic networks is extremely time intensive and it is thus desirable to have automated computational methods for providing high-quality metabolic networks. Incomplete knowledge of biological chemistries leads to missing, ambiguous, or inaccurate gene annotations, and thus gives rise to incomplete metabolic networks. Computational algorithms for filling these gaps in a metabolic model rely on network topology based approaches that can result in solutions that are inconsistent with existing genomic data. We developed an algorithm that directly incorporates genomic evidence into the decision-making process for gap filling reactions. This algorithm both maximizes the consistency of gap filled reactions with available genomic data and identifies candidate genes for gap filled reactions. The algorithm has been integrated into KBase's metabolic modeling service, an automated metabolic network reconstruction framework that includes the ModelSEED automated metabolic reconstruction tools.
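
The contrast the abstract draws between parsimony-based and likelihood-based gap filling can be illustrated with a toy sketch. This is not the authors' implementation or the KBase API. It assumes, purely for illustration, that parsimony charges every added reaction a cost of 1, that the likelihood-based variant charges 1 minus the reaction likelihood, and that "filling the gap" means making a target metabolite reachable by simple network expansion rather than by flux balance analysis. The reaction names, likelihood values, and the functions producible and gap_fill are all hypothetical.

```python
from itertools import combinations


def producible(reactions, seeds):
    """Return every metabolite reachable from the seed set (network expansion)."""
    have = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            # Fire a reaction once all of its substrates are available.
            if set(substrates) <= have and not set(products) <= have:
                have.update(products)
                changed = True
    return have


def gap_fill(model, candidates, seeds, target, cost):
    """Brute-force the cheapest candidate subset that makes `target` producible.

    `cost` maps a candidate record to a penalty. Exhaustive search is only
    viable for toy inputs. Returns None if no subset works.
    """
    best_cost, best_set = None, None
    names = sorted(candidates)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            reactions = list(model) + [candidates[n]["rxn"] for n in subset]
            if target not in producible(reactions, seeds):
                continue
            total = sum(cost(candidates[n]) for n in subset)
            if best_cost is None or total < best_cost:
                best_cost, best_set = total, subset
    return best_set


# Hypothetical draft model: A -> B exists, but nothing produces D.
model = [(("A",), ("B",))]

# Hypothetical candidates with made-up likelihoods (assumed to come from homology).
candidates = {
    "R1": {"rxn": (("B",), ("D",)), "likelihood": 0.0},  # no genomic evidence
    "R2": {"rxn": (("B",), ("C",)), "likelihood": 0.9},
    "R3": {"rxn": (("C",), ("D",)), "likelihood": 0.8},
}

# Parsimony: every added reaction costs the same.
parsimony = gap_fill(model, candidates, {"A"}, "D", cost=lambda c: 1.0)

# Likelihood-based (assumed penalty): well-supported reactions are cheaper.
likelihood = gap_fill(
    model, candidates, {"A"}, "D", cost=lambda c: 1.0 - c["likelihood"]
)

print(parsimony)   # ('R1',)
print(likelihood)  # ('R2', 'R3')
```

In this toy case parsimony picks the single unsupported reaction, while the likelihood-weighted cost prefers the longer two-step route that has genomic support. That mirrors the kind of difference the abstract describes, but nothing here reproduces the paper's results.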

Suggested Citation

  • Matthew N Benedict & Michael B Mundy & Christopher S Henry & Nicholas Chia & Nathan D Price, 2014. "Likelihood-Based Gene Annotations for Gap Filling and Quality Assessment in Genome-Scale Metabolic Models," PLOS Computational Biology, Public Library of Science, vol. 10(10), pages 1-14, October.
  • Handle: RePEc:plo:pcbi00:1003882
    DOI: 10.1371/journal.pcbi.1003882

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003882
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1003882&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rui Fa & Domenico Cozzetto & Cen Wan & David T Jones, 2018. "Predicting human protein function with multi-task deep neural networks," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-16, June.
    2. Avraham E Mayo & Yaakov Setty & Seagull Shavit & Alon Zaslaver & Uri Alon, 2006. "Plasticity of the cis-Regulatory Input Function of a Gene," PLOS Biology, Public Library of Science, vol. 4(4), pages 1-1, March.
    3. Michal Brylinski & Daswanth Lingam, 2012. "eThread: A Highly Optimized Machine Learning-Based Approach to Meta-Threading and the Modeling of Protein Tertiary Structures," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-12, November.
    4. Marcelo Rivas-Astroza & Raúl Conejeros, 2020. "Metabolic flux configuration determination using information entropy," PLOS ONE, Public Library of Science, vol. 15(12), pages 1-19, December.
    5. Thomas J Sharpton & Samantha J Riesenfeld & Steven W Kembel & Joshua Ladau & James P O'Dwyer & Jessica L Green & Jonathan A Eisen & Katherine S Pollard, 2011. "PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data," PLOS Computational Biology, Public Library of Science, vol. 7(1), pages 1-13, January.
    6. Akira R Kinjo & Haruki Nakamura, 2012. "Composite Structural Motifs of Binding Sites for Delineating Biological Functions of Proteins," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-11, February.
    7. Umberto Lucia & Giulia Grisolia, 2018. "Cyanobacteria and Microalgae : Thermoeconomic Considerations in Biofuel Production," Energies, MDPI, vol. 11(1), pages 1-16, January.
    8. Markus J Herrgård & Stephen S Fong & Bernhard Ø Palsson, 2006. "Identification of Genome-Scale Metabolic Network Models Using Experimentally Measured Flux Profiles," PLOS Computational Biology, Public Library of Science, vol. 2(7), pages 1-11, July.
    9. Iván Domenzain & Benjamín Sánchez & Mihail Anton & Eduard J. Kerkhoven & Aarón Millán-Oropeza & Céline Henry & Verena Siewers & John P. Morrissey & Nikolaus Sonnenschein & Jens Nielsen, 2022. "Reconstruction of a catalogue of genome-scale metabolic models with enzymatic constraints using GECKO 2.0," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    10. Elisa Boari de Lima & Wagner Meira Júnior & Raquel Cardoso de Melo-Minardi, 2016. "Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering," PLOS Computational Biology, Public Library of Science, vol. 12(6), pages 1-32, June.
    11. Claudio Altafini & Giuseppe Facchetti, 2015. "Metabolic Adaptation Processes That Converge to Optimal Biomass Flux Distributions," PLOS Computational Biology, Public Library of Science, vol. 11(9), pages 1-13, September.
    12. Wing-Cheong Wong & Sebastian Maurer-Stroh & Frank Eisenhaber, 2010. "More Than 1,001 Problems with Protein Domain Databases: Transmembrane Regions, Signal Peptides and the Issue of Sequence Homology," PLOS Computational Biology, Public Library of Science, vol. 6(7), pages 1-19, July.
    13. Andras Gyorgy, 2023. "Competition and evolutionary selection among core regulatory motifs in gene expression control," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    14. Lucia, Umberto, 2012. "Irreversibility in biophysical and biochemical engineering," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 391(23), pages 5997-6007.
    15. William R Harcombe & Nigel F Delaney & Nicholas Leiby & Niels Klitgord & Christopher J Marx, 2013. "The Ability of Flux Balance Analysis to Predict Evolution of Central Metabolism Scales with the Initial Distance to the Optimum," PLOS Computational Biology, Public Library of Science, vol. 9(6), pages 1-11, June.
    16. Yuval Bussi & Ruti Kapon & Ziv Reich, 2021. "Large-scale k-mer-based analysis of the informational properties of genomes, comparative genomics and taxonomy," PLOS ONE, Public Library of Science, vol. 16(10), pages 1-27, October.
