
Analyzing the correlation between protein expression and sequence-related features of mRNA and protein in Escherichia coli K-12 MG1655 model

Author

Listed:
  • Nhat HM Truong
  • Nam T. Vo
  • Binh T Nguyen
  • Son T Huynh
  • Hoang D Nguyen

Abstract

Efficient production of recombinant proteins requires a tool that can predict protein abundance and optimize gene sequences. The Transim model published by Tuller et al. in 2018 calculates the translation rate in E. coli from features of the mRNA sequence; this rate achieves a Spearman correlation of 0.36 with the amount of protein per mRNA when tested on a dataset of the first genes of operons in the E. coli K-12 MG1655 genome. However, this correlation is modest, and the model does not fully account for features of the mRNA and protein sequences. To improve predictive capability, our study first expanded the test dataset, added genes located inside operons, and used a microarray mRNA expression dataset, which raised the correlation between translation rate and the amount of protein to more than 0.42. Next, we examined whether six traditional machine learning models could calculate a "new translation rate" using the initiation rate and elongation rate as inputs. The support vector regression (SVR) algorithm produced the best-correlated new translation rates: the Spearman correlation improved to R = 0.6699 with protein level as the output and to R = 0.6536 with protein level per mRNA. Finally, we investigated how much the prediction improved when additional features were combined with the new translation rates. With six features, the model’s prediction of protein per mRNA reached R = 0.6660, while the correlation of its final translation rate with protein level reached R = 0.6729. These results demonstrate that the model can predict the protein expression of a gene, rather than being limited to predicting expression per mRNA, and show its potential for development into a gene expression prediction tool.
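Every result in the abstract is reported as a Spearman correlation, that is, a Pearson correlation computed on ranks rather than raw values. The following is a minimal sketch of that metric in plain Python, not the authors' code: the function names and the sample values for `predicted_rate` and `protein_per_mrna` are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical illustration values, not data from the study.
predicted_rate = [0.8, 1.5, 0.3, 2.1, 1.1, 0.6]
protein_per_mrna = [12.0, 30.0, 9.0, 25.0, 40.0, 5.0]
rho = spearman(predicted_rate, protein_per_mrna)
print(f"Spearman correlation: {rho:.4f}")
```

Because only the ordering of the values matters, a model's output can correlate well with protein level under this metric even when the relationship is monotonic but not linear.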

Suggested Citation

  • Nhat HM Truong & Nam T. Vo & Binh T Nguyen & Son T Huynh & Hoang D Nguyen, 2024. "Analyzing the correlation between protein expression and sequence-related features of mRNA and protein in Escherichia coli K-12 MG1655 model," PLOS ONE, Public Library of Science, vol. 19(2), pages 1-19, February.
  • Handle: RePEc:plo:pone00:0288526
    DOI: 10.1371/journal.pone.0288526

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288526
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0288526&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pan-Jun Kim & Nathan D Price, 2011. "Genetic Co-Occurrence Network across Sequenced Microbes," PLOS Computational Biology, Public Library of Science, vol. 7(12), pages 1-9, December.
    2. Cheemeng Tan & Robert Phillip Smith & Ming-Chi Tsai & Russell Schwartz & Lingchong You, 2014. "Phenotypic Signatures Arising from Unbalanced Bacterial Growth," PLOS Computational Biology, Public Library of Science, vol. 10(8), pages 1-10, August.
    3. Joel A Paulson & Marc Martin-Casas & Ali Mesbah, 2019. "Fast uncertainty quantification for dynamic flux balance analysis using non-smooth polynomial chaos expansions," PLOS Computational Biology, Public Library of Science, vol. 15(8), pages 1-35, August.
    4. Jeremiah J Faith & Boris Hayete & Joshua T Thaden & Ilaria Mogno & Jamey Wierzbowski & Guillaume Cottarel & Simon Kasif & James J Collins & Timothy S Gardner, 2007. "Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles," PLOS Biology, Public Library of Science, vol. 5(1), pages 1-13, January.
    5. Eamon Duede & Victor Zhorin, 2016. "Convergence of Economic Growth and the Great Recession as Seen From a Celestial Observatory," Papers 1604.04312, arXiv.org, revised Aug 2016.
    6. Markus Maucher & David Kracht & Steffen Schober & Martin Bossert & Hans Kestler, 2014. "Inferring Boolean functions via higher-order correlations," Computational Statistics, Springer, vol. 29(1), pages 97-115, February.
    7. Scott A Becker & Bernhard O Palsson, 2008. "Context-Specific Metabolic Networks Are Consistent with Experiments," PLOS Computational Biology, Public Library of Science, vol. 4(5), pages 1-10, May.
    8. Niels Klitgord & Daniel Segrè, 2010. "Environments that Induce Synthetic Microbial Ecosystems," PLOS Computational Biology, Public Library of Science, vol. 6(11), pages 1-17, November.
    9. Christian L Barrett & Bernhard O Palsson, 2006. "Iterative Reconstruction of Transcriptional Regulatory Networks: An Algorithmic Approach," PLOS Computational Biology, Public Library of Science, vol. 2(5), pages 1-10, May.
