Printed from https://ideas.repec.org/a/plo/pcbi00/1004074.html

Machine Learning Assisted Design of Highly Active Peptides for Drug Discovery

Author

Listed:
  • Sébastien Giguère
  • François Laviolette
  • Mario Marchand
  • Denise Tremblay
  • Sylvain Moineau
  • Xinxia Liang
  • Éric Biron
  • Jacques Corbeil

Abstract

The discovery of peptides possessing high biological activity is very challenging because of the enormous diversity of possible peptides, of which only a small minority have the desired properties. To lower the cost and reduce the time needed to obtain promising peptides, machine learning approaches can greatly assist the process and even partly replace expensive laboratory experiments by learning a predictor from existing data or from a smaller amount of newly generated data. Unfortunately, once the model is learned, selecting the peptides with the greatest predicted bioactivity often requires a prohibitive amount of computational time. For this combinatorial problem, heuristics and stochastic optimization methods are not guaranteed to find adequate solutions. We focused on recent advances in kernel methods and machine learning to learn a predictive model with proven success. For this type of model, we propose an efficient algorithm based on graph theory that is guaranteed to find the peptides for which the model predicts maximal bioactivity. We also present a second algorithm capable of sorting the peptides of maximal bioactivity. Extensive analyses demonstrate how these algorithms can be part of an iterative combinatorial chemistry procedure to speed up the discovery and validation of peptide leads. Moreover, the proposed approach does not require known ligands for the target protein, since it can leverage recent multi-target machine learning predictors in which ligands for similar targets serve as initial training data. Finally, we validated the proposed approach in vitro with the discovery of new cationic antimicrobial peptides. Source code is freely available at http://graal.ift.ulaval.ca/peptide-design/.

Author Summary: Part of the complexity of drug discovery is the sheer chemical diversity to explore, combined with all the requirements a compound must meet to become a commercial drug. Hence, it makes sense to automate this chemical exploration in a wise, informed, and efficient fashion. Here, we focused on peptides, as they have properties that make them excellent starting points for drugs. Machine learning techniques may replace expensive in vitro laboratory experiments by learning an accurate model of them. However, computational models also suffer from the combinatorial explosion caused by the enormous chemical diversity. Indeed, applying the model to every peptide would take an astronomical amount of computer time. Therefore, given a model, is it possible to determine, in reasonable computational time, the peptide that has the best properties and the best chance of success? This exact question motivated our work. We focused on recent advances in kernel methods and machine learning to learn a model that had already produced excellent results. We demonstrate that this class of models has mathematical properties that make it possible to rapidly identify and sort the best peptides. Finally, in vitro and in silico results are provided to support and validate this theoretical discovery.
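
The abstract's idea of finding and sorting the best-scoring peptides without enumerating all of them can be illustrated with a toy sketch. This is not the authors' implementation (their code is at the URL given in the abstract). It assumes a hypothetical predictor whose score is a sum of per-position residue terms and adjacent-pair terms, so that each peptide corresponds to a path in a layered graph. The names `score`, `tail_bounds`, `k_best_peptides`, `pos_scores`, and `pair_scores` are illustrative only. The sketch uses dynamic programming to compute exact completion bounds and a best-first search with a heap to return peptides in non-increasing score order.

```python
import heapq


def score(peptide, pos_scores, pair_scores):
    """Toy additive score (an assumption): position terms plus adjacent-pair terms."""
    total = sum(pos_scores[i][aa] for i, aa in enumerate(peptide))
    total += sum(pair_scores.get(peptide[i:i + 2], 0)
                 for i in range(len(peptide) - 1))
    return total


def tail_bounds(pos_scores, pair_scores, alphabet):
    """tail[i][a]: best score obtainable from positions after i, given residue a at i."""
    n = len(pos_scores)
    tail = [{a: 0 for a in alphabet} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for a in alphabet:
            tail[i][a] = max(
                pair_scores.get(a + b, 0) + pos_scores[i + 1][b] + tail[i + 1][b]
                for b in alphabet
            )
    return tail


def k_best_peptides(pos_scores, pair_scores, alphabet, k):
    """Return up to k (peptide, score) pairs in non-increasing score order."""
    n = len(pos_scores)
    tail = tail_bounds(pos_scores, pair_scores, alphabet)
    # Heap entries: (-upper bound on any completion, prefix, score of the prefix).
    heap = [(-(pos_scores[0][a] + tail[0][a]), a, pos_scores[0][a]) for a in alphabet]
    heapq.heapify(heap)
    results = []
    while heap and len(results) < k:
        _, prefix, prefix_score = heapq.heappop(heap)
        i = len(prefix) - 1
        if i == n - 1:
            # A complete peptide popped first cannot be beaten by anything left.
            results.append((prefix, prefix_score))
            continue
        for b in alphabet:
            new_score = (prefix_score + pair_scores.get(prefix[-1] + b, 0)
                         + pos_scores[i + 1][b])
            heapq.heappush(heap, (-(new_score + tail[i + 1][b]), prefix + b, new_score))
    return results


# Hypothetical toy data: three residues, peptides of length 3.
alphabet = "AKW"
pos_scores = [
    {"A": 1, "K": 3, "W": 0},
    {"A": 0, "K": 2, "W": 2},
    {"A": 2, "K": 1, "W": 3},
]
pair_scores = {"KW": 2, "WK": 1, "KK": -2}

for peptide, value in k_best_peptides(pos_scores, pair_scores, alphabet, 3):
    print(peptide, value)
```

Because the bound stored with each prefix is the exact best completion score, the first complete peptide popped from the heap is optimal, and later ones follow in sorted order. The backward pass costs time proportional to the peptide length times the square of the alphabet size, rather than growing exponentially with the length.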

Suggested Citation

  • Sébastien Giguère & François Laviolette & Mario Marchand & Denise Tremblay & Sylvain Moineau & Xinxia Liang & Éric Biron & Jacques Corbeil, 2015. "Machine Learning Assisted Design of Highly Active Peptides for Drug Discovery," PLOS Computational Biology, Public Library of Science, vol. 11(4), pages 1-21, April.
  • Handle: RePEc:plo:pcbi00:1004074
    DOI: 10.1371/journal.pcbi.1004074

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004074
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1004074&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pcbi.1004074?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Jin Y. Yen, 1971. "Finding the K Shortest Loopless Paths in a Network," Management Science, INFORMS, vol. 17(11), pages 712-716, July.
    2. Eugene L. Lawler, 1972. "A Procedure for Computing the K Best Solutions to Discrete Optimization Problems and Its Application to the Shortest Path Problem," Management Science, INFORMS, vol. 18(7), pages 401-405, March.

    Citations



    Cited by:

    1. Georgia Melagraki & Evangelos Ntougkos & Vagelis Rinotas & Christos Papaneophytou & Georgios Leonis & Thomas Mavromoustakos & George Kontopidis & Eleni Douni & Antreas Afantitis & George Kollias, 2017. "Cheminformatics-aided discovery of small-molecule Protein-Protein Interaction (PPI) dual inhibitors of Tumor Necrosis Factor (TNF) and Receptor Activator of NF-κB Ligand (RANKL)," PLOS Computational Biology, Public Library of Science, vol. 13(4), pages 1-27, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Fernández, Elena & Pozo, Miguel A. & Puerto, Justo & Scozzari, Andrea, 2017. "Ordered Weighted Average optimization in Multiobjective Spanning Tree Problem," European Journal of Operational Research, Elsevier, vol. 260(3), pages 886-903.
    2. Francesca Guerriero & Roberto Musmanno & Valerio Lacagnina & Antonio Pecorella, 2001. "A Class of Label-Correcting Methods for the K Shortest Paths Problem," Operations Research, INFORMS, vol. 49(3), pages 423-429, June.
    3. Pushak, Yasha & Hare, Warren & Lucet, Yves, 2016. "Multiple-path selection for new highway alignments using discrete algorithms," European Journal of Operational Research, Elsevier, vol. 248(2), pages 415-427.
    4. Maren Martens & Martin Skutella, 2009. "Flows with unit path capacities and related packing and covering problems," Journal of Combinatorial Optimization, Springer, vol. 18(3), pages 272-293, October.
    5. Guazzelli, Cauê Sauter & Cunha, Claudio B., 2018. "Exploring K-best solutions to enrich network design decision-making," Omega, Elsevier, vol. 78(C), pages 139-164.
    6. Sedeño-Noda, Antonio & González-Martín, Carlos, 2010. "On the K shortest path trees problem," European Journal of Operational Research, Elsevier, vol. 202(3), pages 628-635, May.
    7. Pascoal, Marta M.B. & Sedeño-Noda, Antonio, 2012. "Enumerating K best paths in length order in DAGs," European Journal of Operational Research, Elsevier, vol. 221(2), pages 308-316.
    8. Antonio Sedeño-Noda, 2016. "Ranking One Million Simple Paths in Road Networks," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 33(05), pages 1-20, October.
    9. Sebastio, Stefano & Trivedi, Kishor S. & Wang, Dazhi & Yin, Xiaoyan, 2014. "Fast computation of bounds for two-terminal network reliability," European Journal of Operational Research, Elsevier, vol. 238(3), pages 810-823.
    10. Quang Minh Bui & Bernard Gendron & Margarida Carvalho, 2022. "A Catalog of Formulations for the Network Pricing Problem," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2658-2674, September.
    11. Carvalho, Margarida & Lodi, Andrea, 2023. "A theoretical and computational equilibria analysis of a multi-player kidney exchange program," European Journal of Operational Research, Elsevier, vol. 305(1), pages 373-385.
    12. Huili Zhang & Yinfeng Xu & Xingang Wen, 2015. "Optimal shortest path set problem in undirected graphs," Journal of Combinatorial Optimization, Springer, vol. 29(3), pages 511-530, April.
    13. Daria Dzyabura & Srikanth Jagabathula, 2018. "Offline Assortment Optimization in the Presence of an Online Channel," Management Science, INFORMS, vol. 64(6), pages 2767-2786, June.
    14. Cowell, Robert G., 2013. "A simple greedy algorithm for reconstructing pedigrees," Theoretical Population Biology, Elsevier, vol. 83(C), pages 55-63.
    15. Melchiori, Anna & Sgalambro, Antonino, 2020. "A branch and price algorithm to solve the Quickest Multicommodity k-splittable Flow Problem," European Journal of Operational Research, Elsevier, vol. 282(3), pages 846-857.
    16. Luss, Hanan & Wong, Richard T., 2005. "Graceful reassignment of excessively long communications paths in networks," European Journal of Operational Research, Elsevier, vol. 160(2), pages 395-415, January.
    17. Rinaldi, Marco & Viti, Francesco, 2017. "Exact and approximate route set generation for resilient partial observability in sensor location problems," Transportation Research Part B: Methodological, Elsevier, vol. 105(C), pages 86-119.
    18. Timothy M. Sweda & Irina S. Dolinskaya & Diego Klabjan, 2017. "Adaptive Routing and Recharging Policies for Electric Vehicles," Transportation Science, INFORMS, vol. 51(4), pages 1326-1348, November.
    19. Chen, Bi Yu & Chen, Xiao-Wei & Chen, Hui-Ping & Lam, William H.K., 2020. "Efficient algorithm for finding k shortest paths based on re-optimization technique," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 133(C).
    20. Doan, Xuan Vinh, 2022. "Distributionally robust optimization under endogenous uncertainty with an application in retrofitting planning," European Journal of Operational Research, Elsevier, vol. 300(1), pages 73-84.
