Author
Listed:
- Kristina Edfeldt
(Karolinska University Hospital and Karolinska Institutet)
- Aled M. Edwards
(University of Toronto)
- Ola Engkvist
(Chalmers University of Technology)
- Judith Günther
(Computational Molecular Design)
- Matthew Hartley
(Wellcome Genome Campus)
- David G. Hulcoop
(Wellcome Genome Campus)
- Andrew R. Leach
(Wellcome Genome Campus)
- Brian D. Marsden
(University of Oxford)
- Amelie Menge
(Johann Wolfgang Goethe University, Frankfurt am Main, 60438, Germany & Structural Genomics Consortium (SGC), Buchmann Institute for Life Sciences, Johann Wolfgang Goethe University)
- Leonie Misquitta
(National Institutes of Health)
- Susanne Müller
(Johann Wolfgang Goethe University, Frankfurt am Main, 60438, Germany & Structural Genomics Consortium (SGC), Buchmann Institute for Life Sciences, Johann Wolfgang Goethe University)
- Dafydd R. Owen
(Development & Medical)
- Kristof T. Schütt
(Machine Learning & Computational Sciences)
- Nicholas Skelton
(Genentech, Inc.)
- Andreas Steffen
(Machine Learning & Computational Sciences)
- Alexander Tropsha
(University of North Carolina)
- Erik Vernet
(Novo Nordisk A/S)
- Yanli Wang
(National Institutes of Health)
- James Wellnitz
(University of North Carolina)
- Timothy M. Willson
(University of North Carolina at Chapel Hill)
- Djork-Arné Clevert
(Machine Learning & Computational Sciences)
- Benjamin Haibe-Kains
(University of Toronto; University Health Network; Vector Institute for Artificial Intelligence)
- Lovisa Holmberg Schiavone
(AstraZeneca)
- Matthieu Schapira
(University of Toronto)
Abstract
The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, as many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles openly, and at scale and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design.
Suggested Citation
Kristina Edfeldt & Aled M. Edwards & Ola Engkvist & Judith Günther & Matthew Hartley & David G. Hulcoop & Andrew R. Leach & Brian D. Marsden & Amelie Menge & Leonie Misquitta & Susanne Müller & Dafydd, 2024.
"A data science roadmap for open science organizations engaged in early-stage drug discovery,"
Nature Communications, Nature, vol. 15(1), pages 1-10, December.
Handle:
RePEc:nat:natcom:v:15:y:2024:i:1:d:10.1038_s41467-024-49777-x
DOI: 10.1038/s41467-024-49777-x