Printed from https://ideas.repec.org/a/nat/natcom/v16y2025i1d10.1038_s41467-025-57031-1.html

Systematic representation and optimization enable the inverse design of cross-species regulatory sequences in bacteria

Author

Listed:
  • Pengcheng Zhang

    (Tsinghua University)

  • Qixiu Du

    (Tsinghua University)

  • Ye Wang

    (Tsinghua University; Columbia University)

  • Lei Wei

    (Tsinghua University)

  • Xiaowo Wang

    (Tsinghua University)

Abstract

Regulatory sequences encode crucial gene expression signals, yet the sequence characteristics that determine their functionality across species remain obscure. Deep generative models have demonstrated considerable potential in various inverse design applications, especially in engineering genetic elements. Here, we introduce DeepCROSS, a generative artificial intelligence framework for the inverse design of cross-species and species-preferred 5’ regulatory sequences in bacteria. DeepCROSS constructs a meta-representation using 1.8 million regulatory sequences from thousands of bacterial genomes to depict the general constraints of regulatory sequences, employs artificial intelligence-guided massively parallel reporter assay experiments in E. coli and P. aeruginosa to explore the potential sequence space, and performs multi-task optimization to obtain de novo regulatory sequences. The optimized regulatory sequences perform as well as or better than functional natural regulatory sequences, with high success rates and low sequence similarity to the natural genome. Collectively, DeepCROSS efficiently navigates the sequence-function landscape and enables the inverse design of cross-species and species-preferred 5’ regulatory sequences.

Suggested Citation

  • Pengcheng Zhang & Qixiu Du & Ye Wang & Lei Wei & Xiaowo Wang, 2025. "Systematic representation and optimization enable the inverse design of cross-species regulatory sequences in bacteria," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-57031-1
    DOI: 10.1038/s41467-025-57031-1

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-025-57031-1
    File Function: Abstract
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lauren L. Porter & Allen K. Kim & Swechha Rimal & Loren L. Looger & Ananya Majumdar & Brett D. Mensh & Mary R. Starich & Marie-Paule Strub, 2022. "Many dissimilar NusG protein domains switch between α-helix and β-sheet folds," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    2. Amir Pandi & Christoph Diehl & Ali Yazdizadeh Kharrazi & Scott A. Scholz & Elizaveta Bobkova & Léon Faure & Maren Nattermann & David Adam & Nils Chapin & Yeganeh Foroughijabbari & Charles Moritz & Nic, 2022. "A versatile active learning workflow for optimization of genetic and metabolic networks," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    3. Zachary C. Drake & Justin T. Seffernick & Steffen Lindert, 2022. "Protein complex prediction using Rosetta, AlphaFold, and mass spectrometry covalent labeling," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    4. Tian Lan & Huan Wang & Qi An, 2024. "Enabling high throughput deep reinforcement learning with first principles to investigate catalytic reaction mechanisms," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    5. Nicolae Sapoval & Amirali Aghazadeh & Michael G. Nute & Dinler A. Antunes & Advait Balaji & Richard Baraniuk & C. J. Barberan & Ruth Dannenfelser & Chen Dun & Mohammadamin Edrisi & R. A. Leo Elworth &, 2022. "Current progress and open challenges for applying deep learning across the biosciences," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    6. Krzysztof Rusek & Agnieszka Kleszcz & Albert Cabellos-Aparicio, 2022. "Bayesian inference of spatial and temporal relations in AI patents for EU countries," Papers 2201.07168, arXiv.org.
    7. Krzysztof Rusek & Agnieszka Kleszcz & Albert Cabellos-Aparicio, 2023. "Bayesian inference of spatial and temporal relations in AI patents for EU countries," Scientometrics, Springer;Akadémiai Kiadó, vol. 128(6), pages 3313-3335, June.
    8. Benoit Stijlemans & Patrick Baetselier & Inge Molle & Laurence Lecordier & Erika Hendrickx & Ema Romão & Cécile Vincke & Wendy Baetens & Steve Schoonooghe & Gholamreza Hassanzadeh-Ghassabeh & Hannelie, 2024. "Q586B2 is a crucial virulence factor during the early stages of Trypanosoma brucei infection that is conserved amongst trypanosomatids," Nature Communications, Nature, vol. 15(1), pages 1-18, December.
    9. Niklas W. A. Gebauer & Michael Gastegger & Stefaan S. P. Hessmann & Klaus-Robert Müller & Kristof T. Schütt, 2022. "Inverse design of 3d molecular structures with conditional generative neural networks," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    10. Lisa Van den Broeck & Dinesh Kiran Bhosale & Kuncheng Song & Cássio Flavio Fonseca de Lima & Michael Ashley & Tingting Zhu & Shanshuo Zhu & Brigitte Van De Cotte & Pia Neyt & Anna C. Ortiz & Tiffany R, 2023. "Functional annotation of proteins for signaling network inference in non-model species," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    11. Felix Lorenz & Jonas Willwersch & Marcelo Cajias & Franz Fuerst, 2023. "Interpretable machine learning for real estate market analysis," Real Estate Economics, American Real Estate and Urban Economics Association, vol. 51(5), pages 1178-1208, September.
    12. Januschowski, Tim & Wang, Yuyang & Torkkola, Kari & Erkkilä, Timo & Hasson, Hilaf & Gasthaus, Jan, 2022. "Forecasting with trees," International Journal of Forecasting, Elsevier, vol. 38(4), pages 1473-1481.
    13. Anita Dornes & Lisa Marie Schmidt & Christopher-Nils Mais & John C. Hook & Jan Pané-Farré & Dieter Kressler & Kai Thormann & Gert Bange, 2024. "Polar confinement of a macromolecular machine by an SRP-type GTPase," Nature Communications, Nature, vol. 15(1), pages 1-11, December.
    14. Hajkowicz, Stefan & Naughtin, Claire & Sanderson, Conrad & Schleiger, Emma & Karimi, Sarvnaz & Bratanova, Alexandra & Bednarz, Tomasz, 2022. "Artificial intelligence for science – adoption trends and future development pathways," MPRA Paper 115464, University Library of Munich, Germany.
    15. Gang Li & Chenbi Li & Chengli Wang & Zeheng Wang, 2024. "Suboptimal capability of individual machine learning algorithms in modeling small-scale imbalanced clinical data of local hospital," PLOS ONE, Public Library of Science, vol. 19(2), pages 1-13, February.
    16. Qiufen Chen & Yuanzhao Guo & Jiuhong Jiang & Jing Qu & Li Zhang & Han Wang, 2023. "The Relative Distance Prediction of Transmembrane Protein Surface Residue Based on Improved Residual Networks," Mathematics, MDPI, vol. 11(3), pages 1-16, January.
    17. Agnese I. Curatolo & Ofer Kimchi & Carl P. Goodrich & Ryan K. Krueger & Michael P. Brenner, 2023. "A computational toolbox for the assembly yield of complex and heterogeneous structures," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
    18. Tao Ni & Qiuyao Jiang & Pei Cing Ng & Juan Shen & Hao Dou & Yanan Zhu & Julika Radecke & Gregory F. Dykes & Fang Huang & Lu-Ning Liu & Peijun Zhang, 2023. "Intrinsically disordered CsoS2 acts as a general molecular thread for α-carboxysome shell assembly," Nature Communications, Nature, vol. 14(1), pages 1-9, December.
    19. Raphael R Eguchi & Christian A Choe & Po-Ssu Huang, 2022. "Ig-VAE: Generative modeling of protein structure by direct 3D coordinate generation," PLOS Computational Biology, Public Library of Science, vol. 18(6), pages 1-18, June.
    20. Noelia Ferruz & Steffen Schmidt & Birte Höcker, 2022. "ProtGPT2 is a deep unsupervised language model for protein design," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
