
Multilingual translation for zero-shot biomedical classification using BioTranslator

Authors

  • Hanwen Xu

    (University of Washington)

  • Addie Woicik

    (University of Washington)

  • Hoifung Poon

    (Microsoft Research)

  • Russ B. Altman

    (Stanford University; Chan Zuckerberg Biohub)

  • Sheng Wang

    (University of Washington)

Abstract

Existing annotation paradigms rely on controlled vocabularies, where each data instance is classified into one term from a predefined controlled vocabulary. This paradigm restricts analysis to concepts that are already known and well characterized. Here, we present BioTranslator, a multilingual translation method that addresses this problem. BioTranslator takes a user-written textual description of a new concept and translates this description to a non-text biological data instance. The key idea is a multilingual translation framework in which multiple modalities of biological data are all translated to text. We demonstrate how BioTranslator enables the identification of novel cell types using only a textual description, and how it generalizes to protein function prediction and drug target identification. Our tool frees scientists from confining their analyses to predefined controlled vocabularies, enabling them to interact with biological data using free text.
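
The abstract does not say how a textual description is matched to a data instance. As a minimal sketch only, and not the authors' implementation, the following assumes that text descriptions and biological data instances have already been mapped into one shared vector space by some encoders, and that each instance is assigned the description with the highest cosine similarity. The function names, the toy vectors, and the labels "T cell" and "neuron" are all hypothetical placeholders.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def zero_shot_classify(instance_vecs, description_vecs, labels):
    """Assign each instance the label whose description embedding is most similar.

    instance_vecs:    (n_instances, dim) embeddings of non-text data (assumed given)
    description_vecs: (n_labels, dim) embeddings of free-text descriptions (assumed given)
    labels:           list of n_labels names, one per description
    """
    sims = l2_normalize(instance_vecs) @ l2_normalize(description_vecs).T
    best = sims.argmax(axis=1)
    return [labels[i] for i in best], sims


# Toy embeddings standing in for the outputs of hypothetical text and data encoders.
labels = ["T cell", "neuron"]
description_vecs = np.array([[1.0, 0.0, 0.2],
                             [0.0, 1.0, 0.1]])
instance_vecs = np.array([[0.9, 0.1, 0.3],
                          [0.2, 0.8, 0.0],
                          [0.7, 0.3, 0.1]])

predicted, sims = zero_shot_classify(instance_vecs, description_vecs, labels)
print(predicted)  # ['T cell', 'neuron', 'T cell']
```

Because labels enter only through their description embeddings, a new concept can be added in this sketch by embedding one more description, with no retraining of the classifier step.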

Suggested Citation

  • Hanwen Xu & Addie Woicik & Hoifung Poon & Russ B. Altman & Sheng Wang, 2023. "Multilingual translation for zero-shot biomedical classification using BioTranslator," Nature Communications, Nature, vol. 14(1), pages 1-13, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-36476-2
    DOI: 10.1038/s41467-023-36476-2

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-36476-2
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Sheng Wang & Angela Oliveira Pisco & Aaron McGeever & Maria Brbic & Marinka Zitnik & Spyros Darmanis & Jure Leskovec & Jim Karkanias & Russ B. Altman, 2021. "Leveraging the Cell Ontology to classify unseen cell types," Nature Communications, Nature, vol. 12(1), pages 1-11, December.
    2. Mathew J. Garnett & Elena J. Edelman & Sonja J. Heidorn & Chris D. Greenman & Anahita Dastur & King Wai Lau & Patricia Greninger & I. Richard Thompson & Xi Luo & Jorge Soares & Qingsong Liu & Francesc, 2012. "Systematic identification of genomic markers of drug sensitivity in cancer cells," Nature, Nature, vol. 483(7391), pages 570-575, March.
    3. Kevin W Boyack & David Newman & Russell J Duhon & Richard Klavans & Michael Patek & Joseph R Biberstine & Bob Schijvenaars & André Skupin & Nianli Ma & Katy Börner, 2011. "Clustering More than Two Million Biomedical Publications: Comparing the Accuracies of Nine Text-Based Similarity Approaches," PLOS ONE, Public Library of Science, vol. 6(3), pages 1-11, March.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paul Donner, 2021. "Validation of the Astro dataset clustering solutions with external data," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(2), pages 1619-1645, February.
    2. Ding, Hui & Zhang, Jian & Zhang, Riquan, 2022. "Nonparametric variable screening for multivariate additive models," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    3. Ballester, Omar & Penner, Orion, 2022. "Robustness, replicability and scalability in topic modelling," Journal of Informetrics, Elsevier, vol. 16(1).
    4. L. Mathur & B. Szalai & N. H. Du & R. Utharala & M. Ballinger & J. J. M. Landry & M. Ryckelynck & V. Benes & J. Saez-Rodriguez & C. A. Merten, 2022. "Combi-seq for multiplexed transcriptome-based profiling of drug combinations using deterministic barcoding in single-cell droplets," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    5. Renchu Guan & Chen Yang & Maurizio Marchese & Yanchun Liang & Xiaohu Shi, 2014. "Full Text Clustering and Relationship Network Analysis of Biomedical Publications," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-9, September.
    6. Shixuan Liu & Camille Ezran & Michael F. Z. Wang & Zhengda Li & Kyle Awayan & Jonathan Z. Long & Iwijn De Vlaminck & Sheng Wang & Jacques Epelbaum & Christin S. Kuo & Jérémy Terrien & Mark A. Krasnow, 2024. "An organism-wide atlas of hormonal signaling based on the mouse lemur single-cell transcriptome," Nature Communications, Nature, vol. 15(1), pages 1-27, December.
    7. Francesco Giovanni Avallone & Alberto Quagli & Paola Ramassa, 2022. "Interdisciplinary research by accounting scholars: An exploratory study," FINANCIAL REPORTING, FrancoAngeli Editore, vol. 2022(2), pages 5-34.
    8. Rey-Long Liu, 2015. "Passage-Based Bibliographic Coupling: An Inter-Article Similarity Measure for Biomedical Articles," PLOS ONE, Public Library of Science, vol. 10(10), pages 1-22, October.
    9. Alicia Lara-Clares & Juan J Lastra-Díaz & Ana Garcia-Serrano, 2021. "Protocol for a reproducible experimental survey on biomedical sentence similarity," PLOS ONE, Public Library of Science, vol. 16(3), pages 1-28, March.
    10. Chen, Liang & Xu, Shuo & Zhu, Lijun & Zhang, Jing & Xu, Haiyun & Yang, Guancan, 2022. "A semantic main path analysis method to identify multiple developmental trajectories," Journal of Informetrics, Elsevier, vol. 16(2).
    11. Ai Linh Nguyen & Wenyuan Liu & Khiam Aik Khor & Andrea Nanetti & Siew Ann Cheong, 2022. "Strategic differences between regional investments into graphene technology and how corporations and universities manage patent portfolios," Papers 2208.03719, arXiv.org.
    12. Fei Shu & Yue Ma & Junping Qiu & Vincent Larivière, 2020. "Classifications of science and their effects on bibliometric evaluations," Scientometrics, Springer;Akadémiai Kiadó, vol. 125(3), pages 2727-2744, December.
    13. Peter Sjögårde & Fereshteh Didegah, 2022. "The association between topic growth and citation impact of research publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(4), pages 1903-1921, April.
    14. Lin Zhang & Beibei Sun & Fei Shu & Ying Huang, 2022. "Comparing paper level classifications across different methods and systems: an investigation of Nature publications," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(12), pages 7633-7651, December.
    15. Manuel A. Vázquez & Jorge Pereira-Delgado & Jesús Cid-Sueiro & Jerónimo Arenas-García, 2022. "Validation of scientific topic models using graph analysis and corpus metadata," Scientometrics, Springer;Akadémiai Kiadó, vol. 127(9), pages 5441-5458, September.
    16. G. Gambardella & G. Viscido & B. Tumaini & A. Isacchi & R. Bosotti & D. di Bernardo, 2022. "A single-cell analysis of breast cancer cell lines to study tumour heterogeneity and drug response," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    17. Shi, Chengchun & Xu, Tianlin & Bergsma, Wicher & Li, Lexin, 2021. "Double generative adversarial networks for conditional independence testing," LSE Research Online Documents on Economics 112550, London School of Economics and Political Science, LSE Library.
    18. Lovro Šubelj & Nees Jan van Eck & Ludo Waltman, 2016. "Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-23, April.
    19. Milad Dehghani & Ki Joon Kim, 2019. "Past and Present Research on Wearable Technologies: Bibliometric and Cluster Analyses of Published Research from 2000 to 2016," International Journal of Innovation and Technology Management (IJITM), World Scientific Publishing Co. Pte. Ltd., vol. 16(01), pages 1-21, February.
    20. Juste Raimbault, 2019. "Exploration of an interdisciplinary scientific landscape," Scientometrics, Springer;Akadémiai Kiadó, vol. 119(2), pages 617-641, May.
