Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library

Author

Listed:
  • Qiong Yang

    (Central South University)

  • Hongchao Ji

    (Chinese Academy of Agricultural Sciences)

  • Zhenbo Xu

    (Central South University)

  • Yiming Li

    (Central South University)

  • Pingshan Wang

    (Central South University)

  • Jinyu Sun

    (Central South University)

  • Xiaqiong Fan

    (Central South University)

  • Hailiang Zhang

    (Central South University)

  • Hongmei Lu

    (Central South University)

  • Zhimin Zhang

    (Central South University)

Abstract

Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, its efficiency is limited by the coverage of spectral libraries and by the accuracy and speed of matching. In this study, a million-scale in-silico electron ionization mass spectrum (EI-MS) library is established. Furthermore, an ultra-fast and accurate spectrum matching method (FastEI) is proposed, which substantially improves accuracy using Word2vec spectral embedding and boosts speed using a hierarchical navigable small-world (HNSW) graph. It achieves 80.4% recall@10 accuracy (88.3% with a 5 Da mass filter) with a speedup of two orders of magnitude compared with the weighted cosine similarity (WCS) method. When FastEI is applied to identify molecules beyond the NIST 2017 library, it achieves 50% recall@1 accuracy. FastEI is packaged as standalone, user-friendly software for users with limited computational backgrounds. Overall, FastEI combined with the million-scale in-silico library provides an accurate and ultra-fast tool for compound identification.
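
To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of a brute-force weighted cosine similarity search with an optional mass filter, scored by recall@k. It is not the FastEI implementation: FastEI replaces this exhaustive scoring with Word2vec embeddings and an HNSW index. The function names, the spectrum representation (a dict of integer m/z to intensity), and the weighting exponents are illustrative assumptions, not values taken from the paper.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Weighted cosine similarity between two spectra ({m/z: intensity}).

    Each peak is weighted as (m/z ** mz_power) * (intensity ** intensity_power).
    The default exponents are assumptions chosen for illustration only.
    """
    def weights(spec):
        return {mz: (mz ** mz_power) * (inten ** intensity_power)
                for mz, inten in spec.items() if inten > 0}

    wa, wb = weights(spec_a), weights(spec_b)
    dot = sum(w * wb[mz] for mz, w in wa.items() if mz in wb)
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_library(query, library, query_mass=None, mass_window=None):
    """Rank library entries by similarity to the query, best first.

    library maps a compound id to (molecular mass, spectrum). If both
    query_mass and mass_window are given, entries whose mass differs from
    query_mass by more than mass_window (in Da) are skipped.
    """
    scored = []
    for cid, (mass, spec) in library.items():
        if (query_mass is not None and mass_window is not None
                and abs(mass - query_mass) > mass_window):
            continue
        scored.append((weighted_cosine(query, spec), cid))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cid for _, cid in scored]


def recall_at_k(ranked_lists, true_ids, k):
    """Fraction of queries whose true compound appears in the top k hits."""
    hits = sum(1 for ranked, truth in zip(ranked_lists, true_ids)
               if truth in ranked[:k])
    return hits / len(true_ids)
```

Under these assumptions, recall@10 would be computed by calling rank_library for every query spectrum and passing the resulting rankings to recall_at_k with k=10. Because every library entry is scored, the cost grows linearly with library size; that is the cost an approximate nearest-neighbor index such as HNSW is meant to avoid at million-spectrum scale.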

Suggested Citation

  • Qiong Yang & Hongchao Ji & Zhenbo Xu & Yiming Li & Pingshan Wang & Jinyu Sun & Xiaqiong Fan & Hailiang Zhang & Hongmei Lu & Zhimin Zhang, 2023. "Ultra-fast and accurate electron ionization mass spectrum matching for compound identification with million-scale in-silico library," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
  • Handle: RePEc:nat:natcom:v:14:y:2023:i:1:d:10.1038_s41467-023-39279-7
    DOI: 10.1038/s41467-023-39279-7

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-023-39279-7
    File Function: Abstract
    Download Restriction: no


    References listed on IDEAS

    1. Florian Huber & Lars Ridder & Stefan Verhoeven & Jurriaan H Spaaks & Faruk Diblen & Simon Rogers & Justin J J van der Hooft, 2021. "Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships," PLOS Computational Biology, Public Library of Science, vol. 17(2), pages 1-18, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wout Bittremieux & Nicole E. Avalon & Sydney P. Thomas & Sarvar A. Kakhkhorov & Alexander A. Aksenov & Paulo Wender P. Gomes & Christine M. Aceves & Andrés Mauricio Caraballo-Rodríguez & Julia M. Gaug, 2023. "Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
    2. Nicholas J. Morehouse & Trevor N. Clark & Emily J. McMann & Jeffrey A. Santen & F. P. Jake Haeckl & Christopher A. Gray & Roger G. Linington, 2023. "Annotation of natural product compound families using molecular networking topology and structural similarity fingerprinting," Nature Communications, Nature, vol. 14(1), pages 1-10, December.
    3. Zhiwei Zhou & Mingdu Luo & Haosong Zhang & Yandong Yin & Yuping Cai & Zheng-Jiang Zhu, 2022. "Metabolite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic networking," Nature Communications, Nature, vol. 13(1), pages 1-15, December.
    4. Daniel G. C. Treen & Mingxun Wang & Shipei Xing & Katherine B. Louie & Tao Huan & Pieter C. Dorrestein & Trent R. Northen & Benjamin P. Bowen, 2022. "SIMILE enables alignment of tandem mass spectra with statistical significance," Nature Communications, Nature, vol. 13(1), pages 1-10, December.
    5. Niek F. de Jonge & Joris J. R. Louwen & Elena Chekmeneva & Stephane Camuzeaux & Femke J. Vermeir & Robert S. Jansen & Florian Huber & Justin J. J. van der Hooft, 2023. "MS2Query: reliable and scalable MS2 mass spectra-based analogue search," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
