IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0260293.html
   My bibliography  Save this article

A machine learning-based typing scheme refinement for Listeria monocytogenes core genome multilocus sequence typing with high discriminatory power for common source outbreak tracking

Author

Listed:
  • Yen-Yi Liu
  • Chih-Chieh Chen

Abstract

Background: As whole-genome sequencing for pathogen genomes becomes increasingly popular, the typing methods of gene-by-gene comparison, such as core genome multilocus sequence typing (cgMLST) and whole-genome multilocus sequence typing (wgMLST), are being routinely implemented in molecular epidemiology. However, some intrinsic problems remain. For example, genomic sequences with varying read depths, read lengths, and assemblers influence the genome assemblies, introducing error or missing alleles into the generated allelic profiles. These errors and missing alleles might create “specious discrepancy” among closely related isolates, thus making accurate epidemiological interpretation challenging. In addition, the rapid growth of the cgMLST allelic profile database can cause problems related to storage and maintenance as well as long query search times. Methods: We attempted to resolve these issues by decreasing the scheme size to reduce the occurrence of error and missing alleles, alleviate the storage burden, and improve the query search time. The challenge in this approach is maintaining the typing resolution when using fewer loci. We achieved this by using a popular artificial intelligence technique, XGBoost, coupled with Shapley additive explanations for feature selection. Finally, 370 loci from the original 1701 cgMLST loci of Listeria monocytogenes were selected. Results: Although the size of the final scheme (LmScheme_370) was approximately 80% lower than that of the original cgMLST scheme, its discriminatory power, tested for 35 outbreaks, was concordant with that of the original cgMLST scheme. Although we used L. monocytogenes as a demonstration in this study, the approach can be applied to other schemes and pathogens. Our findings might help elucidate gene-by-gene–based epidemiology.

Suggested Citation

  • Yen-Yi Liu & Chih-Chieh Chen, 2021. "A machine learning-based typing scheme refinement for Listeria monocytogenes core genome multilocus sequence typing with high discriminatory power for common source outbreak tracking," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-13, November.
  • Handle: RePEc:plo:pone00:0260293
    DOI: 10.1371/journal.pone.0260293
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0260293
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0260293&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0260293?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0260293. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.