
Mucopolysaccharidosis type II detection by Naïve Bayes Classifier: An example of patient classification for a rare disease using electronic medical records from the Canadian Primary Care Sentinel Surveillance Network

Authors
  • Behrouz Ehsani-Moghaddam
  • John A Queenan
  • Jennifer MacKenzie
  • Richard V Birtwhistle

Abstract

Identifying patients with rare diseases that present with common symptoms is challenging. Hunter syndrome, or mucopolysaccharidosis type II (MPS II), is a progressive rare disease caused by a deficiency in the activity of the lysosomal enzyme iduronate 2-sulphatase. It is inherited in an X-linked manner, so males are predominantly affected. Expression in females varies: the majority are unaffected, although symptoms may emerge over time. We developed a Naïve Bayes classification (NBC) algorithm that uses the clinical diagnoses and symptoms recorded in patients' de-identified, unstructured electronic medical records (EMR) extracted by the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). To do so, we built a training dataset using results published in the scientific literature and all MPS II symptoms, and applied the training dataset and its independent features to compute, for 506,497 male patients, the conditional posterior probability of having MPS II, treated as a categorical dependent variable. The classifier identified the 125 patients with the highest likelihood of having the disease, and 18 features were selected as necessary for prediction. Next, a recursive backward feature elimination algorithm, evaluated with k-fold cross-validation repeated three times, was used to select the optimal input features for the NBC model. The accuracy of the final model was estimated with the validation set approach and with bootstrap resampling. We also investigated whether the NBC is as accurate as three other Bayesian networks. The Naïve Bayes Classifier appears to be an efficient algorithm for assisting physicians in diagnosing Hunter syndrome, allowing optimal patient management.
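The abstract does not describe how the classifier was implemented. As a rough illustration only, the sketch below assumes each patient is reduced to binary present/absent indicators for a few symptoms and fits a Bernoulli Naïve Bayes model with Laplace smoothing, then ranks unlabeled patients by posterior probability. The feature list, toy labels and the `fit`/`posterior` helpers are hypothetical, not the authors' code or data.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch: Bernoulli Naive Bayes over 0/1 symptom indicators.
# Not the study's implementation; the features and data below are illustrative only.

def fit(X: Sequence[Sequence[int]], y: Sequence[int],
        alpha: float = 1.0) -> Tuple[Dict[int, float], Dict[int, List[float]]]:
    """Estimate class priors P(c) and Laplace-smoothed P(x_j = 1 | c) for each class c."""
    priors: Dict[int, float] = {}
    likelihoods: Dict[int, List[float]] = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                          for j in range(len(X[0]))]
    return priors, likelihoods

def posterior(x: Sequence[int], priors: Dict[int, float],
              likelihoods: Dict[int, List[float]]) -> Dict[int, float]:
    """P(c | x) under the naive assumption that features are independent given c."""
    log_scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for xj, p in zip(x, likelihoods[c]):
            s += math.log(p if xj else 1.0 - p)
        log_scores[c] = s
    m = max(log_scores.values())  # log-sum-exp normalization for numerical stability
    z = sum(math.exp(v - m) for v in log_scores.values())
    return {c: math.exp(v - m) / z for c, v in log_scores.items()}

# Hypothetical features: [coarse_facies, hepatosplenomegaly, hernia, joint_stiffness]; label 1 = MPS II.
X = [[1, 1, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1],
     [0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
priors, lik = fit(X, y)

# Rank unlabeled (hypothetical) patients from most to least likely to belong to class 1.
patients = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 1]]
ranked = sorted(range(len(patients)),
                key=lambda i: posterior(patients[i], priors, lik)[1], reverse=True)
```

In a real screening setting a top-N cut-off or a probability threshold would be applied to such a ranking; the choice of cut-off here is left open.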
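The abstract names a recursive backward feature elimination step scored by k-fold cross-validation with three repeats, but not its exact mechanics. The sketch below is one wrapper-style variant under stated assumptions: binary features, a small Bernoulli Naïve Bayes as the model, accuracy as the criterion, and k = 5 (the abstract does not give k). It greedily drops the feature whose removal scores best and keeps the best subset seen. Importance-ranked elimination, as in the R caret package's `rfe` function, is a different and commonly used variant. All helper names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Sequence[int]

def nb_fit_predict(train_X: List[Row], train_y: List[int], test_X: List[Row],
                   alpha: float = 1.0) -> List[int]:
    """Fit a Bernoulli Naive Bayes on the training rows and predict a class for each test row."""
    classes = sorted(set(train_y))
    params = {}
    for c in classes:
        rows = [x for x, t in zip(train_X, train_y) if t == c]
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(train_X[0]))]
        params[c] = (math.log(len(rows) / len(train_X)), probs)

    def log_score(x: Row, c: int) -> float:
        log_prior, probs = params[c]
        return log_prior + sum(math.log(p if v else 1.0 - p) for v, p in zip(x, probs))

    return [max(classes, key=lambda c: log_score(x, c)) for x in test_X]

def repeated_kfold_accuracy(X: List[Row], y: List[int], features: List[int],
                            k: int = 5, repeats: int = 3, seed: int = 0) -> float:
    """Pooled accuracy over `repeats` shuffled k-fold splits, using only `features`."""
    rng = random.Random(seed)  # same seed, so every candidate subset sees the same folds
    idx = list(range(len(X)))
    correct = total = 0

    def project(rows: List[int]) -> List[List[int]]:
        return [[X[i][j] for j in features] for i in rows]

    for _ in range(repeats):
        rng.shuffle(idx)
        for f in range(k):
            held_out = idx[f::k]
            held = set(held_out)
            train = [i for i in idx if i not in held]
            preds = nb_fit_predict(project(train), [y[i] for i in train], project(held_out))
            correct += sum(p == y[i] for p, i in zip(preds, held_out))
            total += len(held_out)
    return correct / total

def backward_elimination(X: List[Row], y: List[int], k: int = 5, repeats: int = 3,
                         min_features: int = 1) -> Tuple[List[int], float]:
    """Repeatedly drop the feature whose removal gives the best CV accuracy; return the best subset seen."""
    current = list(range(len(X[0])))
    best_subset, best_acc = current[:], repeated_kfold_accuracy(X, y, current, k, repeats)
    while len(current) > min_features:
        scored = [(repeated_kfold_accuracy(X, y, [f for f in current if f != d], k, repeats), d)
                  for d in current]
        acc, drop = max(scored)
        current = [f for f in current if f != drop]
        if acc >= best_acc:  # ties favour the smaller subset
            best_subset, best_acc = current[:], acc
    return best_subset, best_acc

# Toy data (hypothetical): features 0 and 1 track the label, features 2 and 3 are noise.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[t if rng.random() < 0.9 else 1 - t,
      t if rng.random() < 0.8 else 1 - t,
      rng.randint(0, 1),
      rng.randint(0, 1)] for t in y]
subset, acc = backward_elimination(X, y)
```

Re-seeding the cross-validation for every candidate subset keeps the folds identical across comparisons, so score differences reflect the features rather than the split.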
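The abstract reports that accuracy was estimated with the validation set approach and with bootstrap resampling, without further detail. One simple way to combine the two, sketched here as an assumption, is to hold out a validation split, compute accuracy on it, and bootstrap the held-out (true, predicted) pairs to obtain a percentile interval. The 70/30 split, 2,000 resamples, 95% level and the helper names are illustrative choices, not the study's settings; bootstrapping the training data and refitting the model is a common alternative.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: validation-set accuracy with a bootstrap percentile interval.
# The split fraction, resample count and confidence level are illustrative assumptions.

def train_validation_split(n: int, frac: float = 0.7, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and split them into training and validation index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * frac))
    return idx[:cut], idx[cut:]

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def bootstrap_accuracy_ci(y_true: Sequence[int], y_pred: Sequence[int], n_boot: int = 2000,
                          level: float = 0.95, seed: int = 0) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) from percentiles of resampled validation accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        draw = [rng.randrange(n) for _ in range(n)]  # resample validation cases with replacement
        stats.append(accuracy([y_true[i] for i in draw], [y_pred[i] for i in draw]))
    stats.sort()
    lo = stats[max(0, int(round((1 - level) / 2 * n_boot)))]
    hi = stats[min(n_boot - 1, int(round((1 + level) / 2 * n_boot)) - 1)]
    return accuracy(y_true, y_pred), lo, hi

# Toy validation labels and predictions (hypothetical): 10 of 100 predictions are wrong.
y_true = [i % 2 for i in range(100)]
y_pred = [1 - t if i < 10 else t for i, t in enumerate(y_true)]
acc, lo, hi = bootstrap_accuracy_ci(y_true, y_pred)
```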

Suggested Citation

  • Behrouz Ehsani-Moghaddam & John A Queenan & Jennifer MacKenzie & Richard V Birtwhistle, 2018. "Mucopolysaccharidosis type II detection by Naïve Bayes Classifier: An example of patient classification for a rare disease using electronic medical records from the Canadian Primary Care Sentinel Surv," PLOS ONE, Public Library of Science, vol. 13(12), pages 1-17, December.
  • Handle: RePEc:plo:pone00:0209018
    DOI: 10.1371/journal.pone.0209018

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0209018
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0209018&type=printable
    Download Restriction: no

