
Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore

Author

Listed:
  • Pei San Ang (Health Sciences Authority)
  • Desmond Chun Hwee Teo (Health Sciences Authority)
  • Sreemanee Raaj Dorajoo (Health Sciences Authority)
  • Mukundaram Prem Kumar (Health Sciences Authority)
  • Yi Hao Chan (Health Sciences Authority)
  • Chih Tzer Choong (Health Sciences Authority)
  • Doris Sock Tin Phuah (Health Sciences Authority)
  • Dorothy Hooi Myn Tan (Health Sciences Authority)
  • Filina Meixuan Tan (Health Sciences Authority)
  • Huilin Huang (Health Sciences Authority)
  • Maggie Siok Hwee Tan (Health Sciences Authority)
  • Michelle Sau Yuen Ng (Health Sciences Authority)
  • Jalene Wang Woon Poh (Health Sciences Authority)

Abstract

Introduction: Substandard medicines are medicines that fail to meet their quality standards and/or specifications. Substandard medicines can lead to serious safety issues affecting public health. With the increasing number of pharmaceuticals and the complexity of the pharmaceutical manufacturing supply chain, monitoring for substandard medicines via manual environmental scanning can be laborious and time consuming.

Methods: A web crawler was developed to automatically detect and extract alerts on substandard medicines published on the Internet by regulatory agencies. The crawled data were labelled as related to substandard medicines or not. An expert-derived keyword-based classification algorithm was compared against machine learning algorithms to identify substandard medicine alerts on two validation datasets (n = 4920 and n = 2458) from a later time period than training data. Models were comparatively assessed for recall, precision and their F1 scores (harmonic mean of precision and recall).

Results: The web crawler routinely extracted alerts from the 46 web pages belonging to nine regulatory agencies. From October 2019 to May 2020, 12,156 unique alerts were crawled of which 7378 (60.7%) alerts were set aside for validation and contained 1160 substandard medicine alerts (15.7%). An ensemble approach of combining machine learning and keywords achieved the best recall (94% and 97%), precision (85% and 80%) and F1 scores (89% and 88%) on temporal validation.

Conclusions: Combining robust web crawler programmes with rigorously tested filtering algorithms based on machine learning and keyword models can automate and expand horizon scanning capabilities for issues relating to substandard medicines.
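The abstract defines the F1 score as the harmonic mean of precision and recall. As a minimal sketch, not taken from the paper, the Python snippet below applies that definition to the reported precision and recall pairs and recomputes the validation shares from the reported counts. The function name `f1_score` and the variable names are illustrative assumptions, and the pairing of the first and second reported percentages with each other is inferred from the order in which the abstract lists them.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for the ensemble approach.
# Assumption: the first and second values in each parenthesis belong together.
reported_pairs = [(0.85, 0.94), (0.80, 0.97)]
f1_values = [f1_score(p, r) for p, r in reported_pairs]

# Counts reported in the abstract.
total_alerts = 12156
validation_sizes = [4920, 2458]
substandard_in_validation = 1160

validation_total = sum(validation_sizes)
validation_share = validation_total / total_alerts
substandard_share = substandard_in_validation / validation_total

if __name__ == "__main__":
    for (p, r), f1 in zip(reported_pairs, f1_values):
        print(f"precision={p:.2f} recall={r:.2f} F1={f1:.3f}")
    print(f"validation alerts: {validation_total} ({validation_share:.1%})")
    print(f"substandard share of validation set: {substandard_share:.1%})")
```

Under these assumptions the recomputed values round to the figures quoted in the abstract, which suggests the reported percentages are internally consistent.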

Suggested Citation

  • Pei San Ang & Desmond Chun Hwee Teo & Sreemanee Raaj Dorajoo & Mukundaram Prem Kumar & Yi Hao Chan & Chih Tzer Choong & Doris Sock Tin Phuah & Dorothy Hooi Myn Tan & Filina Meixuan Tan & Huilin Huang, 2021. "Augmenting Product Defect Surveillance Through Web Crawling and Machine Learning in Singapore," Drug Safety, Springer, vol. 44(9), pages 939-948, September.
  • Handle: RePEc:spr:drugsa:v:44:y:2021:i:9:d:10.1007_s40264-021-01084-w
    DOI: 10.1007/s40264-021-01084-w
    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s40264-021-01084-w
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.
