Printed from https://ideas.repec.org/a/plo/pdig00/0000973.html

Assessing the generalisation of artificial intelligence across mammography manufacturers

Authors
  • Alistair J Hickman
  • Sandra Gomes
  • Lucy M Warren
  • Nadia AS Smith
  • Caroline Shenton-Taylor

Abstract

The aim of this study was to determine whether differences between mammography manufacturers affect the performance of artificial intelligence tools for classifying breast density. Processed mammograms from 10,156 women were used to train and validate three deep learning algorithms using three retrospective datasets (Hologic, General Electric, and Mixed, the last containing equal numbers of Hologic, General Electric and Siemens images), which were then tested on four independent withheld test sets (Hologic, General Electric, Mixed and Siemens). The area under the receiver operating characteristic curve (AUC) was compared. Women aged 47-73 with normal breasts (routine recall, no cancer) and Volpara ground truth were selected from the OPTIMAM Mammography Image Database for the years 2012-2015. Significance testing in the results uses 95% confidence intervals, and a Bayesian Signed Rank test is used to rank the overall performance of the models. The best single test performance is seen when a model is trained and tested on images from a single manufacturer (Hologic train/test: 0.98; General Electric train/test: 0.97); however, the same models performed significantly worse on images from any other manufacturer (General Electric AUCs: 0.68 and 0.63; Hologic AUCs: 0.56 and 0.90). The model trained on the mixed dataset exhibited the best overall performance. Performance is better when training and test sets contain the same manufacturer distributions, and generalisation improves when more manufacturers are included in training. Models in clinical use should be trained on data representing the different vendors of mammogram machines used across screening programs. This is clinically relevant because models will be affected by changes and upgrades to mammogram machines in screening centres.

Author summary: A number of manufacturers' mammogram machines are in use within the NHS Breast Screening Programme, and some of these manufacturers use different technologies to acquire the mammograms. The mammograms are made readable by applying processing to the raw information from the X-ray detector, and this processing is known to vary both between and within manufacturers. The aim of this study was to assess whether these differences affect the performance of AI classification algorithms. We trained three binary classifiers on three different datasets, two from single manufacturers and one with an even mix of three manufacturers. Models trained on single-manufacturer data could not generalise their knowledge to manufacturers unseen in training. The model trained on three manufacturers was the best overall performer. In general, models must be trained on images from every manufacturer used in the intended clinical setting, because the differences between manufacturers are large enough that AI algorithms cannot transfer their knowledge to a mammogram from an unseen manufacturer. Models must also be monitored and kept up to date to reflect any changes to mammogram machines within the clinical setting.
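The comparison described in the abstract rests on AUC values for binary classifiers, judged with 95% confidence intervals. As a rough illustration only (not the authors' code), the sketch below computes a binary AUC using the Mann-Whitney formulation (the probability that a randomly chosen positive case scores above a randomly chosen negative one) and attaches a percentile bootstrap interval. The bootstrap is an assumption, since the abstract does not say how the intervals were obtained; the function names `auc` and `bootstrap_auc_ci` and the 0/1 label encoding are hypothetical.

```python
import random
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Labels are assumed to be 1 for the positive class of the binary density
    classifier and 0 otherwise (hypothetical encoding). Ties between a
    positive and a negative score count as half a win (Mann-Whitney U form).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC.

    Assumed CI method: the paper's exact procedure is not stated in the abstract.
    """
    if not (0 in labels and 1 in labels):
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that happen to contain only one class.
        if 0 in ys and 1 in ys:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

In a cross-manufacturer setup like the one described, one would call these once per (training set, test set) pair, for example a Hologic-trained model's scores on the Siemens test set, and then compare the resulting intervals. The pairwise loop is quadratic in the number of cases, which is fine for a sketch but slow for large test sets.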

Suggested Citation

  • Alistair J Hickman & Sandra Gomes & Lucy M Warren & Nadia AS Smith & Caroline Shenton-Taylor, 2025. "Assessing the generalisation of artificial intelligence across mammography manufacturers," PLOS Digital Health, Public Library of Science, vol. 4(8), pages 1-12, August.
  • Handle: RePEc:plo:pdig00:0000973
    DOI: 10.1371/journal.pdig.0000973

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000973
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000973&type=printable
    Download Restriction: no
