
Privacy-preserving AUC computation in distributed machine learning with PHT-meDIC

Author

Listed:
  • Marius de Arruda Botelho
  • Cem Ata Baykara
  • Ali Burak Ünal
  • Nico Pfeifer
  • Mete Akgün

Abstract

Ensuring privacy in distributed machine learning while computing the Area Under the Curve (AUC) is a significant challenge because pooling sensitive test data is often not allowed. Although cryptographic methods can address some of these concerns, they may compromise either scalability or accuracy. In this paper, we present two privacy-preserving solutions for secure AUC computation across multiple institutions: (1) an exact global AUC method that handles ties in prediction scores and scales linearly with the number of samples, and (2) an approximation method that substantially reduces runtime while maintaining acceptable accuracy. Our protocols combine homomorphic encryption (modified Paillier), symmetric and asymmetric cryptography, and randomized encoding to preserve the confidentiality of true labels and model predictions. We integrate these methods into the Personal Health Train (PHT)-meDIC platform, a distributed machine learning environment designed for healthcare, to demonstrate their correctness and feasibility. Results using both real-world and synthetic datasets confirm the accuracy of our approach: the exact method computes the true AUC without revealing private inputs, and the approximation provides a balanced trade-off between computational efficiency and precision. All relevant code and data are publicly available at https://github.com/PHT-meDIC/PP-AUC, facilitating straightforward adoption and further development within broader distributed learning ecosystems.

Author summary: A commonly used metric to evaluate the performance of machine learning models is the Area Under the Curve (AUC). Calculating the AUC in distributed machine learning settings is challenging because data cannot be shared between institutions due to privacy concerns. To address this, we developed two privacy-preserving methods: one that calculates the exact AUC securely and another that provides faster approximations with high accuracy. These methods use advanced encryption techniques to protect sensitive data while enabling secure collaboration. We tested them in a real-world healthcare platform called PHT-meDIC and demonstrated their effectiveness. The code and data are publicly available at https://github.com/PHT-meDIC/PP-AUC to support wider adoption.
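
To make concrete what an "exact global AUC that handles ties" refers to, the following is a minimal plaintext sketch in Python. It is not the secure protocol from the paper: it uses no encryption and simply pools the scores, which is exactly what the privacy-preserving methods are designed to avoid. The function names (`auc_with_ties`, `pooled_auc`) and the sample data are hypothetical and chosen only for illustration. The sketch shows the quantity a secure protocol would need to reproduce: each (positive, negative) pair counts 1 if the positive scores higher and 0.5 if the two scores are tied.

```python
from itertools import groupby


def auc_with_ties(scores, labels):
    """Plaintext reference AUC; tied scores contribute half credit.

    Illustrative helper only (hypothetical name), not the paper's protocol.
    """
    # Sort samples by descending prediction score.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    positives_seen = 0  # positives with a strictly higher score so far
    credit = 0.0        # accumulated in units of (positive, negative) pairs
    # Process all samples sharing one score as a single group.
    for _, group in groupby(pairs, key=lambda p: p[0]):
        group_labels = [y for _, y in group]
        pos = sum(1 for y in group_labels if y == 1)
        neg = len(group_labels) - pos
        # Each negative in the group is beaten by every earlier positive
        # and tied with the positives inside its own group.
        credit += neg * (positives_seen + pos / 2)
        positives_seen += pos
    return credit / (n_pos * n_neg)


def pooled_auc(sites):
    """Global AUC over several sites, each given as (scores, labels).

    Pooling in the clear is only a correctness baseline for this sketch.
    """
    all_scores = [s for scores, _ in sites for s in scores]
    all_labels = [y for _, labels in sites for y in labels]
    return auc_with_ties(all_scores, all_labels)


if __name__ == "__main__":
    site_a = ([0.9, 0.8], [1, 1])  # made-up example data
    site_b = ([0.8, 0.3], [0, 0])
    print(pooled_auc([site_a, site_b]))
```

After the initial sort, the grouped pass is linear in the number of samples. How the paper's protocol achieves its stated linear scaling under encryption is not described in this record, so the sketch should be read only as a baseline against which a secure result could be compared.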

Suggested Citation

  • Marius de Arruda Botelho & Cem Ata Baykara & Ali Burak Ünal & Nico Pfeifer & Mete Akgün, 2025. "Privacy-preserving AUC computation in distributed machine learning with PHT-meDIC," PLOS Digital Health, Public Library of Science, vol. 4(11), pages 1-21, November.
  • Handle: RePEc:plo:pdig00:0000753
    DOI: 10.1371/journal.pdig.0000753

    Download full text from publisher

    File URL: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000753
    Download Restriction: no

    File URL: https://journals.plos.org/digitalhealth/article/file?id=10.1371/journal.pdig.0000753&type=printable
    Download Restriction: no

