Author
Listed:
- Tshikala Eddie Lulamba
- Themba Mutemaringa
- Nicki Tiffin
Abstract
Structured patient data generated within the health data ecosystem are shared both internally for operational use and also externally for research and public health benefit. Protecting individual privacy and health data confidentiality in these contexts relies on data de-identification and anonymisation, although there are no universally accepted standards for these processes and the techniques involved can be technically complex. We present practical recommendations grounded in the principle of data minimisation: avoiding unnecessary granularity and identifying variables that could lead to re-identification when combined with other datasets. We provide practical guidance for anonymising and perturbing structured health data in ways that support compliance with data protection laws, describing technical and operational methods for reducing re-identification risk that include rounding numerical values, replacing precise values with ranges, adding jitter to numeric fields, aggregating data, management of date values and separating sensitive fields from identifying data to prevent linkage leading to re-identification. While some methods require advanced technical knowledge, we focus here on accessible strategies that can be implemented without specialist expertise, recognising the importance of the legal and governance frameworks in which anonymisation occurs. These guidelines support researchers, data managers and institutions in sharing health data responsibly, maintaining data utility while upholding privacy and promoting ethical and legal data stewardship for data-driven health research.
Author summary
Healthcare systems and health research programmes collect large amounts of patient data that are often shared both within organisations and across institutional boundaries. Health data are highly sensitive, and it is essential to ensure that individuals cannot be identified or recognised through the use of their health information.
Data de-identification and anonymisation are the most common approaches for protecting individuals’ privacy and confidentiality in these settings, but there are no universal standards for these processes and they can be technically complex to apply. Here we describe practical, accessible technical and operational security measures that can be used to de-identify and anonymise structured health data in ways that comply with data protection laws. These practical guidelines can support data analysts and researchers working with sensitive health data, including those without prior experience in data anonymisation, to implement effective privacy-preserving techniques, including perturbation, for large, structured health-related datasets.
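Several of the perturbation methods named in the abstract (rounding, replacing precise values with ranges, adding jitter, and managing dates) can be illustrated with a minimal Python sketch. This is not taken from the article: the function names, the record fields (`age`, `weight_kg`, `systolic_bp`, `visit_date`, `followup_date`) and every parameter value (rounding base of 5, range width of 10, jitter of ±3, date offset of up to ±30 days) are hypothetical choices made only for illustration, and suitable values would depend on the dataset and the applicable governance framework.

```python
import random
from datetime import date, timedelta

# Illustrative sketch only: all names, fields and parameters are assumptions,
# not the article's own implementation.

def round_to(value, base=5):
    """Round a numeric value to the nearest multiple of `base`."""
    return base * round(value / base)

def to_range(value, width=10):
    """Replace a precise value with a range label such as '40-49'."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width - 1}"

def add_jitter(value, spread, rng):
    """Add uniform random noise drawn from [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)

def shift_date(d, offset_days):
    """Shift a date by a fixed number of days."""
    return d + timedelta(days=offset_days)

def perturb_record(record, rng):
    """Return a perturbed copy of one patient record (hypothetical fields)."""
    # One offset per record, applied to all of its dates, so that the
    # interval between the two dates is preserved.
    offset = rng.randint(-30, 30)
    return {
        "age_band": to_range(record["age"]),
        "weight_kg": round_to(record["weight_kg"]),
        "systolic_bp": round(add_jitter(record["systolic_bp"], 3, rng)),
        "visit_date": shift_date(record["visit_date"], offset),
        "followup_date": shift_date(record["followup_date"], offset),
    }

record = {
    "age": 47,
    "weight_kg": 72.4,
    "systolic_bp": 120,
    "visit_date": date(2024, 3, 14),
    "followup_date": date(2024, 4, 11),
}
perturbed = perturb_record(record, random.Random(0))
```

Applying a single date offset to all dates in a record is one possible design choice: it hides the true calendar dates while keeping the time between visits intact for analysis. The direct identifiers and the mapping back to the original values would need to be stored separately from the perturbed output.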
Suggested Citation
Tshikala Eddie Lulamba & Themba Mutemaringa & Nicki Tiffin, 2025.
"Ten quick tips for protecting health data using de-identification and perturbation of structured datasets,"
PLOS Computational Biology, Public Library of Science, vol. 21(9), pages 1-16, September.
Handle:
RePEc:plo:pcbi00:1013507
DOI: 10.1371/journal.pcbi.1013507