Printed from https://ideas.repec.org/a/plo/pone00/0291581.html

An open-source probabilistic record linkage process for records with family-level information: Simulation study and applied analysis

Authors

  • John Prindle
  • Himal Suthar
  • Emily Putnam-Hornstein

Abstract

Research with administrative records is challenged by the limited information available in any single data source for answering policy-related questions. Record linkage lets researchers supplement an administrative dataset with information about the same people from other sources, once records in separate sources are identified as matched pairs. Several solutions are available for undertaking record linkage; each produces linkage keys for merging data sources on positively matched pairs of records. In the current manuscript, we demonstrate a new application of the Python RecordLinkage package to family-based record linkages with machine learning algorithms for probability scoring, which we call probabilistic record linkage for families (PRLF). First, a simulation of administrative records assesses PRLF accuracy under varying match and data degradation percentages. Accuracy is influenced more by degradation (e.g., missing data fields, mismatched values) than by the percentage of simulated matches. Second, an applied data linkage compares the performance of regression model estimates across three record linkage solutions (PRLF, ChoiceMaker, and Link Plus). Our findings indicate that all three solutions, when optimized, provide similar results for researchers. We discuss strengths of our process, such as the use of ensemble methods to improve match accuracy, and then identify caveats of record linkage in the context of administrative data.
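
The simulation design described in the abstract (varying the share of true matches and the share of degraded fields, then measuring linkage accuracy) can be illustrated with a minimal sketch. This is not the authors' PRLF pipeline, which builds on the Python RecordLinkage package and machine learning classifiers. The sketch below uses only the Python standard library and swaps in a plain exact-agreement score with a fixed cutoff. The field names, the helper functions (`make_record`, `degrade`, `simulate`, `link`, `evaluate`), and the 0.75 threshold are all hypothetical choices made for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical linkage fields; the paper's actual fields are not listed here.
FIELDS = ("first_name", "last_name", "birth_date", "zip_code")
FIRST_NAMES = ("Ana", "Ben", "Cara", "Dev")


def make_record(i):
    """Synthetic record i; last_name and birth_date are unique per index."""
    return {
        "first_name": FIRST_NAMES[i % len(FIRST_NAMES)],
        "last_name": f"Family{i}",
        "birth_date": (date(2000, 1, 1) + timedelta(days=i)).isoformat(),
        "zip_code": f"90{i % 50:03d}",
    }


def degrade(record, rate, rng):
    """With probability `rate` per field, blank the value or corrupt it."""
    out = dict(record)
    for field in FIELDS:
        if rng.random() < rate:
            out[field] = None if rng.random() < 0.5 else out[field] + "x"
    return out


def simulate(n_a, n_b, match_share, degrade_rate, seed=0):
    """Build file A, a degraded file B, and the set of true (a, b) pairs.

    Assumes round(n_b * match_share) <= n_a.
    """
    rng = random.Random(seed)
    file_a = [make_record(i) for i in range(n_a)]
    n_match = round(n_b * match_share)
    # The first n_match records of B copy people in A; the rest are new people.
    sources = list(range(n_match)) + [n_a + k for k in range(n_b - n_match)]
    file_b = [degrade(make_record(s), degrade_rate, rng) for s in sources]
    truth = {(i, i) for i in range(n_match)}
    return file_a, file_b, truth


def agreement(a, b):
    """Share of fields that agree exactly; a missing value never agrees."""
    hits = sum(1 for f in FIELDS if a[f] is not None and a[f] == b[f])
    return hits / len(FIELDS)


def link(file_a, file_b, threshold=0.75):
    """Compare every A-B pair and keep those scoring at or above the cutoff."""
    return {
        (i, j)
        for i, a in enumerate(file_a)
        for j, b in enumerate(file_b)
        if agreement(a, b) >= threshold
    }


def evaluate(linked, truth):
    """Return (precision, recall) of the linked pairs against the truth set."""
    true_pos = len(linked & truth)
    precision = true_pos / len(linked) if linked else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.3, 0.6):
        a, b, truth = simulate(100, 100, match_share=0.5, degrade_rate=rate)
        precision, recall = evaluate(link(a, b), truth)
        print(f"degrade={rate:.1f} precision={precision:.2f} recall={recall:.2f}")
```

Because two of the four synthetic fields are unique per person, a non-matching pair can agree on at most two fields and never reaches the cutoff, so degradation shows up as lost recall instead of false links. A real linkage would add blocking, fuzzy string comparison, and a trained classifier in place of the fixed cutoff.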

Suggested Citation

  • John Prindle & Himal Suthar & Emily Putnam-Hornstein, 2023. "An open-source probabilistic record linkage process for records with family-level information: Simulation study and applied analysis," PLOS ONE, Public Library of Science, vol. 18(10), pages 1-16, October.
  • Handle: RePEc:plo:pone00:0291581
    DOI: 10.1371/journal.pone.0291581

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0291581
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0291581&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gailey, Samantha, 2022. "Moving to greener pastures: Health selection into neighborhood green space among a highly mobile and diverse population in California," Social Science & Medicine, Elsevier, vol. 315(C).
    2. Green, Beth L. & Ayoub, Catherine & Bartlett, Jessica Dym & Furrer, Carrie & Von Ende, Adam & Chazan-Cohen, Rachel & Klevens, Joanne & Nygren, Peggy, 2015. "It's not as simple as it sounds: Problems and solutions in accessing and using administrative child welfare data for evaluating the impact of early childhood interventions," Children and Youth Services Review, Elsevier, vol. 57(C), pages 40-49.
    3. Henderson, Gillian & Jones, Christine & Woods, Ruth, 2017. "Sibling birth order, use of statutory measures and patterns of placement for children in public care: Implications for international child protection systems and research," Children and Youth Services Review, Elsevier, vol. 82(C), pages 321-328.
    4. Zanti, Sharon & Berkowitz, Emily & Katz, Matthew & Nelson, Amy Hawn & Burnett, T.C. & Culhane, Dennis & Zhou, Yixi, 2022. "Leveraging integrated data for program evaluation: Recommendations from the field," Evaluation and Program Planning, Elsevier, vol. 95(C).
    5. Kraus, David R. & Baxter, Elizabeth E. & Alexander, Pamela C. & Bentley, Jordan H., 2015. "The Treatment Outcome Package (TOP): A multi-dimensional level of care matrix for child welfare," Children and Youth Services Review, Elsevier, vol. 57(C), pages 171-178.
