
Non-Random Assignment of Individual Identifiers and Selection into Linked Data: Implications for Research

Authors

  • Kyle Raze
  • Nicole Perales
  • Liana Christin Landivar

Abstract

The U.S. Census Bureau’s Person Identification Validation System facilitates anonymous linkages between survey and administrative records by assigning Protected Identification Keys (PIKs) to person records. While PIK assignment is generally accurate, some person records are not successfully assigned a PIK, which can lead to sample selection bias in analyses of linked data. Using the American Community Survey (ACS) and the Current Population Survey Annual Social and Economic Supplement (CPS ASEC) between 2005 and 2022, we corroborate and extend existing findings on the drivers of PIK assignment, showing that the rate of PIK assignment varies widely across socio-demographic subgroups. Using earnings as a test case, we then show that limiting a survey sample of wage earners to person records with PIKs or successful linkages to W-2 wage records tends to overestimate self-reported wage earnings, on average, indicative of linkage-induced selection bias. In a validation exercise, we demonstrate that reweighting methods, such as inverse probability weighting or entropy balancing, can mitigate this bias.
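
The reweighting idea in the abstract can be illustrated with a minimal sketch of inverse probability weighting. It is not the authors' implementation. Everything in it is an assumption made for illustration: the data are simulated, the single binary covariate `group`, the PIK assignment rates (0.70 and 0.95), and the earnings distributions are invented, and the propensity of PIK assignment is estimated as a simple cell share rather than with a fitted model.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate hypothetical records as (group, earnings, has_pik) tuples."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = rng.randint(0, 1)
        # Assumption: group 1 earns more on average than group 0.
        earnings = rng.gauss(60000 if group else 30000, 10000)
        # Assumption: group 1 is also more likely to be assigned a PIK.
        p_pik = 0.95 if group else 0.70
        has_pik = rng.random() < p_pik
        records.append((group, earnings, has_pik))
    return records


def mean(values):
    return sum(values) / len(values)


def ipw_mean(records):
    """Mean earnings of PIK'd records, weighted by 1 / P(PIK | group)."""
    totals, linked = {}, {}
    for group, _, has_pik in records:
        totals[group] = totals.get(group, 0) + 1
        linked[group] = linked.get(group, 0) + (1 if has_pik else 0)
    # Propensity estimated as the share of records with a PIK in each cell.
    propensity = {g: linked[g] / totals[g] for g in totals}

    weighted_sum = 0.0
    weight_total = 0.0
    for group, earnings, has_pik in records:
        if has_pik:
            weight = 1.0 / propensity[group]
            weighted_sum += weight * earnings
            weight_total += weight
    return weighted_sum / weight_total


records = simulate()
full_mean = mean([e for _, e, _ in records])           # benchmark: all records
linked_mean = mean([e for _, e, p in records if p])    # PIK'd records only
weighted_mean = ipw_mean(records)                      # PIK'd records, reweighted

print(f"full sample:      {full_mean:,.0f}")
print(f"PIK only:         {linked_mean:,.0f}")
print(f"PIK only, IPW:    {weighted_mean:,.0f}")
```

In this simulation the PIK-only mean overstates the full-sample mean because the higher-earning group is overrepresented among records with PIKs, and the reweighted mean moves back toward the benchmark. An applied analysis would typically estimate the propensity from many covariates and combine the resulting weights with the survey weights.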

Suggested Citation

  • Kyle Raze & Nicole Perales & Liana Christin Landivar, 2026. "Non-Random Assignment of Individual Identifiers and Selection into Linked Data: Implications for Research," Working Papers 26-06, Center for Economic Studies, U.S. Census Bureau.
  • Handle: RePEc:cen:wpaper:26-06

    Download full text from publisher

    File URL: https://www2.census.gov/library/working-papers/2026/adrm/ces/CES-WP-26-06.pdf
    File Function: First version, 2026
    Download Restriction: no

    References listed on IDEAS

    1. Kosuke Imai & Marc Ratkovic, 2014. "Covariate balancing propensity score," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 243-263, January.
    2. Brittany Bond & J. David Brown & Adela Luque & Amy O’Hara, 2014. "The Nature of the Bias When Studying Only Linkable Person Records: Evidence from the American Community Survey," CARRA Working Papers 2014-08, Center for Economic Studies, U.S. Census Bureau.
    3. Katharine G. Abraham & Aaron Maitland & Suzanne M. Bianchi, 2006. "Non-response in the American Time Use Survey: Who Is Missing from the Data and How Much Does It Matter?," NBER Technical Working Papers 0328, National Bureau of Economic Research, Inc.
    4. Aigner, Dennis J., 1973. "Regression with a binary independent variable subject to errors of observation," Journal of Econometrics, Elsevier, vol. 1(1), pages 49-59, March.
    5. Danielle Sandler & Nichole Szembrot, 2019. "Maternal Labor Dynamics: Participation, Earnings, and Employer Changes," Working Papers 19-33, Center for Economic Studies, U.S. Census Bureau.
    6. Christopher R. Bollinger & Barry T. Hirsch, 2006. "Match Bias from Earnings Imputation in the Current Population Survey: The Case of Imperfect Matching," Journal of Labor Economics, University of Chicago Press, vol. 24(3), pages 483-520, July.
    7. Hainmueller, Jens, 2012. "Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies," Political Analysis, Cambridge University Press, vol. 20(1), pages 25-46, January.
    8. James P. Ziliak & Charles Hokayem & Christopher R. Bollinger, 2022. "Trends in Earnings Volatility Using Linked Administrative and Survey Data," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 41(1), pages 12-19, December.
    9. Mary Layne & Deborah Wagner & Cynthia Rothhaas, 2014. "Estimating Record Linkage False Match Rate for the Person Identification Validation System," CARRA Working Papers 2014-02, Center for Economic Studies, U.S. Census Bureau.
    10. Jennifer Bernard & Kelsey Drotning & Katie R. Genadek, 2024. "Where Are Your Parents? Exploring Potential Bias in Administrative Records on Children," Working Papers 24-18, Center for Economic Studies, U.S. Census Bureau.
    11. Kavan Kucko & Kevin Rinz & Benjamin Solow, 2017. "Labor Market Effects of the Affordable Care Act: Evidence from a Tax Notch," CARRA Working Papers 2017-07, Center for Economic Studies, U.S. Census Bureau.
    12. Matias Busso & John DiNardo & Justin McCrary, 2014. "New Evidence on the Finite Sample Properties of Propensity Score Reweighting and Matching Estimators," The Review of Economics and Statistics, MIT Press, vol. 96(5), pages 885-897, December.
    13. Bryan S. Graham & Cristine Campos De Xavier Pinto & Daniel Egel, 2012. "Inverse Probability Tilting for Moment Condition Models with Missing Data," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 79(3), pages 1053-1079.
    14. Sarah Tahamont & Zubin Jelveh & Aaron Chalfin & Shi Yan & Benjamin Hansen, 2021. "Dude, Where’s My Treatment Effect? Errors in Administrative Data Linking and the Destruction of Statistical Power in Randomized Experiments," Journal of Quantitative Criminology, Springer, vol. 37(3), pages 715-749, September.
    15. Wooldridge, Jeffrey M., 2007. "Inverse probability weighted estimation for general missing data problems," Journal of Econometrics, Elsevier, vol. 141(2), pages 1281-1301, December.
    16. Thomas B. Foster & Marta Murray-Close & Liana Christin Landivar & Mark deWolf, 2020. "An Evaluation of the Gender Wage Gap Using Linked Survey and Administrative Data," Working Papers 20-34, Center for Economic Studies, U.S. Census Bureau.
    17. Fredrik Andersson & John C. Haltiwanger & Mark J. Kutzbach & Giordano Palloni & Henry O. Pollakowski & Daniel H. Weinberg, 2013. "Childhood Housing and Adult Earnings: A Between-Siblings Analysis of Housing Vouchers and Public Housing," Working Papers 13-48, Center for Economic Studies, U.S. Census Bureau.
    18. Quentin Brummet & Denise Flanagan-Doyle & Joshua Mitchell & John Voorheis & Laura Erhard & Brett McBride, 2018. "Investigating the Use of Administrative Records in the Consumer Expenditure Survey," CARRA Working Papers 2018-01, Center for Economic Studies, U.S. Census Bureau.
    19. Aigner, Dennis J., 1973. "Regression with a binary independent variable subject to errors of observation," LIDAM Reprints CORE 130, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    20. Raj Chetty & Nathaniel Hendren & Maggie R Jones & Sonya R Porter, 2020. "Race and Economic Opportunity in the United States: An Intergenerational Perspective," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 135(2), pages 711-783.
    21. H. Spencer Banzhaf & Melissa Ruby Banzhaf, 2022. "Long-Run Adult Socio-economic Outcomes from In Utero Airborne Lead Exposure," Working Papers 22-53, Center for Economic Studies, U.S. Census Bureau.
    22. Adam Bee & Joshua Mitchell & Nikolas Mittag & Jonathan Rothbaum & Carl Sanders & Lawrence Schmidt & Matthew Unrath, 2023. "National Experimental Wellbeing Statistics - Version 1," Working Papers 23-04, Center for Economic Studies, U.S. Census Bureau.
    23. Benjamin Cerf Harris, 2014. "Within and Across County Variation in SNAP Misreporting: Evidence from Linked ACS and Administrative Records," CARRA Working Papers 2014-05, Center for Economic Studies, U.S. Census Bureau.
    24. Bruce D. Meyer & Nikolas Mittag & Robert M. Goerge, 2022. "Errors in Survey Reporting and Imputation and Their Effects on Estimates of Food Stamp Program Participation," Journal of Human Resources, University of Wisconsin Press, vol. 57(5), pages 1605-1644.
    25. Fan Li & Kari Lock Morgan & Alan M. Zaslavsky, 2018. "Balancing Covariates via Propensity Score Weighting," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 390-400, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bruce D. Meyer & Nikolas Mittag & Derek Wu, 2024. "Race, Ethnicity, and Measurement Error," NBER Chapters, in: Race, Ethnicity, and Economic Statistics for the 21st Century, pages 327-381, National Bureau of Economic Research, Inc.
    2. Pedro H. C. Sant'Anna & Xiaojun Song & Qi Xu, 2022. "Covariate distribution balance via propensity scores," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1093-1120, September.
    3. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    4. Phillip Heiler, 2022. "Efficient Covariate Balancing for the Local Average Treatment Effect," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 40(4), pages 1569-1582, October.
    5. Frölich, Markus & Huber, Martin & Wiesenfarth, Manuel, 2017. "The finite sample performance of semi- and non-parametric estimators for treatment effects and policy evaluation," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 91-102.
    6. Hugo Bodory & Lorenzo Camponovo & Martin Huber & Michael Lechner, 2020. "The Finite Sample Performance of Inference Methods for Propensity Score Matching and Weighting Estimators," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 38(1), pages 183-200, January.
    7. Cousineau, Martin & Verter, Vedat & Murphy, Susan A. & Pineau, Joelle, 2023. "Estimating causal effects with optimization-based methods: A review and empirical comparison," European Journal of Operational Research, Elsevier, vol. 304(2), pages 367-380.
    8. Sven Klaassen & Jan Rabenseifner & Jannis Kueck & Philipp Bach, 2025. "Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score-Based Estimators," Papers 2503.17290, arXiv.org, revised May 2025.
    9. Yimin Dai & Ying Yan, 2024. "Mahalanobis balancing: A multivariate perspective on approximate covariate balancing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 51(4), pages 1450-1471, December.
    10. Martin Cousineau & Vedat Verter & Susan A. Murphy & Joelle Pineau, 2022. "Estimating causal effects with optimization-based methods: A review and empirical comparison," Papers 2203.00097, arXiv.org.
    11. Adam Bee & Joshua Mitchell & Nikolas Mittag & Jonathan Rothbaum & Carl Sanders & Lawrence Schmidt & Matthew Unrath, 2023. "National Experimental Wellbeing Statistics - Version 1," Working Papers 23-04, Center for Economic Studies, U.S. Census Bureau.
    12. Turner, Alex J. & Fichera, Eleonora & Sutton, Matt, 2021. "The effects of in-utero exposure to influenza on mental health and mortality risk throughout the life-course," Economics & Human Biology, Elsevier, vol. 43(C).
    13. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," Papers 1910.00641, arXiv.org.
    14. Zichen Deng & Maarten Lindeboom, 2021. "Early-life Famine Exposure, Hunger Recall and Later-life Health," Tinbergen Institute Discussion Papers 21-054/V, Tinbergen Institute.
    15. Anthony Strittmatter & Conny Wunsch, 2025. "Labor market sorting and the gender pay gap revisited," Journal of Population Economics, Springer;European Society for Population Economics, vol. 38(3), pages 1-41, September.
    16. Mittag, Nikolas, 2016. "Correcting for Misreporting of Government Benefits," IZA Discussion Papers 10266, IZA Network @ LISER.
    17. Adeola Oyenubi & Martin Wittenberg, 2021. "Does the choice of balance-measure matter under genetic matching?," Empirical Economics, Springer, vol. 61(1), pages 489-502, July.
    18. Pierre Chausse & George Luta, 2017. "Causal Inference using Generalized Empirical Likelihood Methods," Working Papers 1707, University of Waterloo, Department of Economics, revised Dec 2017.
    19. Raj Chetty & John N. Friedman & Nathaniel Hendren & Maggie R. Jones & Sonya R. Porter, 2018. "The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility," Working Papers 18-42, Center for Economic Studies, U.S. Census Bureau.
    20. Sean Yiu & Li Su, 2022. "Joint calibrated estimation of inverse probability of treatment and censoring weights for marginal structural models," Biometrics, The International Biometric Society, vol. 78(1), pages 115-127, March.

    More about this item

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C83 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Survey Methods; Sampling Methods
