Regression analysis under incomplete linkage


  • Kim, Gunky
  • Chambers, Raymond


Most probability-based methods used to link records from two distinct data sets corresponding to the same target population do not lead to perfect linkage, i.e. there are linkage errors in the merged data. Further, the linkage is often incomplete, in the sense that many records in the two data sets remain unmatched at the completion of the linkage process. This paper introduces methods that correct for the biases due to linkage errors and incomplete linkage when carrying out regression analysis using linked data. In particular, it focuses on the case where one of the linked data sets is a sample from the target population and the other is a register, i.e. it covers the entire target population.
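The sketch below shows the kind of bias the abstract refers to. It simulates a regression in which the response is obtained through an error-prone link.

It rests on an assumed exchangeable linkage-error model. Each record is correctly linked with probability lam and is otherwise linked to any other record with equal probability. The expected linkage matrix is then Q = (lam - gamma) I + gamma 11', with gamma = (1 - lam)/(n - 1).

The sketch compares naive least squares with a ratio-type correction, (X'QX)^{-1} X'y*. It is a generic illustration under these assumptions, not the estimator developed in the paper, which also handles incomplete linkage and the sample-to-register setting. The function names and the mislinkage mechanism are hypothetical.

```python
import numpy as np


def exchangeable_q(n, lam):
    """Expected linkage matrix Q = E[A] under an assumed exchangeable linkage-error model.

    Diagonal entries equal lam (correct link); off-diagonal entries equal
    gamma = (1 - lam) / (n - 1), so every row sums to one.
    """
    gamma = (1.0 - lam) / (n - 1)
    return (lam - gamma) * np.eye(n) + gamma * np.ones((n, n))


def naive_ols(X, y_star):
    """Least squares on the linked data, ignoring linkage errors."""
    return np.linalg.solve(X.T @ X, X.T @ y_star)


def ratio_corrected(X, y_star, Q):
    """Ratio-type estimator (X'QX)^{-1} X'y*, unbiased when E[y* | X] = Q X beta."""
    return np.linalg.solve(X.T @ Q @ X, X.T @ y_star)


def simulate_linked_y(y, lam, rng):
    """Hypothetical mislinkage mechanism whose expected linkage matrix is Q.

    Record i keeps its own response with probability lam; otherwise it receives
    the response of a uniformly chosen *other* record. Draws are independent
    across records (a simplification: real linkage is usually one-to-one).
    """
    n = len(y)
    idx = np.arange(n)
    wrong = rng.random(n) >= lam
    offsets = rng.integers(1, n, size=n)  # uniform on 1..n-1, never 0
    idx[wrong] = (idx[wrong] + offsets[wrong]) % n
    return y[idx]


rng = np.random.default_rng(0)
n, lam = 200, 0.8
beta = np.array([1.0, 2.0])          # true intercept and slope
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Q = exchangeable_q(n, lam)

# Noise-free check: with y* equal to its expectation Q X beta, the ratio-type
# estimator recovers beta exactly, while naive OLS shrinks the slope by (lam - gamma).
y_expected = Q @ X @ beta
exact_ratio = ratio_corrected(X, y_expected, Q)
exact_naive = naive_ols(X, y_expected)

# Monte Carlo check with fresh regression noise and fresh linkage errors each replicate.
naive_est, ratio_est = [], []
for _ in range(500):
    y = X @ beta + rng.normal(scale=0.5, size=n)
    y_star = simulate_linked_y(y, lam, rng)
    naive_est.append(naive_ols(X, y_star))
    ratio_est.append(ratio_corrected(X, y_star, Q))

mean_naive = np.mean(naive_est, axis=0)
mean_ratio = np.mean(ratio_est, axis=0)
print("naive slope:", round(mean_naive[1], 3))
print("ratio-corrected slope:", round(mean_ratio[1], 3))
```

In this sketch lam, and hence Q, is treated as known. In practice it would have to be estimated from the linkage process, which is itself an assumption of the illustration.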

Suggested Citation

  • Kim, Gunky & Chambers, Raymond, 2012. "Regression analysis under incomplete linkage," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2756-2770.
  • Handle: RePEc:eee:csdana:v:56:y:2012:i:9:p:2756-2770
    DOI: 10.1016/j.csda.2012.02.026

    Cited by:

    1. Hendrik van Broekhuizen, 2016. "Graduate unemployment and Higher Education Institutions in South Africa," Working Papers 08/2016, Stellenbosch University, Department of Economics.
    2. Tatiana V. Komarova & Denis Nekipelov & Evgeny Yakovlev, 2011. "Identification, data combination and the risk of disclosure," CeMMAP working papers CWP38/11, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.

