Printed from https://ideas.repec.org/a/vrs/offsta/v31y2015i4p783-807n13.html

On Proxy Variables and Categorical Data Fusion

Author

Listed:
  • Zhang Li-Chun

    (University of Southampton, S3RI/Social Statistics and Demography, Highfield Southampton SO17 1BJ, UK and Statistics Norway, P.O. Box 8131 Dep. 0033 Oslo, Norway.)

Abstract

The problem of inference about the joint distribution of two categorical variables based on knowledge or observations of their marginal distributions, to be referred to as categorical data fusion in this paper, is relevant in statistical matching, ecological inference, market research, and several other related fields. This article organizes the use of proxy variables, to be distinguished from other auxiliary variables, both in terms of their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains of efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and need to be estimated. Several existing techniques for generating fusion distributions (or datasets) are described and some new ones proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.
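As an illustration of the identification uncertainty mentioned in the abstract, the sketch below computes the classical Fréchet bounds on each cell of a joint distribution when only the two marginals are known. This is a minimal sketch and not necessarily the uncertainty measure used in the article: the function name `frechet_bounds` and the example marginals are hypothetical, chosen only for illustration.

```python
from itertools import product


def frechet_bounds(p_row, p_col):
    """Return {(i, j): (lower, upper)} for each joint cell probability.

    p_row and p_col are the marginal distributions of the two categorical
    variables. For every cell, the joint probability p_ij must satisfy
    max(0, p_i + q_j - 1) <= p_ij <= min(p_i, q_j).
    """
    bounds = {}
    for (i, p), (j, q) in product(enumerate(p_row), enumerate(p_col)):
        lower = max(0.0, p + q - 1.0)  # cannot be negative
        upper = min(p, q)              # cannot exceed either marginal
        bounds[(i, j)] = (lower, upper)
    return bounds


# Hypothetical marginals (illustrative values, not taken from the article).
p_row = [0.7, 0.3]
p_col = [0.6, 0.4]

for cell, (low, high) in frechet_bounds(p_row, p_col).items():
    print(cell, round(low, 3), round(high, 3))
```

The width of each interval shows how much the marginals alone leave undetermined. Any additional information that narrows these intervals reduces the identification uncertainty of the fused distribution.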

Suggested Citation

  • Zhang Li-Chun, 2015. "On Proxy Variables and Categorical Data Fusion," Journal of Official Statistics, Sciendo, vol. 31(4), pages 783-807, December.
  • Handle: RePEc:vrs:offsta:v:31:y:2015:i:4:p:783-807:n:13
    DOI: 10.1515/jos-2015-0045

    Download full text from publisher

    File URL: https://doi.org/10.1515/jos-2015-0045
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ahfock, Daniel & Pyne, Saumyadipta & McLachlan, Geoffrey J., 2022. "Statistical file-matching of non-Gaussian data: A game theoretic approach," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    2. Nicklas Pettersson, 2013. "Bias reduction of finite population imputation by kernel methods," Statistics in Transition new series, Główny Urząd Statystyczny (Polska), vol. 14(1), pages 139-160, March.
    3. Endres Eva & Fink Paul & Augustin Thomas, 2019. "Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data," Journal of Official Statistics, Sciendo, vol. 35(3), pages 599-624, September.
    4. D'Alberto, R. & Raggi, M., 2018. "Statistical Matching in agricultural economics: how to integrate different farm data sources," 2018 Conference, July 28-August 2, 2018, Vancouver, British Columbia 277101, International Association of Agricultural Economists.
    5. Riccardo D’Alberto & Matteo Zavalloni & Meri Raggi & Davide Viaggi, 2018. "AES Impact Evaluation With Integrated Farm Data: Combining Statistical Matching and Propensity Score Matching," Sustainability, MDPI, vol. 10(11), pages 1-24, November.
    6. Antonio D’Ambrosio & Massimo Aria & Roberta Siciliano, 2012. "Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm," Journal of Classification, Springer;The Classification Society, vol. 29(2), pages 227-258, July.
    7. Zahra Rezaei Ghahroodi, 2023. "Statistical matching of sample survey data: application to integrate Iranian time use and labour force surveys," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 32(3), pages 1023-1051, September.
    8. Claramunt González, Juan & van Delden, Arnout & de Waal, Ton, 2023. "Assessment of the effect of constraints in a new multivariate mixed method for statistical matching," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).

