Printed from https://ideas.repec.org/a/eee/csdana/v190y2024ics0167947323001822.html

Calibrated regression estimation using empirical likelihood under data fusion

Authors
  • Li, Wei
  • Luo, Shanshan
  • Xu, Wangli

Abstract

Data analysis based on information from different sources, typically known as the data fusion problem, is common in economic and biomedical studies. An interesting question concerns the regression of an outcome variable on certain covariates when combining two distinct datasets. These datasets consist of a primary sample containing the outcome and a subset of the covariates, and a supplemental sample comprising information only on the full set of covariates. Previous methods have proposed doubly robust estimation procedures that employ a single propensity score model for the data fusion process and a single imputation model for the covariates available only in the supplemental dataset. However, it may be questionable to assume that either model is correctly specified due to an unknown data generating process. To address this issue, an empirical likelihood based approach that calibrates multiple propensity scores and imputation models is introduced. The resulting estimator is consistent when any one of the models is correctly specified and is robust against extreme values of the fitted propensity scores. The asymptotic normality property and the estimation efficiency are also discussed. Simulation studies show that the proposed estimator has substantial advantages over existing estimators, and an assembled U.S. household expenditure data example is used for illustration.
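The calibration idea in the abstract (reweighting one sample so that several candidate propensity-score and imputation models all agree with full-sample averages) can be illustrated with a minimal sketch. This is not the authors' data-fusion estimator. It is a simpler, assumed analogue in the spirit of the missing-outcome mean estimator of Han and Wang (2013, cited below). The candidate models are passed in as already-fitted values, and the function names `el_calibration_weights` and `mr_el_mean` are hypothetical. The empirical-likelihood weights take the form w_i = 1 / (m (1 + rho' g_i)). Here, rho solves a concave dual problem, which this sketch maximizes by damped Newton ascent.

```python
import numpy as np


def el_calibration_weights(g, max_iter=100, tol=1e-10):
    """Empirical-likelihood weights for m complete cases (hypothetical helper).

    Maximizes sum(log w_i) subject to sum(w_i) = 1 and sum(w_i * g_i) = 0.
    The solution is w_i = 1 / (m * (1 + rho @ g_i)), where rho maximizes the
    concave dual mean(log(1 + rho @ g_i)); found here by damped Newton ascent.
    """
    m, q = g.shape
    rho = np.zeros(q)
    for _ in range(max_iter):
        d = 1.0 + g @ rho
        grad = (g / d[:, None]).mean(axis=0)          # dual gradient
        if np.max(np.abs(grad)) < tol:
            break
        hess = -(g / d[:, None] ** 2).T @ g / m       # negative definite Hessian
        step = np.linalg.solve(hess, -grad)           # Newton ascent direction
        f_old = np.mean(np.log(d))
        t = 1.0
        while t > 1e-12:
            cand = rho + t * step
            d_new = 1.0 + g @ cand
            # Accept only if all weights stay positive and the dual does not drop.
            if np.all(d_new > 0) and np.mean(np.log(d_new)) >= f_old:
                rho = cand
                break
            t *= 0.5
        else:
            break  # no acceptable step: keep the current rho
    return 1.0 / (m * (1.0 + g @ rho))


def mr_el_mean(y, observed, pi_fits, a_fits):
    """Calibrated estimate of E[Y] from J propensity fits and K outcome fits.

    pi_fits: (n, J) fitted P(observed | x) from J candidate models, all n units.
    a_fits:  (n, K) fitted E[Y | x] from K candidate models, all n units.
    Only y[observed] is used.
    """
    u = np.hstack([pi_fits, a_fits])
    g = u[observed] - u.mean(axis=0)  # calibrate to full-sample averages
    w = el_calibration_weights(g)
    return np.sum(w * y[observed]), w, g


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


# Simulated example (assumed setup): Y depends linearly on x, and Y is
# missing at random with a probability that depends on x.
rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)            # true E[Y] = 1
observed = rng.uniform(size=n) < expit(0.5 + x)

# One correct and one misspecified candidate of each model type.
pi_fits = np.column_stack([expit(0.5 + x), expit(0.2 - 0.5 * x**2)])
a_fits = np.column_stack([1.0 + 2.0 * x, x**2])

estimate, w, g = mr_el_mean(y, observed, pi_fits, a_fits)
naive = y[observed].mean()  # complete-case mean, biased upward here
```

In this toy setup, the naive complete-case mean is pulled upward because units with large x are more likely to be observed. The calibrated estimate stays near 1 because one candidate outcome model is correct. Because every weight has the form 1 / (m (1 + rho' g_i)), the weights are positive and sum to one. They are not built from inverted fitted propensities, so a single tiny fitted propensity cannot produce an exploding inverse-probability weight.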

Suggested Citation

  • Li, Wei & Luo, Shanshan & Xu, Wangli, 2024. "Calibrated regression estimation using empirical likelihood under data fusion," Computational Statistics & Data Analysis, Elsevier, vol. 190(C).
  • Handle: RePEc:eee:csdana:v:190:y:2024:i:c:s0167947323001822
    DOI: 10.1016/j.csda.2023.107871

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947323001822
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Peisong Han & Linglong Kong & Jiwei Zhao & Xingcai Zhou, 2019. "A general framework for quantile estimation with incomplete data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 305-333, April.
    2. Ridder, Geert & Moffitt, Robert, 2007. "The Econometrics of Data Combination," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 75, Elsevier.
    3. Weihua Cao & Anastasios A. Tsiatis & Marie Davidian, 2009. "Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data," Biometrika, Biometrika Trust, vol. 96(3), pages 723-734.
    4. Yilin Chen & Pengfei Li & Changbao Wu, 2020. "Doubly Robust Inference With Nonprobability Survey Samples," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(532), pages 2011-2021, December.
    5. Peisong Han & Lu Wang, 2013. "Estimation with missing data: beyond double robustness," Biometrika, Biometrika Trust, vol. 100(2), pages 417-430.
    6. Bryan S. Graham & Cristine Campos de Xavier Pinto & Daniel Egel, 2016. "Efficient Estimation of Data Combination Models by the Method of Auxiliary-to-Study Tilting (AST)," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 34(2), pages 288-301, April.
    7. Qin, Jing & Zhang, Biao & Leung, Denis H. Y., 2009. "Empirical Likelihood in Missing Data Problems," Journal of the American Statistical Association, American Statistical Association, vol. 104(488), pages 1492-1503.
    8. Linbo Wang & Xiao-Hua Zhou & Thomas S. Richardson, 2017. "Identification and estimation of causal effects with outcomes truncated by death," Biometrika, Biometrika Trust, vol. 104(3), pages 597-612.
    9. Zitong Lu & Zhi Geng & Wei Li & Shengyu Zhu & Jinzhu Jia, 2023. "Evaluating causes of effects by posterior effects of causes," Biometrika, Biometrika Trust, vol. 110(2), pages 449-465.
    10. Richard Blundell & Luigi Pistaferri & Ian Preston, 2008. "Consumption Inequality and Partial Insurance," American Economic Review, American Economic Association, vol. 98(5), pages 1887-1921, December.
    11. Sixia Chen & David Haziza, 2017. "Multiply robust imputation procedures for zero-inflated distributions in surveys," METRON, Springer;Sapienza Università di Roma, vol. 75(3), pages 333-343, December.
    12. Shu Yang & Yunshu Zhang, 2023. "Multiply robust matching estimators of average and quantile treatment effects," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(1), pages 235-265, March.
    13. Jing Qin & Biao Zhang, 2007. "Empirical‐likelihood‐based inference in missing response problems and its application in observational studies," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(1), pages 101-122, February.
    14. Shu Yang & Jae Kwang Kim & Rui Song, 2020. "Doubly robust inference when combining probability and non‐probability samples with high dimensional data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(2), pages 445-465, April.
    15. Wang, Lu & Rotnitzky, Andrea & Lin, Xihong, 2010. "Nonparametric Regression With Missing Outcomes Using Weighted Kernel Estimating Equations," Journal of the American Statistical Association, American Statistical Association, vol. 105(491), pages 1135-1146.
    16. Peisong Han, 2014. "Multiply Robust Estimation in Regression Analysis With Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1159-1173, September.
    17. Whitney K. Newey & James L. Powell, 2003. "Instrumental Variable Estimation of Nonparametric Models," Econometrica, Econometric Society, vol. 71(5), pages 1565-1578, September.
    18. d'Haultfoeuille, Xavier, 2010. "A new instrumental method for dealing with endogenous selection," Journal of Econometrics, Elsevier, vol. 154(1), pages 1-15, January.
    19. Shu Yang & Peng Ding, 2020. "Combining Multiple Observational Data Sources to Estimate Causal Effects," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1540-1554, July.
    20. Sixia Chen & David Haziza, 2017. "Multiply robust imputation procedures for the treatment of item nonresponse in surveys," Biometrika, Biometrika Trust, vol. 104(2), pages 439-453.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Qihua & Su, Miaomiao & Wang, Ruoyu, 2021. "A beyond multiple robust approach for missing response problem," Computational Statistics & Data Analysis, Elsevier, vol. 155(C).
    2. Peisong Han & Linglong Kong & Jiwei Zhao & Xingcai Zhou, 2019. "A general framework for quantile estimation with incomplete data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 81(2), pages 305-333, April.
    3. Xiaogang Duan & Guosheng Yin, 2017. "Ensemble Approaches to Estimating the Population Mean with Missing Response," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 44(4), pages 899-917, December.
    4. Shixiao Zhang & Peisong Han & Changbao Wu, 2023. "Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference," International Statistical Review, International Statistical Institute, vol. 91(2), pages 165-192, August.
    5. Chen, Sixia & Haziza, David, 2018. "Jackknife empirical likelihood method for multiply robust estimation with missing data," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 258-268.
    6. Chen, Sixia & Haziza, David, 2023. "A unified framework of multiply robust estimation approaches for handling incomplete data," Computational Statistics & Data Analysis, Elsevier, vol. 179(C).
    7. Shu Yang & Yunshu Zhang, 2023. "Multiply robust matching estimators of average and quantile treatment effects," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 50(1), pages 235-265, March.
    8. Peisong Han, 2014. "Multiply Robust Estimation in Regression Analysis With Missing Data," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1159-1173, September.
    9. Satoshi Hattori & Masayuki Henmi, 2014. "Stratified doubly robust estimators for the average causal effect," Biometrics, The International Biometric Society, vol. 70(2), pages 270-277, June.
    10. Su, Miaomiao & Wang, Qihua, 2022. "A convex programming solution based debiased estimator for quantile with missing response and high-dimensional covariables," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    11. Chixiang Chen & Biyi Shen & Aiyi Liu & Rongling Wu & Ming Wang, 2021. "A multiple robust propensity score method for longitudinal analysis with intermittent missing data," Biometrics, The International Biometric Society, vol. 77(2), pages 519-532, June.
    12. Buchinsky, Moshe & Li, Fanghua & Liao, Zhipeng, 2022. "Estimation and inference of semiparametric models using data from several sources," Journal of Econometrics, Elsevier, vol. 226(1), pages 80-103.
    13. Changbao Wu & Shixiao Zhang, 2019. "Comments on: Deville and Särndal’s calibration: revisiting a 25 years old successful optimization problem," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(4), pages 1082-1086, December.
    14. Peisong Han, 2016. "Combining Inverse Probability Weighting and Multiple Imputation to Improve Robustness of Estimation," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(1), pages 246-260, March.
    15. Sixia Chen & David Haziza, 2017. "Multiply robust imputation procedures for zero-inflated distributions in surveys," METRON, Springer;Sapienza Università di Roma, vol. 75(3), pages 333-343, December.
    16. Xinyu Li & Wang Miao & Fang Lu & Xiao‐Hua Zhou, 2023. "Improving efficiency of inference in clinical trials with external control data," Biometrics, The International Biometric Society, vol. 79(1), pages 394-403, March.
    17. Hamori, Shigeyuki & Motegi, Kaiji & Zhang, Zheng, 2019. "Calibration estimation of semiparametric copula models with data missing at random," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 85-109.
    18. Adusumilli, Karun & Otsu, Taisuke & Qiu, Chen, 2023. "Reweighted nonparametric likelihood inference for linear functionals," LSE Research Online Documents on Economics 120198, London School of Economics and Political Science, LSE Library.
    19. Karun Adusumilli & Taisuke Otsu, 2018. "Likelihood ratio inference for missing data models," STICERD - Econometrics Paper Series 599, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    20. Shengfang Tang & Zongwu Cai & Ying Fang & Ming Lin, 2019. "Testing Unconfoundedness Assumption Using Auxiliary Variables," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 201905, University of Kansas, Department of Economics, revised Mar 2019.
