
Dimension Reduction for Conditional Density Estimation with Applications to High-Dimensional Causal Inference

Authors

  • Jianhua Mei
  • Fu Ouyang
  • Thomas T. Yang

Abstract

We propose a novel and computationally efficient approach for nonparametric conditional density estimation in high-dimensional settings that achieves dimension reduction without imposing restrictive distributional or functional form assumptions. To uncover the underlying sparsity structure of the data, we develop an innovative conditional dependence measure and a modified cross-validation procedure that enables data-driven variable selection, thereby circumventing the need for subjective threshold selection. We demonstrate the practical utility of our dimension-reduced conditional density estimation by applying it to doubly robust estimators for average treatment effects. Notably, our proposed procedure is able to select relevant variables for nonparametric propensity score estimation and also inherently reduce the dimensionality of outcome regressions through a refined ignorability condition. We evaluate the finite-sample properties of our approach through comprehensive simulation studies and an empirical study on the effects of 401(k) eligibility on savings using SIPP data.
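The abstract applies the dimension-reduced conditional density estimates to doubly robust estimators of the average treatment effect (ATE). For orientation, the sketch below implements the standard augmented inverse-probability-weighted (AIPW) form of such an estimator, tau_hat = (1/n) * sum_i [ mu1(X_i) - mu0(X_i) + D_i (Y_i - mu1(X_i)) / e(X_i) - (1 - D_i)(Y_i - mu0(X_i)) / (1 - e(X_i)) ], in Python. It is a generic textbook illustration, not the authors' procedure. The function name `aipw_ate` and the simulated design are hypothetical. The propensity score e and the outcome regressions mu1, mu0 are passed in as known values rather than estimated with the paper's nonparametric, variable-selecting method.

```python
import numpy as np


def aipw_ate(y, d, e_hat, mu1_hat, mu0_hat):
    """Doubly robust (AIPW) estimate of the average treatment effect.

    Hypothetical helper for illustration only.
    y       : observed outcomes
    d       : binary treatment indicators (1 = treated, 0 = control)
    e_hat   : propensity scores, estimates of P(D = 1 | X)
    mu1_hat : outcome regressions, estimates of E[Y | D = 1, X]
    mu0_hat : outcome regressions, estimates of E[Y | D = 0, X]
    """
    y, d, e_hat, mu1_hat, mu0_hat = (
        np.asarray(a, dtype=float) for a in (y, d, e_hat, mu1_hat, mu0_hat)
    )
    if np.any((e_hat <= 0) | (e_hat >= 1)):
        raise ValueError("propensity scores must lie strictly inside (0, 1)")
    # Regression-based contrast plus inverse-probability-weighted residual
    # corrections; the mean is consistent if either nuisance model is correct.
    psi = (
        mu1_hat
        - mu0_hat
        + d * (y - mu1_hat) / e_hat
        - (1 - d) * (y - mu0_hat) / (1 - e_hat)
    )
    return float(psi.mean())


# Hypothetical simulated design with a true ATE of 2.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e_true = 1 / (1 + np.exp(-x))  # logistic propensity score
d = rng.binomial(1, e_true)
y = 1 + 2 * d + x + rng.normal(size=n)

# Both nuisance functions correctly specified.
ate_both = aipw_ate(y, d, e_true, 3 + x, 1 + x)
# Outcome model deliberately wrong (all zeros), propensity score correct.
ate_ps_only = aipw_ate(y, d, e_true, np.zeros(n), np.zeros(n))
# Propensity score deliberately wrong (constant 0.5), outcome model correct.
ate_or_only = aipw_ate(y, d, np.full(n, 0.5), 3 + x, 1 + x)
```

In large samples all three estimates should be close to the true value of 2, because the estimator remains consistent when only one of the two nuisance models is correctly specified. In practice both nuisance functions must be estimated, and that estimation step is where the paper's variable selection and dimension reduction enter.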

Suggested Citation

  • Jianhua Mei & Fu Ouyang & Thomas T. Yang, 2025. "Dimension Reduction for Conditional Density Estimation with Applications to High-Dimensional Causal Inference," Papers 2507.22312, arXiv.org, revised Oct 2025.
  • Handle: RePEc:arx:papers:2507.22312

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2507.22312
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. S Yang & P Ding, 2018. "Asymptotic inference of causal effects with observational studies trimmed by the estimated propensity scores," Biometrika, Biometrika Trust, vol. 105(2), pages 487-493.
    2. Keisuke Hirano & Guido W. Imbens & Geert Ridder, 2003. "Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score," Econometrica, Econometric Society, vol. 71(4), pages 1161-1189, July.
    3. Arthur Lewbel, 2000. "Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables," Journal of Econometrics, Elsevier, vol. 97(1), pages 145-177, July.
    4. Raj Chetty & John N. Friedman & Søren Leth-Petersen & Torben Heien Nielsen & Tore Olsen, 2014. "Active vs. Passive Decisions and Crowd-Out in Retirement Savings Accounts: Evidence from Denmark," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(3), pages 1141-1219.
    5. Mona Azadkia & Sourav Chatterjee, 2021. "A simple measure of conditional dependence," LSE Research Online Documents on Economics 125584, London School of Economics and Political Science, LSE Library.
    6. Derek Messacar, 2018. "Crowd-Out, Education, and Employer Contributions to Workplace Pensions: Evidence from Canadian Tax Records," The Review of Economics and Statistics, MIT Press, vol. 100(4), pages 648-663, October.
    7. Tyler J. VanderWeele & Ilya Shpitser, 2011. "A New Criterion for Confounder Selection," Biometrics, The International Biometric Society, vol. 67(4), pages 1406-1413, December.
    8. Sourav Chatterjee, 2021. "A New Coefficient of Correlation," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 116(536), pages 2009-2022, October.
    9. Henrik Yde Andersen, 2018. "Do tax incentives for saving in pension accounts cause debt accumulation? Evidence from Danish register data," European Economic Review, Elsevier, vol. 106(C), pages 35-53.
    10. Guido W. Imbens & Donald B. Rubin, 2015. "Causal Inference for Statistics, Social, and Biomedical Sciences," Cambridge Books, Cambridge University Press, number 9780521885881, November.
    11. Yanyuan Ma & Liping Zhu, 2013. "A Review on Dimension Reduction," International Statistical Review, International Statistical Institute, vol. 81(1), pages 134-150, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    2. Xinwei Ma & Jingshen Wang, 2018. "Robust Inference Using Inverse Probability Weighting," Papers 1810.11397, arXiv.org, revised May 2019.
    3. Marc K. Chan & Todd Morris & Cain Polidano & Ha Vu, 2022. "Income and saving responses to tax incentives for private retirement savings," Journal of Public Economics, Elsevier, vol. 206(C).
    4. Laurence O'Brien, 2023. "The effect of tax incentives on private pension saving," IFS Working Papers W23/10, Institute for Fiscal Studies.
    5. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    6. Clément de Chaisemartin & Luc Behaghel, 2020. "Estimating the Effect of Treatments Allocated by Randomized Waiting Lists," Econometrica, Econometric Society, vol. 88(4), pages 1453-1477, July.
    7. Mona Azadkia & Leihao Chen & Fang Han, 2025. "Bias correction for Chatterjee's graph-based correlation coefficient," Papers 2508.09040, arXiv.org.
    8. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    9. Hairu Wang & Yukun Liu & Haiying Zhou, 2025. "Score test for unconfoundedness under a logistic treatment assignment model," Annals of the Institute of Statistical Mathematics, Springer; The Institute of Statistical Mathematics, vol. 77(4), pages 517-533, August.
    10. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    11. Pedro H. C. Sant'Anna & Xiaojun Song & Qi Xu, 2022. "Covariate distribution balance via propensity scores," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(6), pages 1093-1120, September.
    12. Steffen Andersen & Philippe d'Astous & Jimmy Martínez-Correa & Stephen H. Shore, 2018. "Responses to Savings Commitments: Evidence from Mortgage Run-offs," Cahiers de recherche / Working Papers 1, Institut sur la retraite et l'épargne / Retirement and Savings Institute.
    13. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    14. Jiannan Lu & Peng Ding & Tirthankar Dasgupta, 2018. "Treatment Effects on Ordinal Outcomes: Causal Estimands and Sharp Bounds," Journal of Educational and Behavioral Statistics, vol. 43(5), pages 540-567, October.
    15. Henrik Yde Andersen, 2021. "Pension taxation, household debt and the real economy," Nationaløkonomisk tidsskrift, Nationaløkonomisk Forening, vol. 2021(1), pages 1-14.
    16. Xun Lu, 2015. "A Covariate Selection Criterion for Estimation of Treatment Effects," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 506-522, October.
    17. Yingying Dong & Arthur Lewbel, 2015. "A Simple Estimator for Binary Choice Models with Endogenous Regressors," Econometric Reviews, Taylor & Francis Journals, vol. 34(1-2), pages 82-105, February.
    18. Jinglong Zhao, 2024. "Experimental Design For Causal Inference Through An Optimization Lens," Papers 2408.09607, arXiv.org, revised Aug 2024.
    19. Francesca Molinari, 2020. "Microeconometrics with partial identification," Handbook of Econometrics, in: Steven N. Durlauf & Lars Peter Hansen & James J. Heckman & Rosa L. Matzkin (ed.), Handbook of Econometrics, edition 1, volume 7, chapter 0, pages 355-486, Elsevier.
    20. Yihui He & Fang Han, 2023. "On propensity score matching with a diverging number of matches," Papers 2310.14142, arXiv.org, revised Nov 2023.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2507.22312. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.