Printed from https://ideas.repec.org/p/arx/papers/2509.20194.html

Identification and Semiparametric Estimation of Conditional Means from Aggregate Data

Authors

  • Cory McCartan
  • Shiro Kuriwaki

Abstract

We introduce a new method for estimating the mean of an outcome variable within groups when researchers only observe the average of the outcome and group indicators across a set of aggregation units, such as geographical areas. Existing methods for this problem, also known as ecological inference, implicitly make strong assumptions about the aggregation process. We first formalize weaker conditions for identification, which motivates estimators that can efficiently control for many covariates. We propose a debiased machine learning estimator that is based on nuisance functions restricted to a partially linear form. Our estimator also admits a semiparametric sensitivity analysis for violations of the key identifying assumption, as well as asymptotically valid confidence intervals for local, unit-level estimates under additional assumptions. Simulations and validation on real-world data where ground truth is available demonstrate the advantages of our approach over existing methods. Open-source software is available which implements the proposed methods.
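To make the ecological inference problem concrete, the sketch below simulates aggregation units in which only group shares and the unit-level average outcome are observed, then recovers the group means by least squares. This is the classical ecological (Goodman-style) regression, shown only as an illustration of the setting; it is not the debiased machine learning estimator proposed in the paper. It relies on the strong assumption that each group's mean outcome is the same in every unit. The function names, sample size, group means, and noise level are all hypothetical choices made for this example.

```python
import numpy as np


def goodman_regression(shares, unit_means):
    """Least-squares estimate of group means from unit-level aggregates.

    shares:     (n_units, n_groups) array of group shares; each row sums to 1.
    unit_means: (n_units,) array with the average outcome in each unit.

    Assumes (strongly) that each group's mean outcome is constant across units,
    so that unit_means[i] is approximately shares[i] @ group_means.
    """
    coef, *_ = np.linalg.lstsq(shares, unit_means, rcond=None)
    return coef


def simulate(n_units=2000, seed=0):
    """Simulate aggregate data with hypothetical group means (illustration only)."""
    rng = np.random.default_rng(seed)
    # Group composition of each aggregation unit (two groups).
    shares = rng.dirichlet([2.0, 3.0], size=n_units)
    # Hypothetical true group-level means, identical in every unit.
    true_means = np.array([0.7, 0.3])
    # Only the unit-level average outcome is observed, with small noise.
    unit_means = shares @ true_means + rng.normal(0.0, 0.02, size=n_units)
    return shares, unit_means, true_means


shares, unit_means, true_means = simulate()
estimate = goodman_regression(shares, unit_means)
print("true group means:     ", true_means)
print("estimated group means:", np.round(estimate, 3))
```

When the constant-mean assumption fails, for example because group means vary with unit composition, this regression can be badly biased, which is the kind of implicit assumption the abstract says existing methods make.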

Suggested Citation

  • Cory McCartan & Shiro Kuriwaki, 2025. "Identification and Semiparametric Estimation of Conditional Means from Aggregate Data," Papers 2509.20194, arXiv.org.
  • Handle: RePEc:arx:papers:2509.20194

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.20194
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    3. Liu, Lin & Mukherjee, Rajarshi & Robins, James M., 2024. "Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators," Journal of Econometrics, Elsevier, vol. 240(2).
    4. Rhys Bidder & Ian Dew-Becker, 2016. "Long-Run Risk Is the Worst-Case Scenario," American Economic Review, American Economic Association, vol. 106(9), pages 2494-2527, September.
    5. Yao Luo & Peijun Sang & Ruli Xiao, 2024. "Order Statistics Approaches to Unobserved Heterogeneity in Auctions," Working Papers tecipa-776, University of Toronto, Department of Economics.
    6. Chang, Jinyuan & Chen, Song Xi & Chen, Xiaohong, 2015. "High dimensional generalized empirical likelihood for moment restrictions with dependent data," Journal of Econometrics, Elsevier, vol. 185(1), pages 283-304.
    7. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    8. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Peñaranda, Francisco & Sentana, Enrique, 2016. "Duality in mean-variance frontiers with conditioning information," Journal of Empirical Finance, Elsevier, vol. 38(PB), pages 762-785.
    10. Xiaohong Chen & Demian Pouzo, 2012. "Estimation of Nonparametric Conditional Moment Models With Possibly Nonsmooth Generalized Residuals," Econometrica, Econometric Society, vol. 80(1), pages 277-321, January.
    11. Chen, Le-Yu & Lee, Sokbae, 2018. "Best subset binary prediction," Journal of Econometrics, Elsevier, vol. 206(1), pages 39-56.
    12. D'Haultfoeuille, Xavier & Gaillac, Christophe & Maurel, Arnaud, 2024. "Linear Regressions with Combined Data," TSE Working Papers 24-1602, Toulouse School of Economics (TSE).
    13. Pablo Lavado & Gonzalo Rivera, 2016. "Identifying Treatment Effects with Data Combination and Unobserved Heterogeneity," Working Papers 79, Peruvian Economic Association.
    14. Hendrik Thiel & Stephan L. Thomsen, 2015. "Individual Poverty Paths and the Stability of Control-Perception," SOEPpapers on Multidisciplinary Panel Data Research 794, DIW Berlin, The German Socio-Economic Panel (SOEP).
    15. Chen, Xiaohong & Liao, Zhipeng & Sun, Yixiao, 2014. "Sieve inference on possibly misspecified semi-nonparametric time series models," Journal of Econometrics, Elsevier, vol. 178(P3), pages 639-658.
    16. Haiqing Xu, 2010. "Social Interactions: A Game Theoretic Approach," Department of Economics Working Papers 130914, The University of Texas at Austin, Department of Economics.
    17. Taisuke Otsu & Myung Hwan Seo, 2014. "Asymptotics for maximum score method under general conditions," STICERD - Econometrics Paper Series 571, Suntory and Toyota International Centres for Economics and Related Disciplines, LSE.
    18. Mustafa Koroglu & Yiguo Sun, 2016. "Functional-Coefficient Spatial Durbin Models with Nonparametric Spatial Weights: An Application to Economic Growth," Econometrics, MDPI, vol. 4(1), pages 1-16, February.
    19. Su, Liangjun & Lu, Xun, 2013. "Nonparametric dynamic panel data models: Kernel estimation and specification testing," Journal of Econometrics, Elsevier, vol. 176(2), pages 112-133.
    20. Shiu, Ji-Liang & Hu, Yingyao, 2013. "Identification and estimation of nonlinear dynamic panel data models with unobserved covariates," Journal of Econometrics, Elsevier, vol. 175(2), pages 116-131.
