
Compound Estimation for Binomials

Author

Listed:
  • Yan Chen
  • Lihua Lei

Abstract

Many applications share a common problem: estimating the means of multiple binomial outcomes. Examples include assessing the intergenerational mobility of census tracts, estimating the prevalence of infectious diseases across countries, and measuring click-through rates for different demographic groups. The standard approach is to report the plain average of each outcome. Despite its simplicity, this approach yields noisy estimates when the sample sizes or mean parameters are small. In contrast, Empirical Bayes (EB) methods can boost average accuracy by borrowing information across tasks. However, EB methods require a Bayesian model in which the parameters are sampled from a prior distribution that, unlike in the commonly studied Gaussian case, is unidentified because binomial measurements are discrete. Even if the prior distribution is known, computation is difficult when the sample sizes are heterogeneous, as there is no simple joint conjugate prior for the sample size and mean parameter. In this paper, we consider the compound decision framework, which treats the sample sizes and mean parameters as fixed quantities. We develop an approximate Stein's Unbiased Risk Estimator (SURE) for the average mean squared error given any class of estimators. For a class of machine learning-assisted linear shrinkage estimators, we establish asymptotic optimality, regret bounds, and valid inference. Unlike existing work, we work with the binomials directly without resorting to Gaussian approximations. This allows us to handle small sample sizes and/or mean parameters in both one-sample and two-sample settings. We demonstrate our approach using three datasets on firm discrimination, education outcomes, and innovation rates.
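To make the idea of risk-based linear shrinkage concrete, here is a minimal Python sketch. It is not the paper's approximate SURE or its machine learning-assisted estimator; it is a simplified textbook-style illustration under stated assumptions: each count is binomial with a known sample size of at least 2, the shrinkage target is a fixed number chosen in advance (not estimated from the same data), and a single weight `lam` is shared by all units. The function names `unbiased_risk`, `best_weight`, and `shrink` are hypothetical. For a fixed `lam` and target, the risk estimate is exactly unbiased for the average squared error, because p_hat * (1 - p_hat) / (n - 1) is unbiased for the binomial variance p * (1 - p) / n.

```python
import numpy as np


def _parts(x, n, target):
    """Return per-unit proportions, variance estimates, and squared distances to target."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    p_hat = x / n
    # Unbiased estimate of Var(p_hat) = p(1-p)/n; requires n >= 2.
    v = p_hat * (1.0 - p_hat) / (n - 1.0)
    d = (p_hat - target) ** 2
    return p_hat, v, d


def unbiased_risk(x, n, lam, target):
    """Unbiased estimate of the average MSE of (1-lam)*p_hat + lam*target.

    Assumes lam and target are fixed in advance (an assumption of this sketch).
    """
    _, v, d = _parts(x, n, target)
    # (1-lam)^2 * variance + lam^2 * squared bias, with (d - v) unbiased for (p - target)^2.
    return float(np.mean((1.0 - lam) ** 2 * v + lam ** 2 * (d - v)))


def best_weight(x, n, target):
    """Weight in [0, 1] minimizing the risk estimate; the unconstrained minimizer is mean(v)/mean(d)."""
    _, v, d = _parts(x, n, target)
    if d.mean() <= 0.0:
        return 1.0  # every proportion equals the target, so full shrinkage changes nothing
    return float(np.clip(v.mean() / d.mean(), 0.0, 1.0))


def shrink(x, n, lam, target):
    """Linear shrinkage of the plain proportions toward the fixed target."""
    p_hat, _, _ = _parts(x, n, target)
    return (1.0 - lam) * p_hat + lam * target


# Illustrative made-up counts, not data from the paper.
successes = [1, 3, 0, 5]
trials = [10, 10, 10, 10]
lam_star = best_weight(successes, trials, target=0.2)
print(lam_star, shrink(successes, trials, lam_star, target=0.2))
```

Selecting `lam` from the data, as `best_weight` does, makes the minimized risk estimate optimistic for the risk actually achieved; bounding that gap is the kind of question that regret bounds address.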

Suggested Citation

  • Yan Chen & Lihua Lei, 2025. "Compound Estimation for Binomials," Papers 2512.25042, arXiv.org.
  • Handle: RePEc:arx:papers:2512.25042

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.25042
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mao, Minghai & Raiola, Antonio & Yang, Da, 2025. "Double machine learning for Oaxaca-Blinder decomposition," Economics Letters, Elsevier, vol. 255(C).
    2. Asanov, Anastasiya-Mariya & Asanov, Igor & Buenstorf, Guido, 2024. "A low-cost digital first aid tool to reduce psychological distress in refugees: A multi-country randomized controlled trial of self-help online in the first months after the invasion of Ukraine," Social Science & Medicine, Elsevier, vol. 362(C).
    3. Justin Whitehouse & Qizhao Chen & Morgane Austern & Vasilis Syrgkanis, 2025. "Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing," Papers 2507.11780, arXiv.org, revised Mar 2026.
    4. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    5. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    6. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.
    7. Arne Henningsen & Guy Low & David Wuepper & Tobias Dalhaus & Hugo Storm & Dagim Belay & Stefan Hirsch, 2024. "Estimating Causal Effects with Observational Data: Guidelines for Agricultural and Applied Economists," IFRO Working Paper 2024/03, University of Copenhagen, Department of Food and Resource Economics.
    8. Khanh Duong, 2024. "Is meritocracy just? New evidence from Boolean analysis and Machine learning," Journal of Computational Social Science, Springer, vol. 7(2), pages 1795-1821, October.
    9. Jelena Bradic & Weijie Ji & Yuqian Zhang, 2021. "High-dimensional Inference for Dynamic Treatment Effects," Papers 2110.04924, arXiv.org, revised May 2023.
    10. Bingnan Guo & Yuren Qian & Xinyan Guo & Hao Zhang, 2025. "Impact of Zero-Waste City Pilot Policies on Urban Energy Consumption Intensity: Causal Inference Based on Double Machine Learning," Sustainability, MDPI, vol. 17(11), pages 1-25, May.
    11. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    12. Yoganathan, Vignesh & Osburg, Victoria-Sophie, 2024. "The mind in the machine: Estimating mind perception's effect on user satisfaction with voice-based conversational agents," Journal of Business Research, Elsevier, vol. 175(C).
    13. Sallin, Aurelién, 2021. "Estimating returns to special education: combining machine learning and text analysis to address confounding," Economics Working Paper Series 2109, University of St. Gallen, School of Economics and Political Science.
    14. Chang, Zhenghao & Zhou, Hang & Ruan, Mengyu & Li, Qin, 2025. "When major customers matter: customer concentration and ESG rating disagreement," Journal of Contemporary Accounting and Economics, Elsevier, vol. 21(3).
    15. Sung Jae Jun & Sokbae Lee, 2024. "Causal Inference Under Outcome-Based Sampling with Monotonicity Assumptions," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 998-1009, July.
    16. Soren Blomquist & Anil Kumar & Che-Yuan Liang & Whitney K. Newey, 2022. "Nonlinear Budget Set Regressions for the Random Utility Model," Working Papers 2219, Federal Reserve Bank of Dallas.
    17. Tingting Zheng & Zongxuan Chai & Pengfei Zuo & Xinyu Wang, 2024. "The Effect of Multilateral Economic Cooperation on Sustainable Natural Resource Development," Sustainability, MDPI, vol. 16(17), pages 1-25, August.
    18. Guo, Jiaqi & Wang, Qiang & Li, Rongrong, 2024. "Can official development assistance promote renewable energy in sub-Saharan Africa countries? A matter of institutional transparency of recipient countries," Energy Policy, Elsevier, vol. 186(C).
    19. Xinkun Nie & Stefan Wager, 2017. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects," Papers 1712.04912, arXiv.org, revised Aug 2020.
    20. Oyenubi, Adeola & Kollamparambil, Umakrishnan, 2023. "Does noncompliance with COVID-19 regulations impact the depressive symptoms of others?," Economic Modelling, Elsevier, vol. 120(C).
