Printed from https://ideas.repec.org/p/arx/papers/2509.17385.html

Bayesian Semi-supervised Inference via a Debiased Modeling Approach

Authors

  • Gozde Sert
  • Abhishek Chakrabortty
  • Anirban Bhattacharya

Abstract

Inference in semi-supervised (SS) settings has gained substantial attention in recent years owing to its relevance to modern big-data problems. In a typical SS setting, there is a much larger unlabeled dataset, containing only observations of the predictors, and a moderately sized labeled dataset containing observations of both an outcome and the predictors. Such data arise naturally when the outcome, unlike the predictors, is costly or difficult to obtain. A primary statistical objective in SS settings is to determine whether parameter estimation can be improved by exploiting the unlabeled data. We propose a novel Bayesian method for estimating the population mean in SS settings. The approach yields estimators that are both efficient and optimal for estimation and inference. The method also has several notable features. Its central idea is to model certain summary statistics of the data in a targeted manner, rather than the entire raw data, together with a novel Bayesian notion of debiasing. Specifying appropriate summary statistics relies crucially on a debiased representation of the population mean that incorporates the unlabeled data through a flexible nuisance function while also learning that function's estimation bias. Combined with careful use of sample splitting, this debiasing approach limits how much bias in the nuisance estimate, whether from slow convergence rates or misspecification, carries over to the posterior of the final parameter of interest, ensuring its robustness and efficiency. Theoretical results, in the form of Bernstein–von Mises theorems, are established to validate these claims and are further supported by extensive numerical studies. To our knowledge, this is possibly the first work on Bayesian inference in SS settings, and its central ideas also apply more broadly to other Bayesian semiparametric inference problems.
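The debiased representation mentioned in the abstract can be illustrated, under stated assumptions, by its frequentist counterpart. The sketch below is not the authors' Bayesian procedure. It uses the standard identity mu = E[m(X)] + E[Y − m(X)] for a nuisance regression m: the first term is averaged over the unlabeled predictors, the bias-correction term over held-out labeled rows, and the two are cross-fitted over folds of the labeled data. The linear least-squares nuisance, the two-fold split, the helper names `fit_linear` and `debiased_ss_mean`, the plug-in standard error, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_linear(X, y):
    """Hypothetical nuisance learner: least squares with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def debiased_ss_mean(X_lab, y_lab, X_unl, n_folds=2, seed=0):
    """Cross-fitted debiased estimate of E[Y] (illustrative sketch).

    For each fold, m is fit on the other labeled folds; E[m(X)] is averaged
    over the unlabeled X and the residual mean E[Y - m(X)] over the held-out
    labeled rows, which corrects the bias of m.
    """
    rng = np.random.default_rng(seed)
    n = len(y_lab)
    folds = np.array_split(rng.permutation(n), n_folds)
    estimates, residuals, unl_vars = [], [], []
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        m_hat = fit_linear(X_lab[train], y_lab[train])
        resid = y_lab[held_out] - m_hat(X_lab[held_out])
        pred_unl = m_hat(X_unl)
        estimates.append(pred_unl.mean() + resid.mean())
        residuals.append(resid)
        unl_vars.append(pred_unl.var(ddof=1))
    resid_all = np.concatenate(residuals)
    # Rough plug-in standard error: residual variance over the labeled size
    # plus fitted-nuisance variance over the (much larger) unlabeled size.
    se = np.sqrt(resid_all.var(ddof=1) / n + np.mean(unl_vars) / len(X_unl))
    return float(np.mean(estimates)), float(se)


# Simulated example (assumed data-generating process; true E[Y] = 1.0).
rng = np.random.default_rng(1)
n, N = 200, 20000
beta = np.array([1.0, -0.5, 2.0])
X_lab = rng.normal(size=(n, 3))
X_unl = rng.normal(size=(N, 3))
y_lab = 1.0 + X_lab @ beta + rng.normal(size=n)

mu_hat, se = debiased_ss_mean(X_lab, y_lab, X_unl)
naive_se = y_lab.std(ddof=1) / np.sqrt(n)  # labeled-only sample mean
print(f"debiased SS estimate {mu_hat:.3f} (se {se:.3f}); naive se {naive_se:.3f}")
```

Because the residual term corrects whatever bias the fitted m carries, any flexible regression could replace the linear fit without breaking consistency; the gain over the labeled-only mean comes from the unlabeled data absorbing the variance of m(X). The standard-error formula ignores cross-fold dependence and is only a rough approximation.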

Suggested Citation

  • Gozde Sert & Abhishek Chakrabortty & Anirban Bhattacharya, 2025. "Bayesian Semi-supervised Inference via a Debiased Modeling Approach," Papers 2509.17385, arXiv.org.
  • Handle: RePEc:arx:papers:2509.17385

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2509.17385
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    2. Carlos M. Carvalho & Nicholas G. Polson & James G. Scott, 2010. "The horseshoe estimator for sparse signals," Biometrika, Biometrika Trust, vol. 97(2), pages 465-480.
    3. Valen E. Johnson & David Rossell, 2012. "Bayesian Model Selection in High-Dimensional Settings," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(498), pages 649-660, June.
    4. Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2021. "Deep Neural Networks for Estimation and Inference," Econometrica, Econometric Society, vol. 89(1), pages 181-213, January.
    5. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    6. Paul Fearnhead & Dennis Prangle, 2012. "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 74(3), pages 419-474, June.
    7. A. W. van der Vaart, 2000. "Asymptotic Statistics," Cambridge Books, Cambridge University Press, number 9780521784504, November.
    8. Yu Luo & Daniel J. Graham & Emma J. McCoy, 2023. "Semiparametric Bayesian doubly robust causal estimation," LSE Research Online Documents on Economics 117944, London School of Economics and Political Science, LSE Library.
    9. T. Tony Cai & Zijian Guo, 2020. "Semisupervised inference for explained variance in high dimensional linear regression and its applications," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(2), pages 391-419, April.
    10. Christoph Breunig & Ruixuan Liu & Zhengfei Yu, 2025. "Double Robust Bayesian Inference on Average Treatment Effects," Econometrica, Econometric Society, vol. 93(2), pages 539-568, March.
    11. Andriy Norets, 2015. "Bayesian regression with nonparametric heteroskedasticity," Journal of Econometrics, Elsevier, vol. 185(2), pages 409-419.
    12. Veronika Ročková & Edward I. George, 2018. "The Spike-and-Slab LASSO," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(521), pages 431-444, January.
    13. Valen E. Johnson, 2005. "Bayes factors based on test statistics," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 689-701, November.
    14. Anirban Bhattacharya & Debdeep Pati & Natesh S. Pillai & David B. Dunson, 2015. "Dirichlet--Laplace Priors for Optimal Shrinkage," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1479-1490, December.
    15. Yuqian Zhang & Jelena Bradic, 2022. "High-dimensional semi-supervised learning: in search of optimal inference of the mean," Biometrika, Biometrika Trust, vol. 109(2), pages 387-403.
    16. David Azriel & Lawrence D. Brown & Michael Sklar & Richard Berk & Andreas Buja & Linda Zhao, 2022. "Semi-Supervised Linear Regression," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 117(540), pages 2238-2251, October.
    17. Christoph Breunig & Ruixuan Liu & Zhengfei Yu, 2022. "Double Robust Bayesian Inference on Average Treatment Effects," Papers 2211.16298, arXiv.org, revised Feb 2025.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dimitris Korobilis & Kenichi Shimizu, 2022. "Bayesian Approaches to Shrinkage and Sparse Estimation," Foundations and Trends® in Econometrics, now publishers, vol. 11(4), pages 230-354, June.
    2. Niko Hauzenberger & Florian Huber & Karin Klieber & Massimiliano Marcellino, 2025. "Bayesian neural networks for macroeconomic analysis," Journal of Econometrics, Elsevier, vol. 249(PC).
    3. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    4. Davide Viviano & Jelena Bradic, 2019. "Synthetic learner: model-free inference on treatments over time," Papers 1904.01490, arXiv.org, revised Aug 2022.
    5. Pierre-Philippe Combes & Laurent Gobillon & Yanos Zylberberg, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    6. Guanyu Hu, 2021. "Spatially varying sparsity in dynamic regression models," Econometrics and Statistics, Elsevier, vol. 17(C), pages 23-34.
    7. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    8. Nan Liu & Yanbo Liu & Yuya Sasaki, 2024. "Estimation and Inference for Causal Functions with Multiway Clustered Data," Papers 2409.06654, arXiv.org.
    9. Phillip Heiler & Michael C. Knaus, 2021. "Effect or Treatment Heterogeneity? Policy Evaluation with Aggregated and Disaggregated Treatments," Papers 2110.01427, arXiv.org, revised Aug 2023.
    10. Christoph Breunig & Ruixuan Liu & Zhengfei Yu, 2025. "Robust Semiparametric Inference for Bayesian Additive Regression Trees," Papers 2509.24634, arXiv.org, revised Oct 2025.
    11. Xueying Tang & Xiaofan Xu & Malay Ghosh & Prasenjit Ghosh, 2018. "Bayesian Variable Selection and Estimation Based on Global-Local Shrinkage Priors," Sankhya A: The Indian Journal of Statistics, Springer; Indian Statistical Institute, vol. 80(2), pages 215-246, August.
    12. Daniel Jacob, 2021. "CATE meets ML," Digital Finance, Springer, vol. 3(2), pages 99-148, June.
    13. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer & Thomas Wiemann, 2024. "ddml: Double/debiased machine learning in Stata," Stata Journal, StataCorp LLC, vol. 24(1), pages 3-45, March.
    14. Guiling Shi & Chae Young Lim & Tapabrata Maiti, 2019. "Model selection using mass-nonlocal prior," Statistics & Probability Letters, Elsevier, vol. 147(C), pages 36-44.
    15. Konstantin Posch & Maximilian Arbeiter & Juergen Pilz, 2020. "A novel Bayesian approach for variable selection in linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    16. Achim Ahrens & Christian B. Hansen & Mark E. Schaffer & Thomas Wiemann, 2025. "Model Averaging and Double Machine Learning," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 40(3), pages 249-269, April.
    17. Susan Athey & Guido W. Imbens, 2019. "Machine Learning Methods Economists Should Know About," Research Papers 3776, Stanford University, Graduate School of Business.
    18. Kshitij Khare & Malay Ghosh, 2022. "MCMC Convergence for Global-Local Shrinkage Priors," Journal of Quantitative Economics, Springer; The Indian Econometric Society (TIES), vol. 20(1), pages 211-234, September.
    19. Alena Skolkova, 2023. "Instrumental Variable Estimation with Many Instruments Using Elastic-Net IV," CERGE-EI Working Papers wp759, The Center for Economic Research and Graduate Education - Economics Institute, Prague.
    20. Qifan Song & Guang Cheng, 2020. "Bayesian Fusion Estimation via t Shrinkage," Sankhya A: The Indian Journal of Statistics, Springer; Indian Statistical Institute, vol. 82(2), pages 353-385, August.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.