Calibeating Prediction-Powered Inference

Author

Listed:
  • Lars van der Laan
  • Mark Van Der Laan

Abstract

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at https://larsvanderlaan.github.io/ppi-aipw/.
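The calibrate-then-estimate idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and does not use the ppi_aipw package API. Every function name, the simulated data, and the choice to fit the calibrator on the full labeled sample without sample splitting are assumptions made for illustration only. The estimator shown is the AIPW-style mean for labels missing completely at random: the pooled mean of the (calibrated) score plus the labeled-sample mean residual.

```python
import numpy as np

# All names below are hypothetical; this is NOT the ppi_aipw API.

def aipw_mean(y_lab, g_lab, g_unlab):
    """Pooled mean of the score plus the labeled-sample mean residual."""
    g_all = np.concatenate([g_lab, g_unlab])
    return g_all.mean() + (y_lab - g_lab).mean()

def linear_calibrator(f_lab, y_lab):
    """Least-squares fit of y on (1, f), using the labeled sample only."""
    slope, intercept = np.polyfit(f_lab, y_lab, deg=1)
    return lambda f: intercept + slope * np.asarray(f, dtype=float)

def pava(y):
    """Pool-adjacent-violators: nondecreasing least-squares fit to y."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def isotonic_calibrator(f_lab, y_lab):
    """Monotone fit of y on f; assumes no tied scores in f_lab."""
    order = np.argsort(f_lab)
    knots, fitted = f_lab[order], pava(y_lab[order])
    # Interpolate linearly between knots; constant outside the labeled range.
    return lambda f: np.interp(f, knots, fitted)

# Simulated example (an assumption): the score f is informative but on the
# wrong scale, since E[Y | f] = 2 + 3 f. The true mean of Y is 2.
rng = np.random.default_rng(0)
n, N = 200, 20_000
f_lab, f_unlab = rng.normal(size=n), rng.normal(size=N)
y_lab = 2.0 + 3.0 * f_lab + rng.normal(size=n)

estimates = {"raw score": aipw_mean(y_lab, f_lab, f_unlab)}
for name, fit in [("linear", linear_calibrator), ("isotonic", isotonic_calibrator)]:
    g = fit(f_lab, y_lab)
    estimates[name] = aipw_mean(y_lab, g(f_lab), g(f_unlab))
print(estimates)
```

In this sketch both calibrators leave a zero mean residual on the labeled sample, so the calibrated estimates reduce to the pooled mean of the calibrated score. Whether calibration helps in a given problem depends on how the score relates to the outcome; the paper's theory and experiments, not this toy example, are the reference for that.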

Suggested Citation

  • Lars van der Laan & Mark Van Der Laan, 2026. "Calibeating Prediction-Powered Inference," Papers 2604.21260, arXiv.org.
  • Handle: RePEc:arx:papers:2604.21260

    Download full text from publisher

    File URL: https://arxiv.org/pdf/2604.21260
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Kosuke Imai & Marc Ratkovic, 2014. "Covariate balancing propensity score," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(1), pages 243-263, January.
    2. D Benkeser & M Carone & M J Van Der Laan & P B Gilbert, 2017. "Doubly robust nonparametric inference on the average treatment effect," Biometrika, Biometrika Trust, vol. 104(4), pages 863-880.
    3. Samuel David Lendle & Bruce Fireman & Mark J. van der Laan, 2015. "Balancing Score Adjusted Targeted Minimum Loss-based Estimation," Journal of Causal Inference, De Gruyter, vol. 3(2), pages 139-155, September.
    4. David Cheng & Ashwin N. Ananthakrishnan & Tianxi Cai, 2021. "Robust and efficient semi‐supervised estimation of average treatment effects with application to electronic health records data," Biometrics, The International Biometric Society, vol. 77(2), pages 413-423, June.
    5. David Benkeser & Iván Díaz & Alex Luedtke & Jodi Segal & Daniel Scharfstein & Michael Rosenblum, 2021. "Improving precision and power in randomized trials for COVID‐19 treatments using covariate adjustment, for binary, ordinal, and time‐to‐event outcomes," Biometrics, The International Biometric Society, vol. 77(4), pages 1467-1481, December.
    6. Hainmueller, Jens, 2012. "Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies," Political Analysis, Cambridge University Press, vol. 20(1), pages 25-46, January.
    7. David Benkeser & Iván Díaz & Alex Luedtke & Jodi Segal & Daniel Scharfstein & Michael Rosenblum, 2021. "Rejoinder: Improving precision and power in randomized trials for COVID‐19 treatments using covariate adjustment, for binary, ordinal, and time‐to‐event outcomes," Biometrics, The International Biometric Society, vol. 77(4), pages 1492-1494, December.
    8. Ben B. Hansen, 2008. "The prognostic analogue of the propensity score," Biometrika, Biometrika Trust, vol. 95(2), pages 481-488.
    9. Victor Chernozhukov & Mert Demirer & Esther Duflo & Iván Fernández-Val, 2018. "Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments, with an Application to Immunization in India," NBER Working Papers 24678, National Bureau of Economic Research, Inc.
    10. Tilmann Gneiting & Fadoua Balabdaoui & Adrian E. Raftery, 2007. "Probabilistic forecasts, calibration and sharpness," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(2), pages 243-268, April.
    11. Han, Peisong, 2012. "A note on improving the efficiency of inverse probability weighted estimator using the augmentation term," Statistics & Probability Letters, Elsevier, vol. 82(12), pages 2221-2228.
    12. Dylan J. Foster & Vasilis Syrgkanis, 2019. "Orthogonal Statistical Learning," Papers 1901.09036, arXiv.org, revised Jun 2023.
    13. Ted Westling & Peter Gilbert & Marco Carone, 2020. "Causal isotonic regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(3), pages 719-747, July.
    14. A Rotnitzky & E Smucler & J M Robins, 2021. "Characterization of parameters with a mixed bias property," Biometrika, Biometrika Trust, vol. 108(1), pages 231-238.
    15. Chuan Hong & Katherine P. Liao & Tianxi Cai, 2019. "Semi‐supervised validation of multiple surrogate outcomes with application to electronic medical records phenotyping," Biometrics, The International Biometric Society, vol. 75(1), pages 78-89, March.
    16. Sijia Li & Peter B. Gilbert & Rui Duan & Alex Luedtke, 2025. "Data Fusion Using Weakly Aligned Sources," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 120(552), pages 2569-2579, October.
    17. Nima S. Hejazi & Mark J. van der Laan & Holly E. Janes & Peter B. Gilbert & David C. Benkeser, 2021. "Efficient nonparametric inference on the effects of stochastic interventions under two‐phase sampling, with applications to vaccine efficacy trials," Biometrics, The International Biometric Society, vol. 77(4), pages 1241-1253, December.
    18. Andrea Rotnitzky & Quanhong Lei & Mariela Sued & James M. Robins, 2012. "Improved double-robust estimation in missing data and causal inference models," Biometrika, Biometrika Trust, vol. 99(2), pages 439-456.
    19. Lauren Eyler Dang & Jens Magelund Tarp & Trine Julie Abrahamsen & Kajsa Kvist & John B. Buse & Maya Petersen & Mark van der Laan, 2025. "Experiment-selector cross-validated targeted maximum likelihood estimator for hybrid RCT-external data studies," Journal of Causal Inference, De Gruyter, vol. 13(1), pages 1-33.
    20. Sven Klaassen & Jan Rabenseifner & Jannis Kueck & Philipp Bach, 2025. "Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score-Based Estimators," Papers 2503.17290, arXiv.org, revised May 2025.
    21. Mark van der Laan & Sky Qiu & Jens Magelund Tarp & Lars van der Laan, 2026. "Adaptive-TMLE for the average treatment effect based on randomized controlled trial augmented with real-world data," Journal of Causal Inference, De Gruyter, vol. 14(1), pages 1-36.
    22. Daniel B. Rubin & Mark J. van der Laan, 2008. "Empirical Efficiency Maximization: Improved Locally Efficient Covariate Adjustment in Randomized Experiments and Survival Analysis," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-42, May.
    23. Mark J. van der Laan & Daniel Rubin, 2006. "Targeted Maximum Likelihood Learning," The International Journal of Biostatistics, De Gruyter, vol. 2(1), pages 1-40, December.
    24. Michael Rosenblum & Mark J. van der Laan, 2009. "Using Regression Models to Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models," Biometrics, The International Biometric Society, vol. 65(3), pages 937-945, September.
    25. Ambarish Chattopadhyay & José R Zubizarreta, 2023. "On the implied weights of linear regression for causal inference," Biometrika, Biometrika Trust, vol. 110(3), pages 615-629.
    26. Song Xi Chen & Denis H. Y. Leung & Jing Qin, 2008. "Improving semiparametric estimation by using surrogate data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(4), pages 803-823, September.
    27. V Chernozhukov & W K Newey & R Singh, 2023. "A simple and general debiased machine learning theorem with finite-sample guarantees," Biometrika, Biometrika Trust, vol. 110(1), pages 257-264.
    28. Edward H. Kennedy & Zongming Ma & Matthew D. McHugh & Dylan S. Small, 2017. "Non-parametric methods for doubly robust estimation of continuous treatment effects," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1229-1245, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    2. Jianxuan Liu & Yanyuan Ma & Lan Wang, 2018. "An alternative robust estimator of average treatment effect in causal inference," Biometrics, The International Biometric Society, vol. 74(3), pages 910-923, September.
    3. Martin Huber, 2019. "An introduction to flexible methods for policy evaluation," Papers 1910.00641, arXiv.org.
    4. Jikai Jin & Vasilis Syrgkanis, 2024. "Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation," Papers 2402.14264, arXiv.org, revised Jun 2025.
    5. Ao Yuan & Anqi Yin & Ming T. Tan, 2021. "Enhanced Doubly Robust Procedure for Causal Inference," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 13(3), pages 454-478, December.
    6. Isaac Meza & Rahul Singh, 2021. "Nested Nonparametric Instrumental Variable Regression," Papers 2112.14249, arXiv.org, revised May 2025.
    7. Frölich, Markus & Huber, Martin & Wiesenfarth, Manuel, 2017. "The finite sample performance of semi- and non-parametric estimators for treatment effects and policy evaluation," Computational Statistics & Data Analysis, Elsevier, vol. 115(C), pages 91-102.
    8. Wang, Qihua & Su, Miaomiao & Wang, Ruoyu, 2021. "A beyond multiple robust approach for missing response problem," Computational Statistics & Data Analysis, Elsevier, vol. 155(C).
    9. Han, Peisong, 2012. "A note on improving the efficiency of inverse probability weighted estimator using the augmentation term," Statistics & Probability Letters, Elsevier, vol. 82(12), pages 2221-2228.
    10. Cousineau, Martin & Verter, Vedat & Murphy, Susan A. & Pineau, Joelle, 2023. "Estimating causal effects with optimization-based methods: A review and empirical comparison," European Journal of Operational Research, Elsevier, vol. 304(2), pages 367-380.
    11. Sven Klaassen & Jan Rabenseifner & Jannis Kueck & Philipp Bach, 2025. "Calibration Strategies for Robust Causal Estimation: Theoretical and Empirical Insights on Propensity Score-Based Estimators," Papers 2503.17290, arXiv.org, revised May 2025.
    12. David Cheng & Ashwin N. Ananthakrishnan & Tianxi Cai, 2021. "Robust and efficient semi‐supervised estimation of average treatment effects with application to electronic health records data," Biometrics, The International Biometric Society, vol. 77(2), pages 413-423, June.
    13. Jelena Bradic & Stefan Wager & Yinchu Zhu, 2019. "Sparsity Double Robust Inference of Average Treatment Effects," Papers 1905.00744, arXiv.org.
    14. Hao Sun & Ashkan Ertefaie & Xin Lu & Brent A. Johnson, 2020. "Improved Doubly Robust Estimation in Marginal Mean Models for Dynamic Regimes," Journal of Causal Inference, De Gruyter, vol. 8(1), pages 300-314, January.
    15. Victor Chernozhukov & Whitney Newey & Rahul Singh & Vasilis Syrgkanis, 2020. "Adversarial Estimation of Riesz Representers," Papers 2101.00009, arXiv.org, revised Apr 2024.
    16. Shuyuan Chen & Peng Zhang & Yifan Cui, 2025. "Identification and Debiased Learning of Causal Effects with General Instrumental Variables," Papers 2510.20404, arXiv.org, revised Feb 2026.
    17. Susan Athey & Guido W. Imbens & Stefan Wager, 2018. "Approximate residual balancing: debiased inference of average treatment effects in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 80(4), pages 597-623, September.
    18. Lundberg, Ian & Brand, Jennie E. & Jeon, Nanum, 2022. "Researcher reasoning meets computational capacity: Machine learning for social science," SocArXiv s5zc8, Center for Open Science.
    19. Zhang, Xiaoke & Xue, Wu & Wang, Qiyue, 2021. "Covariate balancing functional propensity score for functional treatments in cross-sectional observational studies," Computational Statistics & Data Analysis, Elsevier, vol. 163(C).
    20. Hengfang Wang & Jae Kwang Kim, 2025. "Information projection approach to smoothed propensity score weighting for handling selection bias under missing at random," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 77(1), pages 127-153, February.
