Printed from https://ideas.repec.org/p/arx/papers/2512.03366.html

Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice

Author

Listed:
  • Ryan Kessler
  • James McQueen
  • Miikka Rokkanen

Abstract

We develop a theoretical framework for sample splitting in A/B testing environments, where data for each test are partitioned into two splits to measure methodological performance when the true impacts of tests are unobserved. We show that sample-split estimators are generally biased for full-sample performance but consistently estimate sample-split analogues of it. We derive their asymptotic distributions, construct valid confidence intervals, and characterize the bias-variance trade-offs underlying sample-split design choices. We validate our theoretical results through simulations and provide implementation guidance for A/B testing products seeking to evaluate new estimators and decision rules.
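As a rough illustration of the sample-splitting idea in the abstract, the Python sketch below simulates a set of A/B tests, splits each test's data in two, makes a launch decision on one split, and scores that decision on the other. Everything in it is an assumption made for illustration and is not taken from the paper: the helper name `split_evaluate`, the 50/50 split, the "ship if the split-A estimate is positive" decision rule, and the simulated normal outcomes.

```python
import random
from statistics import fmean

def split_evaluate(control, treatment, rng, frac=0.5):
    """Hypothetical helper: decide on split A, score the decision on split B."""
    def halves(values):
        # Randomly partition one arm's observations into two disjoint splits.
        shuffled = list(values)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * frac)
        return shuffled[:cut], shuffled[cut:]

    c_a, c_b = halves(control)
    t_a, t_b = halves(treatment)
    est_a = fmean(t_a) - fmean(c_a)  # effect estimate used for the decision
    est_b = fmean(t_b) - fmean(c_b)  # independent estimate used for scoring
    return est_a > 0, est_b          # assumed rule: ship if est_a is positive

rng = random.Random(0)
gains = []
for _ in range(100):                 # 100 simulated tests
    tau = rng.gauss(0.0, 0.1)        # true effect, unobserved in practice
    control = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    treatment = [rng.gauss(tau, 1.0) for _ in range(1000)]
    ship, est_b = split_evaluate(control, treatment, rng)
    gains.append(est_b if ship else 0.0)

# Average split-B gain of the decisions made on split A.
split_value = fmean(gains)
```

Because each decision here uses only a fraction of a test's data, `split_value` describes the rule as applied to the smaller split. This is in line with the abstract's point that sample-split estimators target sample-split analogues of performance rather than full-sample performance.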

Suggested Citation

  • Ryan Kessler & James McQueen & Miikka Rokkanen, 2025. "Evaluating A/B Testing Methodologies via Sample Splitting: Theory and Practice," Papers 2512.03366, arXiv.org, revised Mar 2026.
  • Handle: RePEc:arx:papers:2512.03366

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.03366
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nowak, Piotr Bolesław, 2016. "The MLE of the mean of the exponential distribution based on grouped data is stochastically increasing," Statistics & Probability Letters, Elsevier, vol. 111(C), pages 49-54.
    2. Camilo Alberto Cárdenas-Hurtado & Aaron Levi Garavito-Acosta & Jorge Hernán Toro-Córdoba, 2018. "Asymmetric Effects of Terms of Trade Shocks on Tradable and Non-tradable Investment Rates: The Colombian Case," Borradores de Economia 1043, Banco de la Republica de Colombia.
    3. Anastasiou, Andreas, 2017. "Bounds for the normal approximation of the maximum likelihood estimator from m-dependent random variables," Statistics & Probability Letters, Elsevier, vol. 129(C), pages 171-181.
    4. Evelina Di Corso & Tania Cerquitelli & Daniele Apiletti, 2018. "METATECH: METeorological Data Analysis for Thermal Energy CHaracterization by Means of Self-Learning Transparent Models," Energies, MDPI, vol. 11(6), pages 1-24, May.
    5. Silva, Ivair R., 2017. "Confidence intervals through sequential Monte Carlo," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 112-124.
    6. Denter, Philipp & Sisak, Dana, 2015. "Do polls create momentum in political competition?," Journal of Public Economics, Elsevier, vol. 130(C), pages 1-14.
    7. Salgado Alfredo, 2018. "Incomplete Information and Costly Signaling in College Admissions," Working Papers 2018-23, Banco de México.
    8. Albrecht, James & Anderson, Axel & Vroman, Susan, 2010. "Search by committee," Journal of Economic Theory, Elsevier, vol. 145(4), pages 1386-1407, July.
    9. Stegeman, Alwin, 2016. "A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 189-203.
    10. Mauricio Romero & Álvaro Riascos & Diego Jara, 2015. "On the Optimality of Answer-Copying Indices," Journal of Educational and Behavioral Statistics, vol. 40(5), pages 435-453, October.
    11. Bruno Carballa Smichowski & Yassine Lefouili & Andrea Mantovani & Carlo Reggiani, 2025. "Data Sharing or Analytics Sharing ?," Working Papers hal-04956937, HAL.
    12. Maximilian Schaefer, 2025. "When Should we Expect Non-Decreasing Returns from Data in Prediction Tasks?," Papers 2503.03602, arXiv.org.
    13. Chen, Yunxiao & Moustaki, Irini & Zhang, H, 2020. "A note on likelihood ratio tests for models with latent variables," LSE Research Online Documents on Economics 107490, London School of Economics and Political Science, LSE Library.
    14. Wang, Yuhao & Li, Xinran, 2025. "Asymptotic theory of the best-choice rerandomization using the Mahalanobis distance," Journal of Econometrics, Elsevier, vol. 251(C).
    15. Blier-Wong, Christopher & Cossette, Hélène & Marceau, Etienne, 2023. "Risk aggregation with FGM copulas," Insurance: Mathematics and Economics, Elsevier, vol. 111(C), pages 102-120.
    16. Zhu, Qiansheng & Lang, Joseph B., 2022. "Test-inversion confidence intervals for estimands in contingency tables subject to equality constraints," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    17. Weizhen Wang & Chongxiu Yu & Zhongzhan Zhang, 2026. "Finite-sample analytic properties of percentile bootstrap intervals," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 89(1), pages 1-27, January.
    18. van Bentum, Thomas & Cramer, Erhard, 2019. "Stochastic monotonicity of MLEs of the mean for exponentially distributed lifetimes under hybrid censoring," Statistics & Probability Letters, Elsevier, vol. 148(C), pages 1-8.
    19. Yusuke Narita, 2021. "A Theory of Quasi-Experimental Evaluation of School Quality," Management Science, INFORMS, vol. 67(8), pages 4982-5010, August.
    20. Yuchen Hu & Henry Zhu & Emma Brunskill & Stefan Wager, 2024. "Minimax-Regret Sample Selection in Randomized Experiments," Papers 2403.01386, arXiv.org, revised Jun 2024.

