Printed from https://ideas.repec.org/p/arx/papers/2503.12760.html

SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement

Authors
  • Brian Cho
  • Ana-Roxana Pop
  • Ariel Evnine
  • Nathan Kallus

Abstract

To design effective digital interventions, experimenters face the challenge of learning decision policies that balance multiple objectives using offline data. Often, they aim to develop policies that maximize goal outcomes, while ensuring there are no undesirable changes in guardrail outcomes. To provide credible recommendations, experimenters must not only identify policies that satisfy the desired changes in goal and guardrail outcomes, but also offer probabilistic guarantees about the changes these policies induce. In practice, however, policy classes are often large, and digital experiments tend to produce datasets with small effect sizes relative to noise. In this setting, standard approaches such as data splitting or multiple testing often result in unstable policy selection and/or insufficient statistical power. In this paper, we provide safe noisy policy learning (SNPL), a novel approach that leverages the concept of algorithmic stability to address these challenges. Our method enables policy learning while simultaneously providing high-confidence guarantees using the entire dataset, avoiding the need for data splitting. We present finite-sample and asymptotic versions of our algorithm that ensure the recommended policy satisfies high-probability guarantees for avoiding guardrail regressions and/or achieving goal outcome improvements. We test both variants of our approach empirically on a real-world application of personalizing SMS delivery. Our results on real-world data suggest that our approach offers dramatic improvements in settings with large policy classes and low signal-to-noise across both finite-sample and asymptotic safety guarantees, offering up to 300% improvements in detection rates and 150% improvements in policy gains at significantly smaller sample sizes.
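To make the problem concrete, the following is a minimal sketch of the safe policy selection task the abstract describes — not the authors' SNPL algorithm, but the naive per-policy confidence-bound baseline it improves upon. All names are illustrative, and the one-sided 95% normal critical value (1.645) is an assumption:

```python
import math

def lower_conf_bound(samples, z=1.645):
    """One-sided lower confidence bound on the mean (normal approximation)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean - z * math.sqrt(var / n)

def select_safe_policy(goal_effects, guardrail_effects):
    """Among policies whose estimated guardrail change is certifiably
    non-negative (lower bound >= 0), pick the one with the best estimated
    goal improvement. Returns None if no policy passes the guardrail check.

    goal_effects / guardrail_effects: dicts mapping each candidate policy
    to per-unit estimated effect samples (e.g., from off-policy evaluation).
    """
    best, best_goal = None, float("-inf")
    for policy, goal_samples in goal_effects.items():
        if lower_conf_bound(guardrail_effects[policy]) < 0:
            continue  # cannot rule out a guardrail regression
        goal_mean = sum(goal_samples) / len(goal_samples)
        if goal_mean > best_goal:
            best, best_goal = policy, goal_mean
    return best
```

With many candidate policies and noisy effect estimates, this per-policy testing approach either requires data splitting (losing power) or multiple-testing corrections (shrinking the bounds), which is precisely the instability and power loss the paper's stability-based method is designed to avoid.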

Suggested Citation

  • Brian Cho & Ana-Roxana Pop & Ariel Evnine & Nathan Kallus, 2025. "SNPL: Simultaneous Policy Learning and Evaluation for Safe Multi-Objective Policy Improvement," Papers 2503.12760, arXiv.org, revised Mar 2025.
  • Handle: RePEc:arx:papers:2503.12760

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2503.12760
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    2. Toru Kitagawa & Aleksey Tetenov, 2018. "Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice," Econometrica, Econometric Society, vol. 86(2), pages 591-616, March.
    3. Zhan, Ruohan & Hadad, Vitor & Hirshberg, David A. & Athey, Susan, 2021. "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits," Research Papers 3970, Stanford University, Graduate School of Business.
    4. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    5. Rubin Daniel & Dudoit Sandrine & van der Laan Mark, 2006. "A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-20, August.
    6. José Luis Montiel Olea & Mikkel Plagborg‐Møller, 2019. "Simultaneous confidence bands: Theory, implementation, and an application to SVARs," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 34(1), pages 1-17, January.

    Most related items

These are the items that most often cite, and are cited by, the same works as this paper.
    1. Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2024. "Policy Learning with Adaptively Collected Data," Management Science, INFORMS, vol. 70(8), pages 5270-5297, August.
    2. Jonas Metzger, 2022. "Adversarial Estimators," Papers 2204.10495, arXiv.org, revised Jun 2022.
    3. Henrika Langen & Martin Huber, 2022. "How causal machine learning can leverage marketing strategies: Assessing and improving the performance of a coupon campaign," Papers 2204.10820, arXiv.org, revised Jun 2022.
    4. Augustine Denteh & Helge Liebert, 2022. "Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment," Papers 2201.07072, arXiv.org, revised Apr 2023.
    5. Alejandro Sanchez-Becerra, 2023. "Robust inference for the treatment effect variance in experiments using machine learning," Papers 2306.03363, arXiv.org.
    6. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    7. Achim Ahrens & Alessandra Stampi‐Bombelli & Selina Kurer & Dominik Hangartner, 2024. "Optimal multi‐action treatment allocation: A two‐phase field experiment to boost immigrant naturalization," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(7), pages 1379-1395, November.
    8. Davide Viviano & Jess Rudder, 2020. "Policy design in experiments with unknown interference," Papers 2011.08174, arXiv.org, revised May 2024.
    9. Aldo Gael Carranza & Susan Athey, 2023. "Federated Offline Policy Learning," Papers 2305.12407, arXiv.org, revised Oct 2024.
    10. Emily Breza & Arun G. Chandrasekhar & Davide Viviano, 2025. "Generalizability with ignorance in mind: learning what we do (not) know for archetypes discovery," Papers 2501.13355, arXiv.org.
    11. Chunrong Ai & Yue Fang & Haitian Xie, 2024. "Data-driven Policy Learning for Continuous Treatments," Papers 2402.02535, arXiv.org, revised Nov 2024.
    12. Asanov, Anastasiya-Mariya & Asanov, Igor & Buenstorf, Guido, 2024. "A low-cost digital first aid tool to reduce psychological distress in refugees: A multi-country randomized controlled trial of self-help online in the first months after the invasion of Ukraine," Social Science & Medicine, Elsevier, vol. 362(C).
    13. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    14. Yi Zhang & Kosuke Imai, 2023. "Individualized Policy Evaluation and Learning under Clustered Network Interference," Papers 2311.02467, arXiv.org, revised Mar 2025.
    15. Manski, Charles F., 2023. "Probabilistic prediction for binary treatment choice: With focus on personalized medicine," Journal of Econometrics, Elsevier, vol. 234(2), pages 647-663.
    16. Combes, Pierre-Philippe & Gobillon, Laurent & Zylberberg, Yanos, 2022. "Urban economics in a historical perspective: Recovering data with machine learning," Regional Science and Urban Economics, Elsevier, vol. 94(C).
    17. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP54/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    18. Bokelmann, Björn & Lessmann, Stefan, 2024. "Improving uplift model evaluation on randomized controlled trial data," European Journal of Operational Research, Elsevier, vol. 313(2), pages 691-707.
    19. Garbero, Alessandra & Sakos, Grayson & Cerulli, Giovanni, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    20. Undral Byambadalai, 2022. "Identification and Inference for Welfare Gains without Unconfoundedness," Papers 2207.04314, arXiv.org.



    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.