Personalized Policy Learning through Discrete Experimentation: Theory and Empirical Evidence

Author

Listed:

Zhiqi Zhang
Zhiyu Zeng
Ruohan Zhan
Dennis Zhang

Abstract

Randomized Controlled Trials (RCTs), or A/B testing, have become the gold standard for optimizing various operational policies on online platforms. However, RCTs on these platforms typically cover a limited number of discrete treatment levels, while the platforms increasingly face complex operational challenges involving optimizing continuous variables, such as pricing and incentive programs. The current industry practice involves discretizing these continuous decision variables into several treatment levels and selecting the optimal discrete treatment level. This approach, however, often leads to suboptimal decisions as it cannot accurately extrapolate performance for untested treatment levels and fails to account for heterogeneity in treatment effects across user characteristics. This study addresses these limitations by developing a theoretically solid and empirically verified framework to learn personalized continuous policies based on high-dimensional user characteristics, using observations from an RCT with only a discrete set of treatment levels. Specifically, we introduce a deep learning for policy targeting (DLPT) framework that includes both personalized policy value estimation and personalized policy learning. We prove that our policy value estimators are asymptotically unbiased and consistent, and the learned policy achieves a root-n-regret bound. We empirically validate our methods in collaboration with a leading social media platform to optimize incentive levels for content creation. Results demonstrate that our DLPT framework significantly outperforms existing benchmarks, achieving substantial improvements in both evaluating the value of policies for each user group and identifying the optimal personalized policy.
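The gap between selecting the best tested level and extrapolating to untested levels can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two user groups, their response peaks, the three tested incentive levels, and the use of a quadratic fit through noise-free arm means. In particular, the quadratic fit is only a simple stand-in for extrapolation; it is not the DLPT framework, which the abstract describes as deep-learning based and built on high-dimensional user characteristics.

```python
"""Toy contrast: best tested level vs. a per-group interpolated optimum.

All names and numbers are hypothetical. This is not the DLPT method.
"""


def best_discrete_level(levels, arm_means):
    """Baseline: return the tested level with the highest mean outcome."""
    best_index = max(range(len(levels)), key=lambda i: arm_means[i])
    return levels[best_index]


def quadratic_optimum(levels, arm_means):
    """Fit the unique quadratic through three (level, mean) points and
    return its maximizer, clipped to the tested range."""
    (t0, t1, t2), (y0, y1, y2) = levels, arm_means
    # Newton divided differences: y = y0 + d01*(t - t0) + a*(t - t0)*(t - t1)
    d01 = (y1 - y0) / (t1 - t0)
    d12 = (y2 - y1) / (t2 - t1)
    a = (d12 - d01) / (t2 - t0)   # coefficient of t^2
    b = d01 - a * (t0 + t1)       # coefficient of t
    if a >= 0:
        # The fitted curve has no interior maximum; fall back to the baseline.
        return best_discrete_level(levels, arm_means)
    return min(max(-b / (2 * a), t0), t2)


def true_response(level, peak):
    """Hypothetical noise-free mean outcome, maximized at `peak`."""
    return 1.0 - (level - peak) ** 2


# Hypothetical experiment: three tested incentive levels, two user groups
# whose best incentive level differs.
LEVELS = (0.0, 0.5, 1.0)
GROUP_PEAKS = {"casual": 0.35, "frequent": 0.80}


def compare_policies():
    """Return the chosen level per group under each approach."""
    results = {}
    for group, peak in GROUP_PEAKS.items():
        means = tuple(true_response(t, peak) for t in LEVELS)
        results[group] = {
            "discrete": best_discrete_level(LEVELS, means),
            "interpolated": quadratic_optimum(LEVELS, means),
        }
    return results


if __name__ == "__main__":
    for group, choice in compare_policies().items():
        print(group, choice)
```

In this toy setting the baseline can only return one of the tested levels for each group, while the interpolated choice lands between them. With real experimental data the arm means would be noisy and the response shape unknown, which is where formal estimation and regret guarantees become necessary.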

Suggested Citation

Zhiqi Zhang & Zhiyu Zeng & Ruohan Zhan & Dennis Zhang, 2026. "Personalized Policy Learning through Discrete Experimentation: Theory and Empirical Evidence," Papers 2602.05099, arXiv.org.

Handle: RePEc:arx:papers:2602.05099

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

David Simchi-Levi & Chonghuan Wang, 2026. "Pricing Experimental Design: Causal Effect, Expected Revenue and Tail Risk," Management Science, INFORMS, vol. 72(2), pages 1157-1174, February.
Artem Timoshenko & Caio Waisman, 2025. "Profit-Aligned CATE Estimation: Reconciling Policy Learning and Inference," Papers 2512.13400, arXiv.org, revised Apr 2026.
Yanqin Fan & Yuan Qi & Gaoqian Xu, 2025. "Policy Learning with $\alpha$-Expected Welfare," Papers 2505.00256, arXiv.org.
Chunrong Ai & Yue Fang & Haitian Xie, 2024. "Data-Driven Policy Learning for Continuous Treatments," Papers 2402.02535, arXiv.org, revised Dec 2025.
Günter J. Hitsch & Sanjog Misra & Walter W. Zhang, 2024. "Heterogeneous treatment effects and optimal targeting policy evaluation," Quantitative Marketing and Economics (QME), Springer, vol. 22(2), pages 115-168, June.
Jinglong Zhao, 2024. "Experimental Design For Causal Inference Through An Optimization Lens," Papers 2408.09607, arXiv.org, revised Aug 2024.
Ruohan Zhan & Shichao Han & Yuchen Hu & Zhenling Jiang, 2024. "Estimating Treatment Effects under Algorithmic Interference: A Structured Neural Networks Approach," Papers 2406.14380, arXiv.org, revised Mar 2026.
Anya Shchetkina, 2025. "Blind Targeting: Personalization under Third-Party Privacy Constraints," Papers 2507.05175, arXiv.org.
David Simchi-Levi & Chonghuan Wang, 2025. "Multi-armed Bandit Experimental Design: Online Decision-Making and Adaptive Inference," Management Science, INFORMS, vol. 71(6), pages 4828-4846, June.
Jingwen Tang & Zhengling Qi & Ethan Fang & Cong Shi, 2025. "Offline Feature-Based Pricing Under Censored Demand: A Causal Inference Approach," Manufacturing & Service Operations Management, INFORMS, vol. 27(2), pages 535-553, March.
Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
Shuze Chen & David Simchi-Levi & Chonghuan Wang, 2024. "Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality," Papers 2407.19618, arXiv.org, revised Sep 2025.
Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
Walter W. Zhang, 2025. "Optimal Comprehensible Targeting," Papers 2512.02424, arXiv.org.
Zequn Jin & Gaoqian Xu & Xi Zheng & Yahong Zhou, 2025. "Policy Learning under Unobserved Confounding: A Robust and Efficient Approach," Papers 2507.20550, arXiv.org.
Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," Economics working papers 2021-08, Department of Economics, Johannes Kepler University Linz, Austria.
- Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," CESifo Working Paper Series 9037, CESifo.
Nathan Kallus, 2023. "Treatment Effect Risk: Bounds and Inference," Management Science, INFORMS, vol. 69(8), pages 4579-4590, August.
Ozan Candogan & Chen Chen & Rad Niazadeh, 2024. "Correlated Cluster-Based Randomized Experiments: Robust Variance Minimization," Management Science, INFORMS, vol. 70(6), pages 4069-4086, June.
Andrew Bennett & Nathan Kallus, 2020. "Efficient Policy Learning from Surrogate-Loss Classification Reductions," Papers 2002.05153, arXiv.org.
Yining Wang & Quanquan Liu, 2025. "Estimation of High-Dimensional Contextual Pricing Models with Nonparametric Price Confounders," Operations Research, INFORMS, vol. 73(6), pages 3065-3084, November.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-EXP-2026-02-09 (Experimental Economics)
