Printed from https://ideas.repec.org/p/arx/papers/2602.05099.html

Personalized Policy Learning through Discrete Experimentation: Theory and Empirical Evidence

Author

Listed:
  • Zhiqi Zhang
  • Zhiyu Zeng
  • Ruohan Zhan
  • Dennis Zhang

Abstract

Randomized Controlled Trials (RCTs), or A/B testing, have become the gold standard for optimizing various operational policies on online platforms. However, RCTs on these platforms typically cover a limited number of discrete treatment levels, while the platforms increasingly face complex operational challenges involving optimizing continuous variables, such as pricing and incentive programs. The current industry practice involves discretizing these continuous decision variables into several treatment levels and selecting the optimal discrete treatment level. This approach, however, often leads to suboptimal decisions as it cannot accurately extrapolate performance for untested treatment levels and fails to account for heterogeneity in treatment effects across user characteristics. This study addresses these limitations by developing a theoretically solid and empirically verified framework to learn personalized continuous policies based on high-dimensional user characteristics, using observations from an RCT with only a discrete set of treatment levels. Specifically, we introduce a deep learning for policy targeting (DLPT) framework that includes both personalized policy value estimation and personalized policy learning. We prove that our policy value estimators are asymptotically unbiased and consistent, and the learned policy achieves a root-n-regret bound. We empirically validate our methods in collaboration with a leading social media platform to optimize incentive levels for content creation. Results demonstrate that our DLPT framework significantly outperforms existing benchmarks, achieving substantial improvements in both evaluating the value of policies for each user group and identifying the optimal personalized policy.
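The gap between choosing the best tested level and learning a personalized continuous policy can be illustrated with a small simulation. The sketch below is not the DLPT framework from the paper. It swaps in an ordinary least-squares fit on a toy quadratic response curve, and every name in it (`LEVELS`, `true_outcome`, `fit_continuous_policy`, the curve itself, the noise level) is a hypothetical chosen for illustration. It only shows how data from an RCT with a few discrete levels might still support a policy that assigns untested levels based on a user covariate.

```python
import numpy as np

# Hypothetical discrete incentive levels tested in the RCT (an assumption).
LEVELS = np.array([0.0, 0.5, 1.0])


def true_outcome(t, x):
    """Toy response curve (assumed): the best level, 0.3 + 0.4 * x, varies by user."""
    return 1.0 - (t - (0.3 + 0.4 * x)) ** 2


def simulate_rct(n, rng):
    """Randomize each user to one discrete level and record a noisy outcome."""
    x = rng.uniform(0.0, 1.0, size=n)      # one-dimensional user covariate
    t = rng.choice(LEVELS, size=n)         # uniform random assignment
    y = true_outcome(t, x) + rng.normal(0.0, 0.1, size=n)
    return x, t, y


def best_discrete_level(t, y):
    """Baseline practice: pick the tested level with the highest mean outcome."""
    means = [y[t == level].mean() for level in LEVELS]
    return float(LEVELS[int(np.argmax(means))])


def fit_continuous_policy(x, t, y):
    """Fit y ~ 1 + t + t^2 + x + x^2 + t*x by least squares, then maximize over t."""
    features = np.column_stack([np.ones_like(t), t, t**2, x, x**2, t * x])
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    _, b_t, b_tt, _, _, b_tx = coef
    if b_tt >= 0:
        raise ValueError("fitted curve is not concave in t; no interior maximum")

    def policy(x_new):
        # First-order condition of the fitted quadratic, clipped to the tested range.
        t_star = -(b_t + b_tx * np.asarray(x_new)) / (2.0 * b_tt)
        return np.clip(t_star, LEVELS.min(), LEVELS.max())

    return policy


def policy_value(policy, x):
    """Average true outcome under a policy (computable only because this is a simulation)."""
    return float(np.mean(true_outcome(policy(x), x)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, t, y = simulate_rct(6000, rng)
    level = best_discrete_level(t, y)
    policy = fit_continuous_policy(x, t, y)
    x_eval = np.linspace(0.0, 1.0, 101)
    print("best discrete level:", level)
    print("value, uniform discrete policy:",
          policy_value(lambda z: np.full_like(z, level), x_eval))
    print("value, personalized continuous policy:", policy_value(policy, x_eval))
```

The quadratic model is correctly specified here by construction, which is why three tested levels suffice to recover the curve. With real data the functional form is unknown, and that is the setting the paper's deep learning estimators and regret guarantees are meant to address.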

Suggested Citation

  • Zhiqi Zhang & Zhiyu Zeng & Ruohan Zhan & Dennis Zhang, 2026. "Personalized Policy Learning through Discrete Experimentation: Theory and Empirical Evidence," Papers 2602.05099, arXiv.org.
  • Handle: RePEc:arx:papers:2602.05099

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2602.05099
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. repec:arx:papers:2411.16552 is not listed on IDEAS
    2. Artem Timoshenko & Caio Waisman, 2025. "Policy-Aligned Estimation of Conditional Average Treatment Effects," Papers 2512.13400, arXiv.org.
    3. Yanqin Fan & Yuan Qi & Gaoqian Xu, 2025. "Policy Learning with $\alpha$-Expected Welfare," Papers 2505.00256, arXiv.org.
    4. Chunrong Ai & Yue Fang & Haitian Xie, 2024. "Data-Driven Policy Learning for Continuous Treatments," Papers 2402.02535, arXiv.org, revised Dec 2025.
    5. Günter J. Hitsch & Sanjog Misra & Walter W. Zhang, 2024. "Heterogeneous treatment effects and optimal targeting policy evaluation," Quantitative Marketing and Economics (QME), Springer, vol. 22(2), pages 115-168, June.
    6. Jinglong Zhao, 2024. "Experimental Design For Causal Inference Through An Optimization Lens," Papers 2408.09607, arXiv.org, revised Aug 2024.
    7. Ruohan Zhan & Shichao Han & Yuchen Hu & Zhenling Jiang, 2024. "Estimating Treatment Effects under Algorithmic Interference: A Structured Neural Networks Approach," Papers 2406.14380, arXiv.org, revised Mar 2026.
    8. Anya Shchetkina, 2025. "Blind Targeting: Personalization under Third-Party Privacy Constraints," Papers 2507.05175, arXiv.org.
    9. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Shuze Chen & David Simchi-Levi & Chonghuan Wang, 2024. "Improving the Estimation of Lifetime Effects in A/B Testing via Treatment Locality," Papers 2407.19618, arXiv.org, revised Sep 2025.
    11. Kyle Colangelo & Ying-Ying Lee, 2020. "Double Debiased Machine Learning Nonparametric Inference with Continuous Treatments," Papers 2004.03036, arXiv.org, revised Sep 2023.
    12. Walter W. Zhang, 2025. "Optimal Comprehensible Targeting," Papers 2512.02424, arXiv.org.
    13. Zequn Jin & Gaoqian Xu & Xi Zheng & Yahong Zhou, 2025. "Policy Learning under Unobserved Confounding: A Robust and Efficient Approach," Papers 2507.20550, arXiv.org.
    14. Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," Economics working papers 2021-08, Department of Economics, Johannes Kepler University Linz, Austria.
    15. Nathan Kallus, 2023. "Treatment Effect Risk: Bounds and Inference," Management Science, INFORMS, vol. 69(8), pages 4579-4590, August.
    16. Ozan Candogan & Chen Chen & Rad Niazadeh, 2024. "Correlated Cluster-Based Randomized Experiments: Robust Variance Minimization," Management Science, INFORMS, vol. 70(6), pages 4069-4086, June.
    17. Andrew Bennett & Nathan Kallus, 2020. "Efficient Policy Learning from Surrogate-Loss Classification Reductions," Papers 2002.05153, arXiv.org.
    18. Cai, Hengrui & Shi, Chengchun & Song, Rui & Lu, Wenbin, 2023. "Jump interval-learning for individualized decision making with continuous treatments," LSE Research Online Documents on Economics 118231, London School of Economics and Political Science, LSE Library.
    19. Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2020. "Deep Learning for Individual Heterogeneity," Papers 2010.14694, arXiv.org, revised Apr 2025.
    20. Hirano, Keisuke & Porter, Jack R., 2020. "Asymptotic analysis of statistical decision rules in econometrics," Handbook of Econometrics, in: Steven N. Durlauf & Lars Peter Hansen & James J. Heckman & Rosa L. Matzkin (ed.), Handbook of Econometrics, edition 1, volume 7, chapter 0, pages 283-354, Elsevier.
    21. Masahiro Kato, 2020. "Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales," Papers 2006.06982, arXiv.org.
