Printed from https://ideas.repec.org/a/inm/ormnsc/v69y2023i10p5772-5793.html

Distributionally Robust Batch Contextual Bandits

Author

Listed:
  • Nian Si

    (Department of Management Science & Engineering, Stanford University, Stanford, California 94305)

  • Fan Zhang

    (Department of Management Science & Engineering, Stanford University, Stanford, California 94305)

  • Zhengyuan Zhou

    (Stern School of Business, New York University, New York, New York 10012)

  • Jose Blanchet

    (Department of Management Science & Engineering, Stanford University, Stanford, California 94305)

Abstract

Policy learning using historical observational data is an important problem that has widespread applications. Examples include selecting offers, prices, or advertisements for consumers; choosing bids in contextual first-price auctions; and selecting medication based on patients’ characteristics. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data: an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under a worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts, with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm on synthetic datasets and demonstrate that it provides the robustness that is missing from standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.

Suggested Citation

  • Nian Si & Fan Zhang & Zhengyuan Zhou & Jose Blanchet, 2023. "Distributionally Robust Batch Contextual Bandits," Management Science, INFORMS, vol. 69(10), pages 5772-5793, October.
  • Handle: RePEc:inm:ormnsc:v:69:y:2023:i:10:p:5772-5793
    DOI: 10.1287/mnsc.2023.4678

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2023.4678
    Download Restriction: no

