
The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy

Author

Listed:
  • Masahiro Kato
  • Shota Yasui
  • Kenichiro McAlinn

Abstract

The doubly robust (DR) estimator, which is built from two nuisance parameters, the conditional mean outcome and the logging policy (the probability of choosing an action), is crucial in causal inference. This paper proposes a DR estimator for dependent samples obtained from adaptive experiments. To obtain an asymptotically normal semiparametric estimator from dependent samples even when the nuisance estimators are non-Donsker, we propose adaptive-fitting as a variant of sample-splitting. We also report an empirical paradox: our proposed DR estimator tends to show better performance than other estimators that use the true logging policy. While a similar phenomenon is known for estimators with i.i.d. samples, traditional explanations based on asymptotic efficiency cannot account for our case with dependent samples. We confirm this paradoxical behavior through simulation studies.
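To make the abstract's objects concrete, here is a minimal, hypothetical Python sketch of DR off-policy evaluation in a two-armed adaptive experiment. It is not the authors' code: the epsilon-greedy logging policy, the Bernoulli outcomes, and the smoothed empirical propensity `pi_hat` are illustrative assumptions. Both nuisances (the estimated mean outcomes and the logging probabilities) are fitted only on rounds before t, in the spirit of the paper's adaptive-fitting, and the final comparison mirrors the reported paradox: DR with an estimated logging policy versus DR with the true one.

```python
# Minimal sketch (assumed setup, not the authors' code): DR off-policy value
# estimation in a two-armed adaptive experiment. The epsilon-greedy logging
# policy and Bernoulli outcomes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T = 5000
mu = np.array([0.4, 0.6])        # true mean outcomes (unknown to the learner)
pi_e = np.array([0.0, 1.0])      # target policy to evaluate: always play arm 1

counts = np.zeros(2)             # pulls per arm over rounds < t
sums = np.zeros(2)               # outcome sums over rounds < t
dr_true, dr_est = [], []         # DR scores: true vs. estimated logging policy

for t in range(T):
    # Logging policy at round t depends on the history -> dependent samples.
    eps = max(0.1, (t + 1) ** -0.5)
    mu_hat = np.where(counts > 0, sums / np.maximum(counts, 1), 0.5)
    probs = np.full(2, eps / 2)
    probs[np.argmax(mu_hat)] += 1 - eps

    a = rng.choice(2, p=probs)
    y = rng.binomial(1, mu[a])

    # Nuisances below use only rounds < t, in the spirit of adaptive-fitting.
    pi_hat = (counts + 1) / (t + 2)      # smoothed empirical logging policy
    direct = pi_e @ mu_hat               # plug-in regression term
    resid = y - mu_hat[a]                # importance-weighted residual follows
    dr_true.append(direct + pi_e[a] / probs[a] * resid)
    dr_est.append(direct + pi_e[a] / max(pi_hat[a], 1e-3) * resid)

    counts[a] += 1
    sums[a] += y

print("true value:                  ", pi_e @ mu)
print("DR, true logging policy:     ", np.mean(dr_true))
print("DR, estimated logging policy:", np.mean(dr_est))
```

Under these assumptions the last two print lines give the comparison the abstract describes; which estimate lands closer to the true value is exactly the empirical question the paper studies.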

Suggested Citation

  • Masahiro Kato & Shota Yasui & Kenichiro McAlinn, 2020. "The Adaptive Doubly Robust Estimator for Policy Evaluation in Adaptive Experiments and a Paradox Concerning Logging Policy," Papers 2010.03792, arXiv.org, revised Jun 2021.
  • Handle: RePEc:arx:papers:2010.03792

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2010.03792
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Jinyong Hahn & Keisuke Hirano & Dean Karlan, 2011. "Adaptive Experimental Design Using the Propensity Score," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 29(1), pages 96-108, January.
    2. Victor Chernozhukov & Denis Chetverikov & Mert Demirer & Esther Duflo & Christian Hansen & Whitney Newey & James Robins, 2018. "Double/debiased machine learning for treatment and structural parameters," Econometrics Journal, Royal Economic Society, vol. 21(1), pages 1-68, February.
    3. Masahiro Kato & Masatoshi Uehara & Shota Yasui, 2020. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift," Papers 2002.11642, arXiv.org, revised Oct 2020.
    4. Masahiro Kato, 2020. "Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales," Papers 2006.06982, arXiv.org.
    5. Yusuke Narita & Shota Yasui & Kohei Yata, 2018. "Efficient Counterfactual Learning from Bandit Feedback," Cowles Foundation Discussion Papers 2155, Cowles Foundation for Research in Economics, Yale University.
    6. Athey, Susan & Wager, Stefan, 2017. "Efficient Policy Learning," Research Papers 3506, Stanford University, Graduate School of Business.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2023. "Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds," Papers 2302.02988, arXiv.org, revised Jul 2023.
    2. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    3. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2022. "Best Arm Identification with Contextual Information under a Small Gap," Papers 2209.07330, arXiv.org, revised Jan 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Masahiro Kato, 2020. "Confidence Interval for Off-Policy Evaluation from Dependent Samples via Bandit Algorithm: Approach from Standardized Martingales," Papers 2006.06982, arXiv.org.
    2. Masahiro Kato & Kenshi Abe & Kaito Ariu & Shota Yasui, 2020. "A Practical Guide of Off-Policy Evaluation for Bandit Problems," Papers 2010.12470, arXiv.org.
    3. Masahiro Kato, 2021. "Adaptive Doubly Robust Estimator from Non-stationary Logging Policy under a Convergence of Average Probability," Papers 2102.08975, arXiv.org, revised Mar 2021.
    4. Masahiro Kato & Masatoshi Uehara & Shota Yasui, 2020. "Off-Policy Evaluation and Learning for External Validity under a Covariate Shift," Papers 2002.11642, arXiv.org, revised Oct 2020.
    5. Masahiro Kato & Yusuke Kaneko, 2020. "Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy," Papers 2010.13554, arXiv.org.
    6. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    7. Xinkun Nie & Stefan Wager, 2017. "Quasi-Oracle Estimation of Heterogeneous Treatment Effects," Papers 1712.04912, arXiv.org, revised Aug 2020.
    8. Michael C. Knaus, 2021. "A double machine learning approach to estimate the effects of musical practice on student’s skills," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 184(1), pages 282-300, January.
    9. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2023. "Asymptotically Optimal Fixed-Budget Best Arm Identification with Variance-Dependent Bounds," Papers 2302.02988, arXiv.org, revised Jul 2023.
    10. Michael C Knaus, 2022. "Double machine learning-based programme evaluation under unconfoundedness," The Econometrics Journal, Royal Economic Society, vol. 25(3), pages 602-627.
    11. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    12. Maria Dimakopoulou & Zhimei Ren & Zhengyuan Zhou, 2021. "Online Multi-Armed Bandits with Adaptive Inference," Papers 2102.13202, arXiv.org, revised Jun 2021.
    13. Daniel Jacob, 2021. "CATE meets ML," Digital Finance, Springer, vol. 3(2), pages 99-148, June.
    14. Nathan Kallus, 2022. "Treatment Effect Risk: Bounds and Inference," Papers 2201.05893, arXiv.org, revised Jul 2022.
    15. Andrew Bennett & Nathan Kallus, 2020. "Efficient Policy Learning from Surrogate-Loss Classification Reductions," Papers 2002.05153, arXiv.org.
    16. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
    17. Kenshi Abe & Yusuke Kaneko, 2020. "Off-Policy Exploitability-Evaluation in Two-Player Zero-Sum Markov Games," Papers 2007.02141, arXiv.org, revised Dec 2020.
    18. Masahiro Kato & Masaaki Imaizumi & Takuya Ishihara & Toru Kitagawa, 2022. "Best Arm Identification with Contextual Information under a Small Gap," Papers 2209.07330, arXiv.org, revised Jan 2023.
    19. Mert Demirer & Vasilis Syrgkanis & Greg Lewis & Victor Chernozhukov, 2019. "Semi-Parametric Efficient Policy Learning with Continuous Actions," CeMMAP working papers CWP34/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Harrison H. Li & Art B. Owen, 2023. "Double machine learning and design in batch adaptive experiments," Papers 2309.15297, arXiv.org.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2010.03792. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: arXiv administrators (email available below). General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.