Printed from https://ideas.repec.org/p/arx/papers/2103.04021.html

Causal Reinforcement Learning: An Instrumental Variable Approach

Author

Listed:
  • Jin Li
  • Ye Luo
  • Xiaowei Zhang

Abstract

In the standard data analysis framework, data is first collected (once and for all), and then data analysis is carried out. Moreover, the data-generating process is typically assumed to be exogenous. This approach is natural when the data analyst has no impact on how the data is generated. The advancement of digital technology, however, has enabled firms to learn from data and make decisions at the same time. As these decisions generate new data, the data analyst (a business manager or an algorithm) also becomes the data generator. This interaction generates a new type of bias, reinforcement bias, that exacerbates the endogeneity problem in static data analysis. Causal inference techniques ought to be incorporated into reinforcement learning to address such issues.
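
The endogeneity problem mentioned in the abstract, and the way an instrumental variable can address it, may be easier to see in a small static simulation. The sketch below is not the paper's algorithm and does not model the reinforcement (feedback) component. It is a minimal textbook illustration under assumed values: a linear model with true effect 1.0, an unobserved confounder entering the outcome with weight 2.0, and the hypothetical function names simulate, ols_slope, and iv_slope.

```python
import numpy as np


def simulate(n=20000, beta=1.0, seed=0):
    """Draw data in which the decision variable x is endogenous.

    All numeric choices here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)              # instrument: moves x, independent of u
    u = rng.normal(size=n)              # unobserved confounder
    x = z + u + rng.normal(size=n)      # decision variable, correlated with u
    y = beta * x + 2.0 * u + rng.normal(size=n)  # outcome
    return z, x, y


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def iv_slope(z, x, y):
    """Simple IV estimate: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))


if __name__ == "__main__":
    z, x, y = simulate()
    print("OLS slope:", round(ols_slope(x, y), 3))
    print("IV slope: ", round(iv_slope(z, x, y), 3))
```

Under these assumptions the OLS slope converges to 5/3 rather than the true effect of 1.0, because x and the outcome share the confounder u, while the IV ratio converges to 1.0 because z affects y only through x.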

Suggested Citation

  • Jin Li & Ye Luo & Xiaowei Zhang, 2021. "Causal Reinforcement Learning: An Instrumental Variable Approach," Papers 2103.04021, arXiv.org, revised Sep 2022.
  • Handle: RePEc:arx:papers:2103.04021

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2103.04021
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Alchian, Armen A & Demsetz, Harold, 1972. "Production, Information Costs, and Economic Organization," American Economic Review, American Economic Association, vol. 62(5), pages 777-795, December.
    2. Guido W. Imbens, 2020. "Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 1129-1179, December.
    3. Hansen, Lars Peter, 1982. "Large Sample Properties of Generalized Method of Moments Estimators," Econometrica, Econometric Society, vol. 50(4), pages 1029-1054, July.
    4. Angrist, Joshua D. & Krueger, Alan B., 1999. "Empirical strategies in labor economics," Handbook of Labor Economics, in: O. Ashenfelter & D. Card (ed.), Handbook of Labor Economics, edition 1, volume 3, chapter 23, pages 1277-1366, Elsevier.
    5. Angrist, Joshua D, 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records: Errata," American Economic Review, American Economic Association, vol. 80(5), pages 1284-1286, December.
    6. Stephen Morris, 2001. "Political Correctness," Journal of Political Economy, University of Chicago Press, vol. 109(2), pages 231-265, April.
    7. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    8. Angrist, Joshua D, 1990. "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records," American Economic Review, American Economic Association, vol. 80(3), pages 313-336, June.
    9. Levitt, Steven D, 1997. "Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime," American Economic Review, American Economic Association, vol. 87(3), pages 270-290, June.
    10. Victor Chernozhukov & Christian Hansen, 2005. "An IV Model of Quantile Treatment Effects," Econometrica, Econometric Society, vol. 73(1), pages 245-261, January.
    11. Newey, Whitney & West, Kenneth, 2014. "A simple, positive semi-definite, heteroscedasticity and autocorrelation consistent covariance matrix," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 33(1), pages 125-132.
    12. Holmstrom, Bengt & Milgrom, Paul, 1991. "Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design," The Journal of Law, Economics, and Organization, Oxford University Press, vol. 7(0), pages 24-52, Special I.
    13. Joshua D. Angrist & Alan B. Krueger, 1991. "Does Compulsory School Attendance Affect Schooling and Earnings?," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 106(4), pages 979-1014.
    14. Thomas Blake & Chris Nosko & Steven Tadelis, 2015. "Consumer Heterogeneity and Paid Search Effectiveness: A Large‐Scale Field Experiment," Econometrica, Econometric Society, vol. 83, pages 155-174, January.
    15. Hausman, Jerry A., 1983. "Specification and estimation of simultaneous equation models," Handbook of Econometrics, in: Z. Griliches & M. D. Intriligator (ed.), Handbook of Econometrics, edition 1, volume 1, chapter 7, pages 391-448, Elsevier.
    16. Julian Schrittwieser & Ioannis Antonoglou & Thomas Hubert & Karen Simonyan & Laurent Sifre & Simon Schmitt & Arthur Guez & Edward Lockhart & Demis Hassabis & Thore Graepel & Timothy Lillicrap & David , 2020. "Mastering Atari, Go, chess and shogi by planning with a learned model," Nature, Nature, vol. 588(7839), pages 604-609, December.
    17. Emilio Calvano & Giacomo Calzolari & Vincenzo Denicolò & Sergio Pastorello, 2020. "Artificial Intelligence, Algorithmic Pricing, and Collusion," American Economic Review, American Economic Association, vol. 110(10), pages 3267-3297, October.
    18. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    19. Johnson, Justin Pappas & Rhodes, Andrew & Wildenbeest, Matthijs, 2020. "Platform Design when Sellers Use Pricing Algorithms," TSE Working Papers 20-1146, Toulouse School of Economics (TSE).
    20. Imbens, Guido W., 2014. "Instrumental Variables: An Econometrician's Perspective," IZA Discussion Papers 8048, Institute of Labor Economics (IZA).
    21. Danielle Li & Lindsey R. Raymond & Peter Bergman, 2020. "Hiring as Exploration," NBER Working Papers 27736, National Bureau of Economic Research, Inc.
    22. Imbens, Guido W & Angrist, Joshua D, 1994. "Identification and Estimation of Local Average Treatment Effects," Econometrica, Econometric Society, vol. 62(2), pages 467-475, March.
    23. Griliches, Zvi, 1977. "Estimating the Returns to Schooling: Some Econometric Problems," Econometrica, Econometric Society, vol. 45(1), pages 1-22, January.
    24. Whitney K. Newey & James L. Powell, 2003. "Instrumental Variable Estimation of Nonparametric Models," Econometrica, Econometric Society, vol. 71(5), pages 1565-1578, September.
    25. Chunrong Ai & Xiaohong Chen, 2003. "Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions," Econometrica, Econometric Society, vol. 71(6), pages 1795-1843, November.
    26. Heckman, James J, 1990. "Varieties of Selection Bias," American Economic Review, American Economic Association, vol. 80(2), pages 313-318, May.
    27. Chernozhukov, Victor & Imbens, Guido W. & Newey, Whitney K., 2007. "Instrumental variable estimation of nonseparable models," Journal of Econometrics, Elsevier, vol. 139(1), pages 4-14, July.
    28. Mila Nambiar & David Simchi-Levi & He Wang, 2019. "Dynamic Learning and Pricing with Model Misspecification," Management Science, INFORMS, vol. 65(11), pages 4980-5000, November.
    29. Prendergast, Canice, 1993. "A Theory of 'Yes Men'," American Economic Review, American Economic Association, vol. 83(4), pages 757-770, September.
    30. Xiaohong Chen & Han Hong & Elie Tamer, 2005. "Measurement Error Models with Auxiliary Data," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 72(2), pages 343-366.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jingwen Zhang & Yifang Chen & Amandeep Singh, 2022. "Causal Bandits: Online Decision-Making in Endogenous Settings," Papers 2211.08649, arXiv.org, revised Feb 2023.
    2. Peter Hull & Michal Kolesár & Christopher Walters, 2022. "Labor by design: contributions of David Card, Joshua Angrist, and Guido Imbens," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(3), pages 603-645, July.
    3. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    4. Thomas J. Kane & Cecilia E. Rouse, 1993. "Labor Market Returns to Two- and Four-Year Colleges: Is a Credit a Credit and Do Degrees Matter?," NBER Working Papers 4268, National Bureau of Economic Research, Inc.
    5. Qin, Duo, 2014. "Resurgence of instrument variable estimation and fallacy of endogeneity," Economics Discussion Papers 2014-42, Kiel Institute for the World Economy (IfW Kiel).
    6. Guilhem Bascle, 2008. "Controlling for endogeneity with instrumental variables in strategic management research," Post-Print hal-00576795, HAL.
    7. Halbert White & Karim Chalak, 2013. "Identification and Identification Failure for Treatment Effects Using Structural Systems," Econometric Reviews, Taylor & Francis Journals, vol. 32(3), pages 273-317, November.
    8. Guido W. Imbens, 2022. "Causality in Econometrics: Choice vs Chance," Econometrica, Econometric Society, vol. 90(6), pages 2541-2566, November.
    9. Heckman, James J. & Lochner, Lance J. & Todd, Petra E., 2006. "Earnings Functions, Rates of Return and Treatment Effects: The Mincer Equation and Beyond," Handbook of the Economics of Education, in: Erik Hanushek & F. Welch (ed.), Handbook of the Economics of Education, edition 1, volume 1, chapter 7, pages 307-458, Elsevier.
    10. Joshua D. Angrist & Alan B. Krueger, 2001. "Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments," Journal of Economic Perspectives, American Economic Association, vol. 15(4), pages 69-85, Fall.
    11. Frolich, Markus, 2007. "Nonparametric IV estimation of local average treatment effects with covariates," Journal of Econometrics, Elsevier, vol. 139(1), pages 35-75, July.
    12. van der Klaauw, Bas, 2014. "From micro data to causality: Forty years of empirical labor economics," Labour Economics, Elsevier, vol. 30(C), pages 88-97.
    13. Chen, Xiaohong, 2007. "Large Sample Sieve Estimation of Semi-Nonparametric Models," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 76, Elsevier.
    14. Committee, Nobel Prize, 2021. "Answering causal questions using observational data," Nobel Prize in Economics documents 2021-2, Nobel Prize Committee.
    15. Halbert White & Karim Chalak, 2008. "Identifying Structural Effects in Nonseparable Systems Using Covariates," Boston College Working Papers in Economics 734, Boston College Department of Economics.
    16. Xiaohong Chen & Andres Santos, 2018. "Overidentification in Regular Models," Econometrica, Econometric Society, vol. 86(5), pages 1771-1817, September.
    17. Arthur Lewbel, 2012. "Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 30(1), pages 67-80.
    18. Xiaohong Chen & Victor Chernozhukov & Sokbae Lee & Whitney K. Newey, 2014. "Local Identification of Nonparametric and Semiparametric Models," Econometrica, Econometric Society, vol. 82(2), pages 785-809, March.
    19. Markus Frölich, 2004. "Programme Evaluation with Multiple Treatments," Journal of Economic Surveys, Wiley Blackwell, vol. 18(2), pages 181-224, April.
    20. Hall, George & Rust, John, 2021. "Estimation of endogenously sampled time series: The case of commodity price speculation in the steel market," Journal of Econometrics, Elsevier, vol. 222(1), pages 219-243.
