Printed from https://ideas.repec.org/a/inm/orijds/v4y2025i3p197-229.html

Observational vs. Experimental Data When Making Automated Decisions Using Machine Learning

Authors

  • Carlos Fernández-Loría

    (School of Business and Management, Hong Kong University of Science and Technology, New Territories, Hong Kong)

  • Foster Provost

    (Leonard N. Stern School of Business, New York University, New York, New York 10012)

Abstract

Decisions supported by machine learning often aim to improve outcomes through interventions, such as influencing purchasing behavior with ads or increasing customer retention with special offers. However, using observational data to estimate these effects can introduce confounding bias. Although experimental data can mitigate confounding, it is not always feasible to obtain and can be costly when it is. This paper presents theoretical results focusing on the impact of confounding on decision making, emphasizing that optimizing decisions often involves determining whether a causal effect exceeds a threshold rather than minimizing bias in the estimate. Consequently, models built with readily available but confounded data can sometimes yield decisions as good as or better than those based on costly, unconfounded data. This can occur when larger effects are more likely to be overestimated or when the benefits of larger, cheaper data sets outweigh the drawbacks of confounding. We validate the theoretical findings using benchmark data from the 2016 Atlantic Causal Inference Conference causal modeling competition, encompassing 77 scenarios and 7,700 data sets. We then introduce theoretical conditions, weaker than ignorability, that characterize when confounding preserves effect rankings. These conditions allow for empirical heuristic tests to assess whether observational data aligns with this structure. Finally, we apply our findings in a large-scale case study using advertising data, demonstrating how these insights can guide decision making in practice.
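The abstract argues that a targeting decision only needs the estimated effect to land on the correct side of a threshold. A toy Python simulation can illustrate this. It is a sketch under stated assumptions, not the authors' method, model, or data. All of the following are hypothetical, as are the function names:

- the uniform distribution of true effects;
- the per-unit treatment cost of 1.0, which serves as the threshold;
- the bias form tau + (tau - cost) for the "confounded" estimates;
- the Gaussian noise that stands in for a small experimental sample.

```python
"""Toy simulation (hypothetical, not the paper's method): a biased estimator
can match an unbiased one when the decision is a threshold rule."""
import random
from itertools import groupby
from typing import Sequence


def treat(estimates: Sequence[float], cost: float) -> list[bool]:
    """Threshold policy: treat a unit when its estimated effect exceeds the cost."""
    return [e > cost for e in estimates]


def policy_value(true_effects: Sequence[float], decisions: Sequence[bool], cost: float) -> float:
    """Realized net value: each treated unit contributes (true effect - cost)."""
    return sum(t - cost for t, d in zip(true_effects, decisions) if d)


def mse(estimates: Sequence[float], true_effects: Sequence[float]) -> float:
    """Mean squared error of the effect estimates."""
    return sum((e - t) ** 2 for e, t in zip(estimates, true_effects)) / len(true_effects)


def ranking_preserved(estimates: Sequence[float], true_effects: Sequence[float]) -> bool:
    """True if every unit with a larger true effect also has a strictly larger estimate."""
    pairs = sorted(zip(true_effects, estimates))
    # Group estimates by (possibly tied) true effect, in increasing order of true effect.
    groups = [[e for _, e in grp] for _, grp in groupby(pairs, key=lambda p: p[0])]
    # Comparing adjacent groups suffices because the inequalities chain transitively.
    return all(max(lo) < min(hi) for lo, hi in zip(groups, groups[1:]))


def simulate(n: int = 10_000, cost: float = 1.0, noise_sd: float = 0.5, seed: int = 0) -> dict:
    """Compare an oracle, a biased ("confounded") and a noisy unbiased estimator."""
    rng = random.Random(seed)
    true = [rng.uniform(-1.0, 3.0) for _ in range(n)]  # assumed effect distribution
    # Hypothetical confounded estimates, tau + (tau - cost): effects above the cost
    # are overestimated and effects below it are underestimated. The bias is large,
    # but the mapping is increasing in tau and equals the cost exactly at tau == cost.
    confounded = [2.0 * t - cost for t in true]
    # Hypothetical experimental estimates: unbiased, with assumed Gaussian noise.
    experimental = [t + rng.gauss(0.0, noise_sd) for t in true]
    oracle = treat(true, cost)
    return {
        "oracle_value": policy_value(true, oracle, cost),
        "confounded_value": policy_value(true, treat(confounded, cost), cost),
        "experimental_value": policy_value(true, treat(experimental, cost), cost),
        "confounded_mse": mse(confounded, true),
        "experimental_mse": mse(experimental, true),
        "confounded_ranking_ok": ranking_preserved(confounded, true),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value}")
```

Under these assumptions, the confounded estimates carry far more error than the unbiased ones: an expected mean squared error of 4/3, versus 0.25 for the noisy estimates. Even so, they reproduce the oracle's treatment decisions exactly. The bias grows with distance from the threshold, never pushes an estimate across it, and leaves the ranking of effects intact. The unbiased estimates, by contrast, can misclassify units whose true effects sit near the cost, so their policy value is at most the oracle's. The `ranking_preserved` check is only a simulation diagnostic. With real data the true effects are unobserved, so it cannot be computed directly.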

Suggested Citation

  • Carlos Fernández-Loría & Foster Provost, 2025. "Observational vs. Experimental Data When Making Automated Decisions Using Machine Learning," INFORMS Journal on Data Science, INFORMS, vol. 4(3), pages 197-229, July.
  • Handle: RePEc:inm:orijds:v:4:y:2025:i:3:p:197-229
    DOI: 10.1287/ijds.2023.0012

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2023.0012
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijds.2023.0012?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Yingqi Zhao & Donglin Zeng & A. John Rush & Michael R. Kosorok, 2012. "Estimating Individualized Treatment Rules Using Outcome Weighted Learning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 107(499), pages 1106-1118, September.
    2. Duncan Simester & Artem Timoshenko & Spyros I. Zoumpoulis, 2020. "Efficiently Evaluating Targeting Policies: Improving on Champion vs. Challenger Experiments," Management Science, INFORMS, vol. 66(8), pages 3412-3424, August.
    3. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    4. Emre M. Demirezen & Subodha Kumar, 2016. "Optimization of Recommender Systems Based on Inventory," Production and Operations Management, Production and Operations Management Society, vol. 25(4), pages 593-608, April.
    5. Qi Feng & Sirong Luo & Dan Zhang, 2014. "Dynamic Inventory–Pricing Control Under Backorder: Demand Estimation and Policy Optimization," Manufacturing & Service Operations Management, INFORMS, vol. 16(1), pages 149-160, February.
    6. Kenneth E. Train, 2009. "Discrete Choice Methods with Simulation," Cambridge Books, Cambridge University Press, number 9780521766555, January-April.
    7. Brett R. Gordon & Florian Zettelmeyer & Neha Bhargava & Dan Chapsky, 2019. "A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook," Marketing Science, INFORMS, vol. 38(2), pages 193-225, March.
    8. Marco Morucci & Md. Noor-E-Alam & Cynthia Rudin, 2022. "A Robust Approach to Quantifying Uncertainty in Matching Problems of Causal Inference," INFORMS Journal on Data Science, INFORMS, vol. 1(2), pages 156-171, October.
    9. Keisuke Hirano & Jack R. Porter, 2009. "Asymptotics for Statistical Treatment Rules," Econometrica, Econometric Society, vol. 77(5), pages 1683-1701, September.
    10. Debopam Bhattacharya & Pascaline Dupas, 2012. "Inferring welfare maximizing treatment assignment under budget constraints," Journal of Econometrics, Elsevier, vol. 167(1), pages 168-196.
    11. Wouter Verbeke & Diego Olaya & Marie-Anne Guerry & Jente Van Belle, 2023. "To do or not to do? Cost-sensitive causal classification with individual treatment effect estimates," European Journal of Operational Research, Elsevier, vol. 305(2), pages 838-852.
    12. Evan T.R. Rosenman & Guillaume Basse & Art B. Owen & Mike Baiocchi, 2023. "Combining observational and experimental datasets using shrinkage estimators," Biometrics, The International Biometric Society, vol. 79(4), pages 2961-2973, December.
    13. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    14. Charles F. Manski, 2004. "Statistical Treatment Rules for Heterogeneous Populations," Econometrica, Econometric Society, vol. 72(4), pages 1221-1246, July.
    15. Kris Johnson Ferreira & Bin Hong Alex Lee & David Simchi-Levi, 2016. "Analytics for an Online Retailer: Demand Forecasting and Price Optimization," Manufacturing & Service Operations Management, INFORMS, vol. 18(1), pages 69-88, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carlos Fernández-Loría & Foster Provost, 2022. "Causal Decision Making and Causal Effect Estimation Are Not the Same…and Why It Matters," INFORMS Journal on Data Science, INFORMS, vol. 1(1), pages 4-16, April.
    2. Nan Liu & Yanbo Liu & Yuya Sasaki & Yuanyuan Wan, 2025. "Nonparametric Uniform Inference in Binary Classification and Policy Values," Working Papers tecipa-811, University of Toronto, Department of Economics.
    3. Carlos Fernández-Loría & Foster Provost & Jesse Anderton & Benjamin Carterette & Praveen Chandar, 2023. "A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation," Information Systems Research, INFORMS, vol. 34(2), pages 786-803, June.
    4. Yan Liu, 2022. "Policy Learning under Endogeneity Using Instrumental Variables," Papers 2206.09883, arXiv.org, revised Jan 2026.
    5. Artem Timoshenko & Caio Waisman, 2025. "Profit-Aligned CATE Estimation: Reconciling Policy Learning and Inference," Papers 2512.13400, arXiv.org, revised Apr 2026.
    6. Achim Ahrens & Alessandra Stampi‐Bombelli & Selina Kurer & Dominik Hangartner, 2024. "Optimal multi‐action treatment allocation: A two‐phase field experiment to boost immigrant naturalization," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(7), pages 1379-1395, November.
    7. Augustine Denteh & Helge Liebert, 2022. "Who Increases Emergency Department Use? New Insights from the Oregon Health Insurance Experiment," CESifo Working Paper Series 9664, CESifo.
    8. Toru Kitagawa & Guanyi Wang, 2021. "Who should get vaccinated? Individualized allocation of vaccines over SIR network," CeMMAP working papers CWP28/21, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    9. Toru Kitagawa & Guanyi Wang, 2023. "Who should get vaccinated? Individualized allocation of vaccines over SIR network," Journal of Econometrics, Elsevier, vol. 232(1), pages 109-131.
    10. Daniel F. Pellatt, 2022. "PAC-Bayesian Treatment Allocation Under Budget Constraints," Papers 2212.09007, arXiv.org, revised Jun 2023.
    11. Raphael Langevin, 2026. "Policy Learning with Observational Data: The Case of Hepatitis C Treatment for HIV/HCV Co-Infected Patients," Papers 2605.16593, arXiv.org.
    12. Samuel D. Higbee, 2025. "Policy learning with new treatments," Quantitative Economics, Econometric Society, vol. 16(4), pages 1409-1456, November.
    13. Christopher Adjaho & Timothy Christensen, 2022. "Externally Valid Policy Choice," Papers 2205.05561, arXiv.org, revised Nov 2025.
    14. Justin Whitehouse & Qizhao Chen & Morgane Austern & Vasilis Syrgkanis, 2025. "Inference on Optimal Policy Values and Other Irregular Functionals via Softmax Smoothing," Papers 2507.11780, arXiv.org, revised Mar 2026.
    15. Eric Mbakop & Max Tabord‐Meehan, 2021. "Model Selection for Treatment Choice: Penalized Welfare Maximization," Econometrica, Econometric Society, vol. 89(2), pages 825-848, March.
    16. Alessandra Garbero & Grayson Sakos & Giovanni Cerulli, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    17. Carlos Fernández-Loría & Foster Provost & Jesse Anderton & Benjamin Carterette & Praveen Chandar, 2020. "A Comparison of Methods for Treatment Assignment with an Application to Playlist Generation," Papers 2004.11532, arXiv.org, revised Apr 2022.
    18. Giovanni Cerulli, 2020. "Optimal Policy Learning: From Theory to Practice," Papers 2011.04993, arXiv.org.
    19. Shosei Sakaguchi, 2025. "Estimation of optimal dynamic treatment assignment rules under policy constraints," Quantitative Economics, Econometric Society, vol. 16(3), pages 981-1022, July.
    20. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.

    More about this item



    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijds:v:4:y:2025:i:3:p:197-229. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher. General contact details of provider: https://edirc.repec.org/data/inforea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.