
False Discovery in A/B Testing

Authors

Listed:
  • Ron Berman

    (Marketing, The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania 19104)

  • Christophe Van den Bulte

    (Marketing, The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania 19104)

Abstract

We investigate what fraction of all significant results in website A/B testing actually reflects null effects, that is, the false discovery rate (FDR). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance and between 18% and 25% for tests at 5% significance (two-sided). These high FDRs stem mostly from the high fraction of true null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect one in five interventions achieving significance at the 5% level to be ineffective when deployed in the field and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
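The figures in the abstract fit the standard relationship FDR = π0·α / (π0·α + (1−π0)·power), where π0 is the fraction of true null effects and α the two-sided significance level. A minimal back-of-the-envelope sketch of this arithmetic in Python; the power values below are illustrative assumptions chosen to land in the reported ranges, not estimates from the paper:

    def expected_fdr(pi0: float, alpha: float, power: float) -> float:
        """Expected false discovery rate when a fraction pi0 of tested
        effects is truly null, tests run at two-sided level alpha, and
        non-null effects are detected with the given average power."""
        false_pos = pi0 * alpha          # true nulls declared significant
        true_pos = (1 - pi0) * power     # real effects declared significant
        return false_pos / (false_pos + true_pos)

    # pi0 ~ 0.70 as estimated in the paper; the power values are hypothetical.
    print(round(expected_fdr(pi0=0.70, alpha=0.05, power=0.45), 2))  # -> 0.21
    print(round(expected_fdr(pi0=0.70, alpha=0.10, power=0.50), 2))  # -> 0.32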

Suggested Citation

  • Ron Berman & Christophe Van den Bulte, 2022. "False Discovery in A/B Testing," Management Science, INFORMS, vol. 68(9), pages 6762-6782, September.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:9:p:6762-6782
    DOI: 10.1287/mnsc.2021.4207

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2021.4207
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2021.4207?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    References listed on IDEAS

    1. Brett R. Gordon & Florian Zettelmeyer & Neha Bhargava & Dan Chapsky, 2019. "A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook," Marketing Science, INFORMS, vol. 38(2), pages 193-225, March.
    2. Daniel J. Benjamin & James O. Berger & Magnus Johannesson & Brian A. Nosek & E.-J. Wagenmakers & Richard Berk & Kenneth A. Bollen & Björn Brembs & Lawrence Brown & Colin Camerer & David Cesarini & Chr, 2018. "Redefine statistical significance," Nature Human Behaviour, Nature, vol. 2(1), pages 6-10, January.
      • Daniel Benjamin & James Berger & Magnus Johannesson & Brian Nosek & E. Wagenmakers & Richard Berk & Kenneth Bollen & Bjorn Brembs & Lawrence Brown & Colin Camerer & David Cesarini & Christopher Chambe, 2017. "Redefine Statistical Significance," Artefactual Field Experiments 00612, The Field Experiments Website.
    3. Sanat K. Sarkar & Jingjing Chen & Wenge Guo, 2013. "Multiple Testing in a Two-Stage Adaptive Design With Combination Tests Controlling FDR," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1385-1401, December.
    4. Abel Brodeur & Nikolai Cook & Anthony Heyes, 2020. "Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics," American Economic Review, American Economic Association, vol. 110(11), pages 3634-3660, November.
    5. Thomas Blake & Chris Nosko & Steven Tadelis, 2015. "Consumer Heterogeneity and Paid Search Effectiveness: A Large‐Scale Field Experiment," Econometrica, Econometric Society, vol. 83, pages 155-174, January.
    6. Stoye, Jörg, 2009. "Minimax regret treatment choice with finite samples," Journal of Econometrics, Elsevier, vol. 151(1), pages 70-81, July.
    7. Prabhakant Sinha & Andris A. Zoltners, 2001. "Sales-Force Decision Models: Insights from 25 Years of Implementation," Interfaces, INFORMS, vol. 31(3_supplement), pages 8-44, June.
    8. Eduardo M. Azevedo & Alex Deng & José Luis Montiel Olea & Justin Rao & E. Glen Weyl, 2020. "A/B Testing with Fat Tails," Journal of Political Economy, University of Chicago Press, vol. 128(12), pages 4614-4672.
    9. Zacharias Maniadis & Fabio Tufano & John A. List, 2014. "One Swallow Doesn't Make a Summer: New Evidence on Anchoring Effects," American Economic Review, American Economic Association, vol. 104(1), pages 277-290, January.
    10. James G. Scott & Ryan C. Kelly & Matthew A. Smith & Pengcheng Zhou & Robert E. Kass, 2015. "False Discovery Rate Regression: An Application to Neural Synchrony Detection in Primary Visual Cortex," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 459-471, June.
    11. Michael L. Anderson & Jeremy Magruder, 2017. "Split-Sample Strategies for Avoiding False Discoveries," NBER Working Papers 23544, National Bureau of Economic Research, Inc.
    12. John D. Storey, 2002. "A direct approach to false discovery rates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(3), pages 479-498, August.
    13. Randall A. Lewis & Justin M. Rao, 2015. "The Unfavorable Economics of Measuring the Returns to Advertising," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 130(4), pages 1941-1973.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

    Cited by:

    1. Shan Huang & Chen Wang & Yuan Yuan & Jinglong Zhao & Jingjing Zhang, 2023. "Estimating Effects of Long-Term Treatments," Papers 2308.08152, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Susan Athey & Kristen Grabarz & Michael Luca & Nils Wernerfelt, 2023. "Digital public health interventions at scale: The impact of social media advertising on beliefs and outcomes related to COVID vaccines," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(5), pages e2208110120, January.
    2. Garrett Johnson & Julian Runge & Eric Seufert, 2022. "Privacy-Centric Digital Advertising: Implications for Research," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 9(1), pages 49-54, June.
    3. Weijia Dai & Hyunjin Kim & Michael Luca, 2023. "Frontiers: Which Firms Gain from Digital Advertising? Evidence from a Field Experiment," Marketing Science, INFORMS, vol. 42(3), pages 429-439, May.
    4. Brett R Gordon & Kinshuk Jerath & Zsolt Katona & Sridhar Narayanan & Jiwoong Shin & Kenneth C Wilbur, 2019. "Inefficiencies in Digital Advertising Markets," Papers 1912.09012, arXiv.org, revised Feb 2020.
    5. Berman, Ron & Heller, Yuval, 2020. "Naive Analytics Equilibrium," MPRA Paper 103824, University Library of Munich, Germany.
    6. George Z. Gui, 2020. "Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments," Papers 2010.05117, arXiv.org, revised Dec 2023.
    7. Susan Athey & Kristen Grabarz & Michael Luca & Nils Wernerfelt, 2022. "The Effectiveness of Digital Interventions on COVID-19 Attitudes and Beliefs," Papers 2206.10214, arXiv.org.
    8. Thomas W. Frick & Rodrigo Belo & Rahul Telang, 2023. "Incentive Misalignments in Programmatic Advertising: Evidence from a Randomized Field Experiment," Management Science, INFORMS, vol. 69(3), pages 1665-1686, March.
    9. Bradley T. Shapiro, 2020. "Advertising in Health Insurance Markets," Marketing Science, INFORMS, vol. 39(3), pages 587-611, May.
    10. Randall Lewis & Dan Nguyen, 2015. "Display advertising’s competitive spillovers to consumer search," Quantitative Marketing and Economics (QME), Springer, vol. 13(2), pages 93-115, June.
    11. Jacob LaRiviere & Mikolaj Czajkowski & Nick Hanley & Katherine Simpson, 2016. "What is the Causal Impact of Knowledge on Preferences in Stated Preference Studies?," Working Papers 2016-12, Faculty of Economic Sciences, University of Warsaw.
    12. Colin F. Camerer & Anna Dreber & Felix Holzmeister & Teck-Hua Ho & Jürgen Huber & Magnus Johannesson & Michael Kirchler & Gideon Nave & Brian A. Nosek & Thomas Pfeiffer & Adam Altmejd & Nick Buttrick, 2018. "Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015," Nature Human Behaviour, Nature, vol. 2(9), pages 637-644, September.
    13. Tobias Regner, 2021. "Crowdfunding a monthly income: an analysis of the membership platform Patreon," Journal of Cultural Economics, Springer;The Association for Cultural Economics International, vol. 45(1), pages 133-142, March.
    14. Dreber, Anna & Johannesson, Magnus, 2023. "A framework for evaluating reproducibility and replicability in economics," Ruhr Economic Papers 1055, RWI - Leibniz-Institut für Wirtschaftsforschung, Ruhr-University Bochum, TU Dortmund University, University of Duisburg-Essen.
    15. Kirthi Kalyanam & John McAteer & Jonathan Marek & James Hodges & Lifeng Lin, 2018. "Cross channel effects of search engine advertising on brick & mortar retail sales: Meta analysis of large scale field experiments on Google.com," Quantitative Marketing and Economics (QME), Springer, vol. 16(1), pages 1-42, March.
    16. Uddin, Main & Wang, Liang Choon & Smyth, Russell, 2021. "Do government-initiated energy comparison sites encourage consumer search and lower prices? Evidence from an online randomized controlled experiment in Australia," Journal of Economic Behavior & Organization, Elsevier, vol. 188(C), pages 167-182.
    17. Maurizio Canavari & Andreas C. Drichoutis & Jayson L. Lusk & Rodolfo M. Nayga, Jr., 2018. "How to run an experimental auction: A review of recent advances," Working Papers 2018-5, Agricultural University of Athens, Department Of Agricultural Economics.
    18. Azevedo, Eduardo M. & Mao, David & Montiel Olea, José Luis & Velez, Amilcar, 2023. "The A/B testing problem with Gaussian priors," Journal of Economic Theory, Elsevier, vol. 210(C).
    19. George Gui & Harikesh Nair & Fengshi Niu, 2021. "Auction Throttling and Causal Inference of Online Advertising Effects," Papers 2112.15155, arXiv.org, revised Feb 2022.
    20. Naoki Aizawa & You Suk Kim, 2020. "Public and Private Provision of Information in Market-Based Public Programs: Evidence from Advertising in Health Insurance Marketplaces," NBER Working Papers 27695, National Bureau of Economic Research, Inc.
