
False Discovery in A/B Testing

Authors

Listed:
  • Ron Berman

    (Marketing, The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania 19104)

  • Christophe Van den Bulte

    (Marketing, The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania 19104)

Abstract

We investigate what fraction of all significant results in website A/B testing actually reflects null effects, that is, the false discovery rate (FDR). Our data consist of 4,964 effects from 2,766 experiments conducted on a commercial A/B testing platform. Using three different methods, we find that the FDR ranges between 28% and 37% for tests conducted at 10% significance and between 18% and 25% for tests at 5% significance (two-sided). These high FDRs stem mostly from the high fraction of true null effects, about 70%, rather than from low power. Using our estimates, we also assess the potential of various A/B test designs to reduce the FDR. The two main implications are that decision makers should expect one in five interventions achieving significance at the 5% level to be ineffective when deployed in the field and that analysts should consider using two-stage designs with multiple variations rather than basic A/B tests.
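The figures in the abstract fit the standard relationship FDR = π0·α / (π0·α + (1−π0)·power), where π0 is the fraction of true null effects and α the two-sided significance level. A minimal back-of-the-envelope sketch of this arithmetic in Python; the power values below are illustrative assumptions chosen to land in the reported ranges, not estimates from the paper:

    def expected_fdr(pi0: float, alpha: float, power: float) -> float:
        """Expected false discovery rate when a fraction pi0 of tested
        effects is truly null, tests run at two-sided level alpha, and
        non-null effects are detected with the given average power."""
        false_pos = pi0 * alpha          # true nulls declared significant
        true_pos = (1 - pi0) * power     # real effects declared significant
        return false_pos / (false_pos + true_pos)

    # pi0 ~ 0.70 as estimated in the paper; the power values are hypothetical.
    print(round(expected_fdr(pi0=0.70, alpha=0.05, power=0.45), 2))  # -> 0.21
    print(round(expected_fdr(pi0=0.70, alpha=0.10, power=0.50), 2))  # -> 0.32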

Suggested Citation

  • Ron Berman & Christophe Van den Bulte, 2022. "False Discovery in A/B Testing," Management Science, INFORMS, vol. 68(9), pages 6762-6782, September.
  • Handle: RePEc:inm:ormnsc:v:68:y:2022:i:9:p:6762-6782
    DOI: 10.1287/mnsc.2021.4207

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/mnsc.2021.4207
    Download Restriction: no

    File URL: https://libkey.io/10.1287/mnsc.2021.4207?utm_source=ideas
    LibKey link: if access is restricted and your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item.

    References listed on IDEAS

    1. Brett R. Gordon & Florian Zettelmeyer & Neha Bhargava & Dan Chapsky, 2019. "A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook," Marketing Science, INFORMS, vol. 38(2), pages 193-225, March.
    2. Daniel J. Benjamin & James O. Berger & Magnus Johannesson & Brian A. Nosek & E.-J. Wagenmakers & Richard Berk & Kenneth A. Bollen & Björn Brembs & Lawrence Brown & Colin Camerer & David Cesarini & Chr, 2018. "Redefine statistical significance," Nature Human Behaviour, Nature, vol. 2(1), pages 6-10, January.
      • Daniel Benjamin & James Berger & Magnus Johannesson & Brian Nosek & E. Wagenmakers & Richard Berk & Kenneth Bollen & Bjorn Brembs & Lawrence Brown & Colin Camerer & David Cesarini & Christopher Chambe, 2017. "Redefine Statistical Significance," Artefactual Field Experiments 00612, The Field Experiments Website.
    3. Sanat K. Sarkar & Jingjing Chen & Wenge Guo, 2013. "Multiple Testing in a Two-Stage Adaptive Design With Combination Tests Controlling FDR," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1385-1401, December.
    4. Abel Brodeur & Nikolai Cook & Anthony Heyes, 2020. "Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics," American Economic Review, American Economic Association, vol. 110(11), pages 3634-3660, November.
    5. Thomas Blake & Chris Nosko & Steven Tadelis, 2015. "Consumer Heterogeneity and Paid Search Effectiveness: A Large‐Scale Field Experiment," Econometrica, Econometric Society, vol. 83, pages 155-174, January.
    6. Stoye, Jörg, 2009. "Minimax regret treatment choice with finite samples," Journal of Econometrics, Elsevier, vol. 151(1), pages 70-81, July.
    7. Prabhakant Sinha & Andris A. Zoltners, 2001. "Sales-Force Decision Models: Insights from 25 Years of Implementation," Interfaces, INFORMS, vol. 31(3_supplement), pages 8-44, June.
    8. Eduardo M. Azevedo & Alex Deng & José Luis Montiel Olea & Justin Rao & E. Glen Weyl, 2020. "A/B Testing with Fat Tails," Journal of Political Economy, University of Chicago Press, vol. 128(12), pages 4614-4672.
    9. Zacharias Maniadis & Fabio Tufano & John A. List, 2014. "One Swallow Doesn't Make a Summer: New Evidence on Anchoring Effects," American Economic Review, American Economic Association, vol. 104(1), pages 277-290, January.
    10. James G. Scott & Ryan C. Kelly & Matthew A. Smith & Pengcheng Zhou & Robert E. Kass, 2015. "False Discovery Rate Regression: An Application to Neural Synchrony Detection in Primary Visual Cortex," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(510), pages 459-471, June.
    11. Michael L. Anderson & Jeremy Magruder, 2017. "Split-Sample Strategies for Avoiding False Discoveries," NBER Working Papers 23544, National Bureau of Economic Research, Inc.
    12. John D. Storey, 2002. "A direct approach to false discovery rates," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(3), pages 479-498, August.
    13. Randall A. Lewis & Justin M. Rao, 2015. "The Unfavorable Economics of Measuring the Returns to Advertising," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 130(4), pages 1941-1973.

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.

    Cited by:

    1. Shan Huang & Chen Wang & Yuan Yuan & Jinglong Zhao & Jingjing Zhang, 2023. "Estimating Effects of Long-Term Treatments," Papers 2308.08152, arXiv.org.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Susan Athey & Kristen Grabarz & Michael Luca & Nils Wernerfelt, 2023. "Digital public health interventions at scale: The impact of social media advertising on beliefs and outcomes related to COVID vaccines," Proceedings of the National Academy of Sciences, Proceedings of the National Academy of Sciences, vol. 120(5), pages e2208110120, January.
    2. Garrett Johnson & Julian Runge & Eric Seufert, 2022. "Privacy-Centric Digital Advertising: Implications for Research," Customer Needs and Solutions, Springer;Institute for Sustainable Innovation and Growth (iSIG), vol. 9(1), pages 49-54, June.
    3. Weijia Dai & Hyunjin Kim & Michael Luca, 2023. "Frontiers: Which Firms Gain from Digital Advertising? Evidence from a Field Experiment," Marketing Science, INFORMS, vol. 42(3), pages 429-439, May.
    4. Brett R Gordon & Kinshuk Jerath & Zsolt Katona & Sridhar Narayanan & Jiwoong Shin & Kenneth C Wilbur, 2019. "Inefficiencies in Digital Advertising Markets," Papers 1912.09012, arXiv.org, revised Feb 2020.
    5. Berman, Ron & Heller, Yuval, 2020. "Naive Analytics Equilibrium," MPRA Paper 103824, University Library of Munich, Germany.
    6. George Z. Gui, 2020. "Combining Observational and Experimental Data to Improve Efficiency Using Imperfect Instruments," Papers 2010.05117, arXiv.org, revised Dec 2023.
    7. Susan Athey & Kristen Grabarz & Michael Luca & Nils Wernerfelt, 2022. "The Effectiveness of Digital Interventions on COVID-19 Attitudes and Beliefs," Papers 2206.10214, arXiv.org.
    8. Thomas W. Frick & Rodrigo Belo & Rahul Telang, 2023. "Incentive Misalignments in Programmatic Advertising: Evidence from a Randomized Field Experiment," Management Science, INFORMS, vol. 69(3), pages 1665-1686, March.
    9. Bradley T. Shapiro, 2020. "Advertising in Health Insurance Markets," Marketing Science, INFORMS, vol. 39(3), pages 587-611, May.
    10. Randall Lewis & Dan Nguyen, 2015. "Display advertising’s competitive spillovers to consumer search," Quantitative Marketing and Economics (QME), Springer, vol. 13(2), pages 93-115, June.
    11. Jacob LaRiviere & Mikolaj Czajkowski & Nick Hanley & Katherine Simpson, 2016. "What is the Causal Impact of Knowledge on Preferences in Stated Preference Studies?," Working Papers 2016-12, Faculty of Economic Sciences, University of Warsaw.
    12. Colin F. Camerer & Anna Dreber & Felix Holzmeister & Teck-Hua Ho & Jürgen Huber & Magnus Johannesson & Michael Kirchler & Gideon Nave & Brian A. Nosek & Thomas Pfeiffer & Adam Altmejd & Nick Buttrick, 2018. "Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015," Nature Human Behaviour, Nature, vol. 2(9), pages 637-644, September.
    13. Tobias Regner, 2021. "Crowdfunding a monthly income: an analysis of the membership platform Patreon," Journal of Cultural Economics, Springer;The Association for Cultural Economics International, vol. 45(1), pages 133-142, March.
    14. Dreber, Anna & Johannesson, Magnus, 2023. "A framework for evaluating reproducibility and replicability in economics," Ruhr Economic Papers 1055, RWI - Leibniz-Institut für Wirtschaftsforschung, Ruhr-University Bochum, TU Dortmund University, University of Duisburg-Essen.
    15. Kirthi Kalyanam & John McAteer & Jonathan Marek & James Hodges & Lifeng Lin, 2018. "Cross channel effects of search engine advertising on brick & mortar retail sales: Meta analysis of large scale field experiments on Google.com," Quantitative Marketing and Economics (QME), Springer, vol. 16(1), pages 1-42, March.
    16. Uddin, Main & Wang, Liang Choon & Smyth, Russell, 2021. "Do government-initiated energy comparison sites encourage consumer search and lower prices? Evidence from an online randomized controlled experiment in Australia," Journal of Economic Behavior & Organization, Elsevier, vol. 188(C), pages 167-182.
    17. Maurizio Canavari & Andreas C. Drichoutis & Jayson L. Lusk & Rodolfo M. Nayga, Jr., 2018. "How to run an experimental auction: A review of recent advances," Working Papers 2018-5, Agricultural University of Athens, Department Of Agricultural Economics.
    18. Azevedo, Eduardo M. & Mao, David & Montiel Olea, José Luis & Velez, Amilcar, 2023. "The A/B testing problem with Gaussian priors," Journal of Economic Theory, Elsevier, vol. 210(C).
    19. George Gui & Harikesh Nair & Fengshi Niu, 2021. "Auction Throttling and Causal Inference of Online Advertising Effects," Papers 2112.15155, arXiv.org, revised Feb 2022.
    20. Naoki Aizawa & You Suk Kim, 2020. "Public and Private Provision of Information in Market-Based Public Programs: Evidence from Advertising in Health Insurance Marketplaces," NBER Working Papers 27695, National Bureau of Economic Research, Inc.
