Authors
- Alex P. Miller
(Marshall School of Business, University of Southern California, Los Angeles, California 90089)
- Kartik Hosanagar
(The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104)
Abstract
In recent years, randomized experiments (or “A/B tests”) have become commonplace in many industrial settings as managers increasingly seek the aid of scientific rigor in their decision making. However, just as this practice has proliferated among firms, the problem of p-hacking—whereby experimenters adjust their sample size or try several statistical analyses until they find one that produces a statistically significant p-value—has emerged as a prevalent concern in the scientific community. Notably, many commentators have highlighted how A/B testing software enables and may even encourage p-hacking behavior. To investigate this phenomenon, we analyze the prevalence of p-hacking in a primary sample of 2,270 experiments conducted by 242 firms on a large U.S.-based e-commerce A/B testing platform. Using multiple statistical techniques—including a novel approach we call the asymmetric caliper test—we analyze the p-values corresponding to each experiment’s designated target metric across multiple significance thresholds. Our findings reveal essentially no evidence for p-hacking in our data. In an extended sample that examines p-hacking across all outcome metrics (encompassing more than 16,000 p-values in total), we similarly observe no evidence of p-hacking behavior. We use simulations to determine that if a modest effect of p-hacking were present in our data set, our methods would have the power to detect it at our current sample size. We contrast our results with the prevalence of p-hacking in academic contexts and discuss a number of possible factors explaining the divergent results, highlighting the potential roles of organizational learning and economic incentives.
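The abstract does not specify how the authors' asymmetric caliper test is constructed. For intuition, the sketch below implements the standard (symmetric) caliper test on which such methods build: count p-values in a narrow window just below versus just above a significance threshold, and compare the split against a binomial null of equal local density. The function name `caliper_test`, the window width, and the simulated data are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of a standard caliper test for p-hacking, assuming
# uniform local density of p-values around the threshold under the null.
import numpy as np
from scipy.stats import binomtest

def caliper_test(p_values, threshold=0.05, width=0.0125):
    """Compare counts of p-values just below vs. just above `threshold`.

    Absent p-hacking, p-values should be (locally) equally likely to fall
    in the narrow window on either side of the threshold, so the count
    just below is Binomial(n, 0.5).
    """
    p = np.asarray(p_values)
    just_below = np.sum((p >= threshold - width) & (p < threshold))
    just_above = np.sum((p >= threshold) & (p < threshold + width))
    n = just_below + just_above
    if n == 0:
        raise ValueError("No p-values fall inside the caliper window.")
    # One-sided test: excess mass just below the threshold signals p-hacking.
    result = binomtest(int(just_below), int(n), p=0.5, alternative="greater")
    return int(just_below), int(just_above), result.pvalue

# Illustrative usage on simulated null data (uniform p-values, no hacking),
# with a sample size matching the paper's primary sample of 2,270 experiments:
rng = np.random.default_rng(0)
below, above, pval = caliper_test(rng.uniform(0, 1, size=2270))
print(f"just below: {below}, just above: {above}, binomial p = {pval:.3f}")
```

Repeating this kind of check on data simulated with an injected excess of just-significant results is one way to gauge, as the paper does, whether the test has power to detect a modest p-hacking effect at a given sample size.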
Suggested Citation
Alex P. Miller & Kartik Hosanagar, 2025.
"An Investigation of p -Hacking in E-Commerce A/B Testing,"
Information Systems Research, INFORMS, vol. 36(3), pages 1691-1717, September.
Handle:
RePEc:inm:orisre:v:36:y:2025:i:3:p:1691-1717
DOI: 10.1287/isre.2024.0872