https://ideas.repec.org/p/zbw/i4rdps/8.html

We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell us about p-Hacking and Publication Bias in Online Experiments

Authors

  • Brodeur, Abel
  • Cook, Nikolai
  • Heyes, Anthony

Abstract

Amazon's Mechanical Turk is a widely used tool in business and economics research, but how trustworthy are the results of well-published studies that use it? Analyzing the universe of hypotheses tested on the platform and published in leading journals between 2010 and 2020, we find evidence of widespread p-hacking, publication bias, and over-reliance on results from plausibly under-powered studies. Even setting aside questions arising from the characteristics and behaviors of study recruits, the conduct of the research community itself substantially erodes the credibility of these studies' conclusions. The extent of the problems varies across the business, economics, management, and marketing research fields (with marketing especially afflicted). The problems are not improving over time and are much more prevalent than in a comparison set of non-online experiments. We explore correlates of increased credibility.
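The kind of p-hacking evidence the abstract describes is often diagnosed by checking whether reported p-values bunch just below a conventional significance threshold. A minimal sketch of one such diagnostic, a caliper test (the function name and caliper width here are illustrative assumptions, not the paper's exact procedure, which draws on more formal methods such as those in the references below):

```python
from math import comb

def caliper_test(p_values, threshold=0.05, width=0.0075):
    """Compare how many reported p-values fall just below vs. just above
    a significance threshold. Absent p-hacking and publication bias, a
    p-value landing inside this narrow caliper should be roughly equally
    likely to fall on either side, so the split should look like a fair
    coin flip."""
    below = sum(1 for p in p_values if threshold - width <= p < threshold)
    above = sum(1 for p in p_values if threshold < p <= threshold + width)
    n = below + above
    # Exact two-sided binomial test against a 50/50 split:
    # double the tail probability of the larger count.
    k = max(below, above)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return below, above, min(1.0, 2 * tail)
```

A strong excess of p-values just below 0.05 (a small returned p-value for a lopsided `below`/`above` split) is the telltale bunching pattern; the published analysis uses richer tools, but the intuition is the same.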

Suggested Citation

  • Brodeur, Abel & Cook, Nikolai & Heyes, Anthony, 2022. "We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell us about p-Hacking and Publication Bias in Online Experiments," I4R Discussion Paper Series 8, The Institute for Replication (I4R).
  • Handle: RePEc:zbw:i4rdps:8

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/266266/1/I4R-DP008.pdf
    Download Restriction: no


    References listed on IDEAS

    1. Stephen T. Ziliak & Deirdre N. McCloskey, 2004. "Size Matters: The Standard Error of Regressions in the American Economic Review," Econ Journal Watch, Econ Journal Watch, vol. 1(2), pages 331-358, August.
    2. Matias D. Cattaneo & Michael Jansson & Xinwei Ma, 2020. "Simple Local Polynomial Density Estimators," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1449-1455, July.
    3. John Horton & David Rand & Richard Zeckhauser, 2011. "The online laboratory: conducting experiments in a real labor market," Experimental Economics, Springer;Economic Science Association, vol. 14(3), pages 399-425, September.
    4. Camerer, Colin F & Hogarth, Robin M, 1999. "The Effects of Financial Incentives in Experiments: A Review and Capital-Labor-Production Framework," Journal of Risk and Uncertainty, Springer, vol. 19(1-3), pages 7-42, December.
    5. Ben Gillen & Erik Snowberg & Leeat Yariv, 2019. "Experimenting with Measurement Error: Techniques with Applications to the Caltech Cohort Study," Journal of Political Economy, University of Chicago Press, vol. 127(4), pages 1826-1863.
    6. Stefano DellaVigna & Elizabeth Linos, 2022. "RCTs to Scale: Comprehensive Evidence From Two Nudge Units," Econometrica, Econometric Society, vol. 90(1), pages 81-116, January.
    7. Steven D. Levitt & John A. List, 2007. "Viewpoint: On the generalizability of lab behaviour to the field," Canadian Journal of Economics, Canadian Economics Association, vol. 40(2), pages 347-370, May.
    8. Francesco Guala & Luigi Mittone, 2005. "Experiments in economics: External validity and the robustness of phenomena," Journal of Economic Methodology, Taylor & Francis Journals, vol. 12(4), pages 495-515.
    9. Abel Brodeur & Nikolai Cook & Anthony Heyes, 2020. "Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics," American Economic Review, American Economic Association, vol. 110(11), pages 3634-3660, November.
    10. Abel Brodeur & Mathias Lé & Marc Sangnier & Yanos Zylberberg, 2016. "Star Wars: The Empirics Strike Back," American Economic Journal: Applied Economics, American Economic Association, vol. 8(1), pages 1-32, January.
    11. David Johnson & John Barry Ryan, 2020. "Amazon Mechanical Turk workers can provide consistent and economically meaningful data," Southern Economic Journal, John Wiley & Sons, vol. 87(1), pages 369-385, July.
    12. Andreas Ortmann & Le Zhang, 2013. "Exploring the Meaning of Significance in Experimental Economics," Discussion Papers 2013-32, School of Economics, The University of New South Wales.
    13. Isaiah Andrews & Maximilian Kasy, 2019. "Identification of and Correction for Publication Bias," American Economic Review, American Economic Association, vol. 109(8), pages 2766-2794, August.
    14. repec:cup:judgdm:v:5:y:2010:i:5:p:411-419 is not listed on IDEAS
    15. Harrison, Glenn W. & Lau, Morten I. & Elisabet Rutström, E., 2009. "Risk attitudes, randomization to treatment, and self-selection into experiments," Journal of Economic Behavior & Organization, Elsevier, vol. 70(3), pages 498-507, June.
    16. Eva Vivalt, 2019. "Specification Searching and Significance Inflation Across Time, Methods and Disciplines," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 81(4), pages 797-816, August.
    17. Tomáš Havránek, 2015. "Measuring Intertemporal Substitution: The Importance Of Method Choices And Selective Reporting," Journal of the European Economic Association, European Economic Association, vol. 13(6), pages 1180-1204, December.
    18. Felix Chopra & Ingar Haaland & Christopher Roth & Andreas Stegmann, 2023. "The Null Result Penalty," The Economic Journal, Royal Economic Society, vol. 134(657), pages 193-219.
    19. Coppock, Alexander, 2019. "Generalizing from Survey Experiments Conducted on Mechanical Turk: A Replication Approach," Political Science Research and Methods, Cambridge University Press, vol. 7(3), pages 613-628, July.
    20. Yun Shin Lee & Yong Won Seo & Enno Siemsen, 2018. "Running Behavioral Operations Experiments Using Amazon's Mechanical Turk," Production and Operations Management, Production and Operations Management Society, vol. 27(5), pages 973-989, May.
    21. Antonio A. Arechar & Gordon T. Kraft-Todd & David G. Rand, 2017. "Turking overtime: how participant characteristics and behavior vary over time and day on Amazon Mechanical Turk," Journal of the Economic Science Association, Springer;Economic Science Association, vol. 3(1), pages 1-11, July.
    22. Nicholas Swanson & Garret Christensen & Rebecca Littman & David Birke & Edward Miguel & Elizabeth Levy Paluck & Zenan Wang, 2020. "Research Transparency Is on the Rise in Economics," AEA Papers and Proceedings, American Economic Association, vol. 110, pages 61-65, May.
    23. Armin Falk & Stephan Meier & Christian Zehnder, 2013. "Do Lab Experiments Misrepresent Social Preferences? The Case Of Self-Selected Student Samples," Journal of the European Economic Association, European Economic Association, vol. 11(4), pages 839-852, August.
    24. Megan L Head & Luke Holman & Rob Lanfear & Andrew T Kahn & Michael D Jennions, 2015. "The Extent and Consequences of P-Hacking in Science," PLOS Biology, Public Library of Science, vol. 13(3), pages 1-15, March.
    25. Berinsky, Adam J. & Huber, Gregory A. & Lenz, Gabriel S., 2012. "Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk," Political Analysis, Cambridge University Press, vol. 20(3), pages 351-368, July.
    26. Erik Snowberg & Leeat Yariv, 2021. "Testing the Waters: Behavior across Participant Pools," American Economic Review, American Economic Association, vol. 111(2), pages 687-719, February.
    27. Chris Doucouliagos & T.D. Stanley, 2013. "Are All Economic Facts Greatly Exaggerated? Theory Competition And Selectivity," Journal of Economic Surveys, Wiley Blackwell, vol. 27(2), pages 316-339, April.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project; subscribe to its RSS feed for this item.


    Cited by:

    1. Ankel-Peters, Jörg & Fiala, Nathan & Neubauer, Florian, 2023. "Do economists replicate?," Journal of Economic Behavior & Organization, Elsevier, vol. 212(C), pages 219-232.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Abel Brodeur & Nikolai M. Cook & Anthony Heyes, 2022. "We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments," LCERPA Working Papers am0133, Laurier Centre for Economic Research and Policy Analysis.
    2. Brodeur, Abel & Cook, Nikolai & Neisser, Carina, 2022. "P-Hacking, Data Type and Data-Sharing Policy," IZA Discussion Papers 15586, Institute of Labor Economics (IZA).
    3. Brodeur, Abel & Cook, Nikolai & Hartley, Jonathan & Heyes, Anthony, 2022. "Do Pre-Registration and Pre-analysis Plans Reduce p-Hacking and Publication Bias?," MetaArXiv uxf39, Center for Open Science.
    4. Graham Elliott & Nikolay Kudrin & Kaspar Wüthrich, 2022. "Detecting p‐Hacking," Econometrica, Econometric Society, vol. 90(2), pages 887-906, March.
    5. Eszter Czibor & David Jimenez‐Gomez & John A. List, 2019. "The Dozen Things Experimental Economists Should Do (More of)," Southern Economic Journal, John Wiley & Sons, vol. 86(2), pages 371-432, October.
    6. Brodeur, Abel & Cook, Nikolai M. & Hartley, Jonathan S. & Heyes, Anthony, 2023. "Do Pre-Registration and Pre-Analysis Plans Reduce p-Hacking and Publication Bias?: Evidence from 15,992 Test Statistics and Suggestions for Improvement," GLO Discussion Paper Series 1147 [pre.], Global Labor Organization (GLO).
    7. Graham Elliott & Nikolay Kudrin & Kaspar Wuthrich, 2022. "The Power of Tests for Detecting p-Hacking," Papers 2205.07950, arXiv.org, revised Jun 2023.
    8. Johannes G. Jaspersen & Marc A. Ragin & Justin R. Sydnor, 2022. "Insurance demand experiments: Comparing crowdworking to the lab," Journal of Risk & Insurance, The American Risk and Insurance Association, vol. 89(4), pages 1077-1107, December.
    9. Guillaume Coqueret, 2023. "Forking paths in financial economics," Papers 2401.08606, arXiv.org.
    10. Abel Brodeur & Scott Carrell & David Figlio & Lester Lusher, 2023. "Unpacking P-hacking and Publication Bias," American Economic Review, American Economic Association, vol. 113(11), pages 2974-3002, November.
    11. Dominika Ehrenbergerova & Josef Bajzik & Tomas Havranek, 2023. "When Does Monetary Policy Sway House Prices? A Meta-Analysis," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 71(2), pages 538-573, June.
    12. Cristina Blanco-Perez & Abel Brodeur, 2020. "Publication Bias and Editorial Statement on Negative Findings," The Economic Journal, Royal Economic Society, vol. 130(629), pages 1226-1247.
    13. Doucouliagos, Hristos & Hinz, Thomas & Zigova, Katarina, 2022. "Bias and careers: Evidence from the aid effectiveness literature," European Journal of Political Economy, Elsevier, vol. 71(C).
    14. Abel Brodeur & Nikolai Cook & Anthony Heyes, 2020. "Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics," American Economic Review, American Economic Association, vol. 110(11), pages 3634-3660, November.
    15. Jindrich Matousek & Tomas Havranek & Zuzana Irsova, 2022. "Individual discount rates: a meta-analysis of experimental evidence," Experimental Economics, Springer;Economic Science Association, vol. 25(1), pages 318-358, February.
    16. Tomas Havranek & Zuzana Irsova & Lubica Laslopova & Olesia Zeynalova, 2020. "Skilled and Unskilled Labor Are Less Substitutable than Commonly Thought," Working Papers IES 2020/29, Charles University Prague, Faculty of Social Sciences, Institute of Economic Studies, revised Sep 2020.
    17. Brodeur, Abel & Cook, Nikolai & Heyes, Anthony, 2018. "Methods Matter: P-Hacking and Causal Inference in Economics," IZA Discussion Papers 11796, Institute of Labor Economics (IZA).
    18. Picchio, Matteo & Ubaldi, Michele, 2022. "Unemployment and Health: A Meta-Analysis," GLO Discussion Paper Series 1128, Global Labor Organization (GLO).
    19. Cazachevici, Alina & Havranek, Tomas & Horvath, Roman, 2020. "Remittances and economic growth: A meta-analysis," World Development, Elsevier, vol. 134(C).
    20. Yariv, Leeat & Snowberg, Erik, 2018. "Testing the Waters: Behavior across Participant Pools," CEPR Discussion Papers 13015, C.E.P.R. Discussion Papers.

    More about this item

    Keywords

    online crowd-sourcing platforms; Amazon Mechanical Turk; p-hacking; publication bias; statistical power; research credibility;

    JEL classification:

    • B41 - Schools of Economic Thought and Methodology - - Economic Methodology - - - Economic Methodology
    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C40 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - General
    • C90 - Mathematical and Quantitative Methods - - Design of Experiments - - - General


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:zbw:i4rdps:8. See general information about how to correct material in RePEc.

For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ZBW - Leibniz Information Centre for Economics. General contact details of provider: https://www.i4replication.org/ .

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.