
Raiders of the lost HARK: a reproducible inference framework for big data science

Authors

  • Mattia Prosperi (University of Florida)
  • Jiang Bian (University of Florida)
  • Iain E. Buchan (University of Liverpool)
  • James S. Koopman (University of Michigan)
  • Matthew Sperrin (University of Manchester)
  • Mo Wang (University of Florida)

Abstract

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources, from high-throughput genomics to social media streams. Here we propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalizations of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data or of the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representations of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.

Suggested Citation

  • Mattia Prosperi & Jiang Bian & Iain E. Buchan & James S. Koopman & Matthew Sperrin & Mo Wang, 2019. "Raiders of the lost HARK: a reproducible inference framework for big data science," Palgrave Communications, Palgrave Macmillan, vol. 5(1), pages 1-12, December.
  • Handle: RePEc:pal:palcom:v:5:y:2019:i:1:d:10.1057_s41599-019-0340-8
    DOI: 10.1057/s41599-019-0340-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/s41599-019-0340-8
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Estelle Dumas-Mallet & Katherine Button & Thomas Boraud & Marcus Munafo & François Gonon, 2016. "Replication Validity of Initial Association Studies: A Comparison between Psychiatry, Neurology and Four Somatic Diseases," PLOS ONE, Public Library of Science, vol. 11(6), pages 1-20, June.
    2. Megan L Head & Luke Holman & Rob Lanfear & Andrew T Kahn & Michael D Jennions, 2015. "The Extent and Consequences of P-Hacking in Science," PLOS Biology, Public Library of Science, vol. 13(3), pages 1-15, March.
    3. Laibson, David I. & Rietveld, Cornelius A. & Conley, Dalton & Eriksson, Nicholas & Esko, Tonu & Medland, Sarah E. & Vinkhuyzen, Anna A. E. & Yang, Jian & Boardman, Jason D. & Chabris, Christopher F. &, 2014. "Replicability and Robustness of Genome-Wide-Association Studies for Behavioral Traits," Scholarly Articles 33371478, Harvard University Department of Economics.
    4. Kenneth F Schulz & Douglas G Altman & David Moher & for the CONSORT Group, 2010. "CONSORT 2010 Statement: Updated Guidelines for Reporting Parallel Group Randomised Trials," PLOS Medicine, Public Library of Science, vol. 7(3), pages 1-7, March.
    5. Vancouver, Jeffrey B., 2018. "In Defense of HARKing," Industrial and Organizational Psychology, Cambridge University Press, vol. 11(1), pages 73-80, March.
    6. Mazzola, Joseph J. & Deuling, Jacqueline K., 2013. "Forgetting What We Learned as Graduate Students: HARKing and Selective Outcome Reporting in I–O Journal Articles," Industrial and Organizational Psychology, Cambridge University Press, vol. 6(3), pages 279-284, September.
    7. Marcus R. Munafò & Brian A. Nosek & Dorothy V. M. Bishop & Katherine S. Button & Christopher D. Chambers & Nathalie Percie du Sert & Uri Simonsohn & Eric-Jan Wagenmakers & Jennifer J. Ware & John P. A, 2017. "A manifesto for reproducible science," Nature Human Behaviour, Nature, vol. 1(1), pages 1-9, January.
    8. Nosek, Brian A. & Ebersole, Charles R. & DeHaven, Alexander Carl & Mellor, David Thomas, 2018. "The Preregistration Revolution," OSF Preprints 2dxu5, Center for Open Science.
    9. Robert A. Stine, 2004. "Model Selection Using Information Theory and the MDL Principle," Sociological Methods & Research, vol. 33(2), pages 230-260, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Jasper Brinkerink, 2023. "When Shooting for the Stars Becomes Aiming for Asterisks: P-Hacking in Family Business Research," Entrepreneurship Theory and Practice, vol. 47(2), pages 304-343, March.
    2. Eszter Czibor & David Jimenez‐Gomez & John A. List, 2019. "The Dozen Things Experimental Economists Should Do (More of)," Southern Economic Journal, John Wiley & Sons, vol. 86(2), pages 371-432, October.
    3. Rubin, Mark, 2020. "Does preregistration improve the credibility of research findings?," MetaArXiv vgr89, Center for Open Science.
    4. Christopher Allen & David M A Mehler, 2019. "Open science challenges, benefits and tips in early career and beyond," PLOS Biology, Public Library of Science, vol. 17(5), pages 1-14, May.
    5. Kraft-Todd, Gordon T. & Rand, David G., 2021. "Practice what you preach: Credibility-enhancing displays and the growth of open science," Organizational Behavior and Human Decision Processes, Elsevier, vol. 164(C), pages 1-10.
    6. Brinkerink, Jasper & De Massis, Alfredo & Kellermanns, Franz, 2022. "One finding is no finding: Toward a replication culture in family business research," Journal of Family Business Strategy, Elsevier, vol. 13(4).
    7. Filip Melinscak & Dominik R Bach, 2020. "Computational optimization of associative learning experiments," PLOS Computational Biology, Public Library of Science, vol. 16(1), pages 1-23, January.
    8. Nosek, Brian A. & Errington, Timothy M., 2019. "What is replication?," MetaArXiv u4g6t, Center for Open Science.
    9. Cantone, Giulio Giacomo, 2023. "The multiversal methodology as a remedy of the replication crisis," MetaArXiv kuhmz, Center for Open Science.
    10. Merton S. Krause, 2019. "Replication and preregistration," Quality & Quantity: International Journal of Methodology, Springer, vol. 53(5), pages 2647-2652, September.
    11. Logg, Jennifer M. & Dorison, Charles A., 2021. "Pre-registration: Weighing costs and benefits for researchers," Organizational Behavior and Human Decision Processes, Elsevier, vol. 167(C), pages 18-27.
    12. Persson, Emil & Tinghög, Gustav, 2020. "Opportunity cost neglect in public policy," Journal of Economic Behavior & Organization, Elsevier, vol. 170(C), pages 301-312.
    13. Michael J. Fell & Alexandra Schneiders & David Shipworth, 2019. "Consumer Demand for Blockchain-Enabled Peer-to-Peer Electricity Trading in the United Kingdom: An Online Survey Experiment," Energies, MDPI, vol. 12(20), pages 1-25, October.
    14. Su Keng Tan & Wai Keung Leung & Alexander Tin Hong Tang & Roger A Zwahlen, 2017. "Effects of mandibular setback with or without maxillary advancement osteotomies on pharyngeal airways: An overview of systematic reviews," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-20, October.
    15. Ángel Enrique & Juana Bretón-López & Guadalupe Molinari & Rosa M. Baños & Cristina Botella, 2018. "Efficacy of an adaptation of the Best Possible Self intervention implemented through positive technology: a randomized control trial," Applied Research in Quality of Life, Springer;International Society for Quality-of-Life Studies, vol. 13(3), pages 671-689, September.
    16. Gerben ter Riet & Paula Chesley & Alan G Gross & Lara Siebeling & Patrick Muggensturm & Nadine Heller & Martin Umbehr & Daniela Vollenweider & Tsung Yu & Elie A Akl & Lizzy Brewster & Olaf M Dekkers &, 2013. "All That Glitters Isn't Gold: A Survey on Acknowledgment of Limitations in Biomedical Studies," PLOS ONE, Public Library of Science, vol. 8(11), pages 1-6, November.
    17. Iranzu Mugueta-Aguinaga & Begonya Garcia-Zapirain, 2017. "FRED: Exergame to Prevent Dependence and Functional Deterioration Associated with Ageing. A Pilot Three-Week Randomized Controlled Clinical Trial," IJERPH, MDPI, vol. 14(12), pages 1-18, November.
    18. Foster, Dean P. & Stine, Robert & Young, H. Peyton, 2011. "A Markov Test for Alpha," Working Papers 11-49, University of Pennsylvania, Wharton School, Weiss Center.
    19. Spyridon N Papageorgiou & Georgios N Antonoglou & George K Sándor & Theodore Eliades, 2017. "Randomized clinical trials in orthodontics are rarely registered a priori and often published late or not at all," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-13, August.
    20. Milagros Molero-Zafra & María Teresa Mitjans-Lafont & María Jesús Hernández-Jiménez & Marián Pérez-Marín, 2022. "Psychological Intervention in Women Victims of Childhood Sexual Abuse: An Open Study—Protocol of a Randomized Controlled Clinical Trial Comparing EMDR Psychotherapy and Trauma-Based Cognitive Therapy," IJERPH, MDPI, vol. 19(12), pages 1-16, June.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.