
Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments

Author

Listed:
  • Karthik Muralidharan
  • Mauricio Romero
  • Kaspar Wüthrich

Abstract

Factorial designs are widely used for studying multiple treatments in one experiment. While t-tests based on the “long” model (including main and interaction effects) provide valid inferences against “business-as-usual” counterfactuals, “short” model t-tests (that ignore interactions) yield higher power if the interactions are zero, but incorrect inferences otherwise. Out of 27 factorial experiments published in top-5 journals in 2007–2017, 19 use the short model. We reanalyze these experiments, and show that over half of their published results lose significance when interactions are included. We show that testing the interactions using the long model and presenting the short model if the interactions are not significantly different from zero leads to incorrect inference due to the implied data-dependent model selection. Based on recent econometric advances, we show that local power improvements over the long model are possible. However, if the main effects are of primary interest, leaving the interaction cells empty yields valid inferences and global power improvements. In addition, the sample size needed to detect interactions is substantially larger than that required to detect main effects, resulting in most experiments being under-powered to detect interactions. Thus, using factorial designs to explore whether interactions are meaningful can be problematic because interaction estimates are likely to considerably overestimate the magnitude of the true effect conditional on being significant.
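
To make the short-versus-long distinction concrete, here is a minimal sketch (not the authors' code; the simulated effect sizes, variable names, and the use of Python with statsmodels are illustrative assumptions). It simulates a 2x2 factorial experiment with a non-zero interaction and compares the "short" model, which omits the interaction term, with the "long" model, which includes it.

    # Minimal sketch: "short" vs. "long" model in a simulated 2x2 factorial design.
    # Assumptions: effect sizes, noise, and sample size are made up for illustration.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 4000

    # Two cross-randomized binary treatments (balanced 2x2 design).
    t1 = rng.integers(0, 2, n)
    t2 = rng.integers(0, 2, n)

    # Outcome: main effects 0.2 and 0.1, interaction 0.3, noise sd 1.
    y = 0.2 * t1 + 0.1 * t2 + 0.3 * t1 * t2 + rng.normal(size=n)
    df = pd.DataFrame({"y": y, "t1": t1, "t2": t2})

    # "Short" model: ignores the interaction. The coefficient on t1 mixes the
    # effect of t1 alone with the interaction (here roughly 0.2 + 0.5*0.3 = 0.35),
    # so it is not the effect against the business-as-usual (t2 = 0) counterfactual.
    short_model = smf.ols("y ~ t1 + t2", data=df).fit()

    # "Long" model: main effects plus interaction. The coefficient on t1 is the
    # effect of t1 relative to the pure control cell (roughly 0.2 here).
    long_model = smf.ols("y ~ t1 * t2", data=df).fit()

    print(short_model.params[["t1", "t2"]])
    print(long_model.params[["t1", "t2", "t1:t2"]])

    # The interaction's standard error is roughly twice the short-model main-effect
    # standard error, one way to see why detecting interactions requires a much
    # larger sample than detecting main effects.
    print(short_model.bse["t1"], long_model.bse["t1:t2"])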

Suggested Citation

  • Karthik Muralidharan & Mauricio Romero & Kaspar Wüthrich, 2019. "Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments," NBER Working Papers 26562, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:26562
    Note: DEV ED EH LS TWP

    Download full text from publisher

    File URL: http://www.nber.org/papers/w26562.pdf
    Download Restriction: no


    References listed on IDEAS

    1. John A. List & Azeem M. Shaikh & Yang Xu, 2019. "Multiple hypothesis testing in experimental economics," Experimental Economics, Springer;Economic Science Association, vol. 22(4), pages 773-793, December.
    2. Graham Elliott & Ulrich K. Müller & Mark W. Watson, 2015. "Nearly Optimal Tests When a Nuisance Parameter Is Present Under the Null Hypothesis," Econometrica, Econometric Society, vol. 83, pages 771-811, March.
    3. Abhijit V. Banerjee & Shawn Cole & Esther Duflo & Leigh Linden, 2007. "Remedying Education: Evidence from Two Randomized Experiments in India," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 122(3), pages 1235-1264.
    4. Timothy B. Armstrong & Michal Kolesár, 2018. "Optimal Inference in a Class of Regression Models," Econometrica, Econometric Society, vol. 86(2), pages 655-683, March.
    5. Christopher Blattman & Julian C. Jamison & Margaret Sheridan, 2017. "Reducing Crime and Violence: Experimental Evidence from Cognitive Behavioral Therapy in Liberia," American Economic Review, American Economic Association, vol. 107(4), pages 1165-1206, April.
    6. Uri Gneezy & Kenneth L. Leonard & John A. List, 2009. "Gender Differences in Competition: Evidence From a Matrilineal and a Patriarchal Society," Econometrica, Econometric Society, vol. 77(5), pages 1637-1664, September.
    7. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    8. Garret Christensen & Edward Miguel, 2018. "Transparency, Reproducibility, and the Credibility of Economics Research," Journal of Economic Literature, American Economic Association, vol. 56(3), pages 920-980, September.
    9. Hunt Allcott & Dmitry Taubinsky, 2015. "Evaluating Behaviorally Motivated Policy: Experimental Evidence from the Lightbulb Market," American Economic Review, American Economic Association, vol. 105(8), pages 2501-2538, August.
    10. Bertrand, Marianne & Karlan, Dean S. & Mullainathan, Sendhil & Shafir, Eldar & Zinman, Jonathan, 2005. "What's Psychology Worth? A Field Experiment in the Consumer Credit Market," Center Discussion Papers 28441, Yale University, Economic Growth Center.
    11. James Andreoni & Justin M. Rao & Hannah Trachtman, 2017. "Avoiding the Ask: A Field Experiment on Altruism, Empathy, and Charitable Giving," Journal of Political Economy, University of Chicago Press, vol. 125(3), pages 625-653.
    12. Dean Karlan & Robert Osei & Isaac Osei-Akoto & Christopher Udry, 2014. "Agricultural Decisions after Relaxing Credit and Risk Constraints," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(2), pages 597-652.
    13. Fischer, Gregory, 2013. "Contract structure, risk sharing and investment choice," LSE Research Online Documents on Economics 46796, London School of Economics and Political Science, LSE Library.
    14. Guido W. Imbens & Charles F. Manski, 2004. "Confidence Intervals for Partially Identified Parameters," Econometrica, Econometric Society, vol. 72(6), pages 1845-1857, November.
    15. Jessica Cohen & Pascaline Dupas & Simone Schaner, 2015. "Price Subsidies, Diagnostic Tests, and Targeting of Malaria Treatment: Evidence from a Randomized Controlled Trial," American Economic Review, American Economic Association, vol. 105(2), pages 609-645, February.
    16. John List & Sally Sadoff & Mathis Wagner, 2011. "So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design," Experimental Economics, Springer;Economic Science Association, vol. 14(4), pages 439-457, November.
    17. Duflo, Esther & Dupas, Pascaline & Kremer, Michael, 2015. "School governance, teacher incentives, and pupil–teacher ratios: Experimental evidence from Kenyan primary schools," Journal of Public Economics, Elsevier, vol. 123(C), pages 92-110.
    18. Vivi Alatas & Abhijit Banerjee & Rema Hanna & Benjamin A. Olken & Julia Tobias, 2012. "Targeting the Poor: Evidence from a Field Experiment in Indonesia," American Economic Review, American Economic Association, vol. 102(4), pages 1206-1240, June.
    19. Dean Karlan & John A. List, 2007. "Does Price Matter in Charitable Giving? Evidence from a Large-Scale Natural Field Experiment," American Economic Review, American Economic Association, vol. 97(5), pages 1774-1793, December.
    20. Stefan Eriksson & Dan-Olof Rooth, 2014. "Do Employers Use Unemployment as a Sorting Criterion When Hiring? Evidence from a Field Experiment," American Economic Review, American Economic Association, vol. 104(3), pages 1014-1039, March.
    21. McCloskey, Adam, 2017. "Bonferroni-based size-correction for nonstandard testing problems," Journal of Econometrics, Elsevier, vol. 200(1), pages 17-35.
    22. Leeb, Hannes & Pötscher, Benedikt M., 2005. "Model Selection And Inference: Facts And Fiction," Econometric Theory, Cambridge University Press, vol. 21(1), pages 21-59, February.
    23. Esther Duflo & Pascaline Dupas & Michael Kremer, 2011. "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya," American Economic Review, American Economic Association, vol. 101(5), pages 1739-1774, August.
    24. Duflo, Esther & Glennerster, Rachel & Kremer, Michael, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
    25. Daniel O. Gilligan & Naureen Karachiwalla & Ibrahim Kasirye & Adrienne M. Lucas & Derek Neal, 2022. "Educator Incentives and Educational Triage in Rural Primary Schools," Journal of Human Resources, University of Wisconsin Press, vol. 57(1), pages 79-111.
    26. Supreet Kaur & Michael Kremer & Sendhil Mullainathan, 2015. "Self-Control at Work," Journal of Political Economy, University of Chicago Press, vol. 123(6), pages 1227-1277.
    27. Amanda Pallais & Emily Glassberg Sands, 2016. "Why the Referential Treatment? Evidence from Field Experiments on Referrals," Journal of Political Economy, University of Chicago Press, vol. 124(6), pages 1793-1828.
    28. Dean Karlan & Jonathan Zinman, 2009. "Observing Unobservables: Identifying Information Asymmetries With a Consumer Credit Field Experiment," Econometrica, Econometric Society, vol. 77(6), pages 1993-2008, November.
    29. Isaiah Andrews & Maximilian Kasy, 2019. "Identification of and Correction for Publication Bias," American Economic Review, American Economic Association, vol. 109(8), pages 2766-2794, August.
    30. Blair, Graeme & Cooper, Jasper & Coppock, Alexander & Humphreys, Macartan, 2019. "Declaring and Diagnosing Research Designs," American Political Science Review, Cambridge University Press, vol. 113(3), pages 838-859, August.
    31. Isaac Mbiti & Karthik Muralidharan & Mauricio Romero & Youdi Schipper & Constantine Manda & Rakesh Rajani, 2019. "Inputs, Incentives, and Complementarities in Education: Experimental Evidence from Tanzania," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 134(3), pages 1627-1673.
    32. Miriam Bruhn & David McKenzie, 2009. "In Pursuit of Balance: Randomization in Practice in Development Field Experiments," American Economic Journal: Applied Economics, American Economic Association, vol. 1(4), pages 200-232, October.
    33. Jeffrey A. Flory & Andreas Leibbrandt & John A. List, 2015. "Do Competitive Workplaces Deter Female Workers? A Large-Scale Natural Field Experiment on Job Entry Decisions," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 82(1), pages 122-155.
    34. Stefano Dellavigna & John A. List & Ulrike Malmendier & Gautam Rao, 2017. "Voting to Tell Others," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 84(1), pages 143-181.
    35. Marianne Bertrand & Dean Karlan & Sendhil Mullainathan & Eldar Shafir & Jonathan Zinman, 2010. "What's Advertising Content Worth? Evidence from a Consumer Credit Marketing Field Experiment," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 125(1), pages 263-306.
    36. Leeb, Hannes & Pötscher, Benedikt M., 2008. "Can One Estimate The Unconditional Distribution Of Post-Model-Selection Estimators?," Econometric Theory, Cambridge University Press, vol. 24(2), pages 338-376, April.
    37. Henrik Jacobsen Kleven & Martin B. Knudsen & Claus Thustrup Kreiner & Søren Pedersen & Emmanuel Saez, 2011. "Unwilling or Unable to Cheat? Evidence From a Tax Audit Experiment in Denmark," Econometrica, Econometric Society, vol. 79(3), pages 651-692, May.
    38. Nava Ashraf & James Berry & Jesse M. Shapiro, 2010. "Can Higher Prices Stimulate Product Use? Evidence from a Field Experiment in Zambia," American Economic Review, American Economic Association, vol. 100(5), pages 2383-2413, December.
    39. Chad Kendall & Tommaso Nannicini & Francesco Trebbi, 2015. "How Do Voters Respond to Information? Evidence from a Randomized Campaign," American Economic Review, American Economic Association, vol. 105(1), pages 322-353, January.
    40. Jorg Stoye, 2009. "More on Confidence Intervals for Partially Identified Parameters," Econometrica, Econometric Society, vol. 77(4), pages 1299-1315, July.
    41. Jeffrey A. Flory & Andreas Leibbrandt & John A. List, 2010. "Do Competitive Work Places Deter Female Workers? A Large-Scale Natural Field Experiment on Gender Differences in Job-Entry Decisions," NBER Working Papers 16546, National Bureau of Economic Research, Inc.
    42. Jessica Cohen & Pascaline Dupas, 2010. "Free Distribution or Cost-Sharing? Evidence from a Randomized Malaria Prevention Experiment," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 125(1), pages 1-45.
    43. Benjamin A. Olken, 2007. "Monitoring Corruption: Evidence from a Field Experiment in Indonesia," Journal of Political Economy, University of Chicago Press, vol. 115(2), pages 200-249.
    44. Dean S. Karlan & Jonathan Zinman, 2008. "Credit Elasticities in Less-Developed Economies: Implications for Microfinance," American Economic Review, American Economic Association, vol. 98(3), pages 1040-1068, June.
    45. Jennifer Brown & Tanjim Hossain & John Morgan, 2010. "Shrouded Attributes and Information Suppression: Evidence from the Field," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 125(2), pages 859-876.
    46. Johannes Haushofer & Jeremy Shapiro, 2016. "The Short-term Impact of Unconditional Cash Transfers to the Poor: Experimental Evidence from Kenya," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 131(4), pages 1973-2042.
    47. Daniel Kassler & Ira Nichols-Barrer & Mariel Finucane, "undated". "Beyond "Treatment versus Control": How Bayesian Analysis Makes Factorial Experiments Feasible in Education Research," Mathematica Policy Research Reports 58782fcdf93d4edbb20f89f81, Mathematica Policy Research.
    48. Donald W. K. Andrews & Patrik Guggenberger, 2009. "Hybrid and Size-Corrected Subsampling Methods," Econometrica, Econometric Society, vol. 77(3), pages 721-762, May.
    49. Blair, Graeme & Cooper, Jasper & Coppock, Alexander & Humphreys, Macartan, 2019. "Declaring and Diagnosing Research Designs," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 113(3), pages 838-859.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peters, Jörg & Langbein, Jörg & Roberts, Gareth, 2016. "Policy evaluation, randomized controlled trials, and external validity—A systematic review," Economics Letters, Elsevier, vol. 147(C), pages 51-54.
    2. Karthik Muralidharan & Paul Niehaus, 2017. "Experimentation at Scale," Journal of Economic Perspectives, American Economic Association, vol. 31(4), pages 103-124, Fall.
    3. Jörg Peters & Jörg Langbein & Gareth Roberts, 2018. "Generalization in the Tropics – Development Policy, Randomized Controlled Trials, and External Validity," The World Bank Research Observer, World Bank, vol. 33(1), pages 34-64.
    4. Eszter Czibor & David Jimenez‐Gomez & John A. List, 2019. "The Dozen Things Experimental Economists Should Do (More of)," Southern Economic Journal, John Wiley & Sons, vol. 86(2), pages 371-432, October.
    5. Abhijit V. Banerjee & Esther Duflo, 2009. "The Experimental Approach to Development Economics," Annual Review of Economics, Annual Reviews, vol. 1(1), pages 151-178, May.
    6. Levitt, Steven D. & List, John A., 2009. "Field experiments in economics: The past, the present, and the future," European Economic Review, Elsevier, vol. 53(1), pages 1-18, January.
    7. Guido W. Imbens & Jeffrey M. Wooldridge, 2009. "Recent Developments in the Econometrics of Program Evaluation," Journal of Economic Literature, American Economic Association, vol. 47(1), pages 5-86, March.
    8. Paulina Oliva & B. Kelsey Jack & Samuel Bell & Elizabeth Mettetal & Christopher Severen, 2020. "Technology Adoption under Uncertainty: Take-Up and Subsequent Investment in Zambia," The Review of Economics and Statistics, MIT Press, vol. 102(3), pages 617-632, July.
    9. Abhijit V. Banerjee & Esther Duflo, 2010. "Giving Credit Where It Is Due," Journal of Economic Perspectives, American Economic Association, vol. 24(3), pages 61-80, Summer.
    10. Gregory Fletcher Cox, 2024. "A Simple and Adaptive Confidence Interval when Nuisance Parameters Satisfy an Inequality," Papers 2409.09962, arXiv.org.
    11. McCloskey, Adam, 2017. "Bonferroni-based size-correction for nonstandard testing problems," Journal of Econometrics, Elsevier, vol. 200(1), pages 17-35.
    12. Abhijit Banerjee & Sylvain Chassang & Erik Snowberg, 2016. "Decision Theoretic Approaches to Experiment Design and External Validity," NBER Working Papers 22167, National Bureau of Economic Research, Inc.
    13. B Kelsey Jack, "undated". "Market Inefficiencies and the Adoption of Agricultural Technologies in Developing Countries," CID Working Papers 50, Center for International Development at Harvard University.
    14. Pedro Carneiro & Sokbae Lee & Daniel Wilhelm, 2020. "Optimal data collection for randomized control trials," The Econometrics Journal, Royal Economic Society, vol. 23(1), pages 1-31.
    15. Jason T. Kerwin & Rebecca L. Thornton, 2021. "Making the Grade: The Sensitivity of Education Program Effectiveness to Input Choices and Outcome Measures," The Review of Economics and Statistics, MIT Press, vol. 103(2), pages 251-264, May.
    16. Duflo, Esther & Glennerster, Rachel & Kremer, Michael, 2008. "Using Randomization in Development Economics Research: A Toolkit," Handbook of Development Economics, in: T. Paul Schultz & John A. Strauss (ed.), Handbook of Development Economics, edition 1, volume 4, chapter 61, pages 3895-3962, Elsevier.
    17. Eduard Marinov, 2019. "The 2019 Nobel Prize in Economics," Economic Thought journal, Bulgarian Academy of Sciences - Economic Research Institute, issue 6, pages 78-116.
    18. Philipp Ketz & Adam McCloskey, 2021. "Short and Simple Confidence Intervals when the Directions of Some Effects are Known," Papers 2109.08222, arXiv.org.
    19. Belot, Michèle & James, Jonathan, 2016. "Partner selection into policy relevant field experiments," Journal of Economic Behavior & Organization, Elsevier, vol. 123(C), pages 31-56.
    20. James Berry & Greg Fischer & Raymond Guiteras, 2020. "Eliciting and Utilizing Willingness to Pay: Evidence from Field Trials in Northern Ghana," Journal of Political Economy, University of Chicago Press, vol. 128(4), pages 1436-1473.

    More about this item

    JEL classification:

    • C12 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Hypothesis Testing: General
    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C90 - Mathematical and Quantitative Methods - - Design of Experiments - - - General
    • C93 - Mathematical and Quantitative Methods - - Design of Experiments - - - Field Experiments

