
Statistical variable selection and causality in the social and behavioral sciences

Author

  • Harold Kincaid (University of Cape Town)

Abstract

The problem of “variable selection” is a fundamental one across the sciences. In its broadest terms, this problem is at least part of the general issue of theory selection and comparison. However, there is a more circumscribed problem that concerns primarily the choice of variables for the best-fitting model, given some set of data (usually observational) and specific statistical techniques (typically multiple regression). There is a deep strand in econometrics and in applied statistics across the social, behavioral, and biomedical sciences that seeks formal decision rules or algorithms for picking out variables. The paper examines seven such formal procedures using a simulated data set with known causal relations. The conclusion is that these seven often-used procedures make systematic causal errors. The paper closes with some suggestions about better alternatives.
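The abstract does not reproduce the paper's simulated data set or its seven procedures. As a hypothetical illustration of the kind of causal error it describes, the sketch below assumes one simple structure: x causes y with a true effect of 1.0, and a third variable c is a collider (caused by both x and y). It then compares a model with and without c using AIC, one common fit-based criterion. The data-generating process, the function names (simulate, ols_one, ols_two, aic), and the choice of AIC are assumptions for illustration only, not the paper's design.

```python
import math
import random


def simulate(n, seed=0):
    """Hypothetical data (assumption): x causes y; c is a common effect of both."""
    rng = random.Random(seed)
    xs, ys, cs = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1.0 * x + rng.gauss(0, 1)  # true causal effect of x on y is 1.0
        c = x + y + rng.gauss(0, 1)    # collider: caused by both x and y
        xs.append(x)
        ys.append(y)
        cs.append(c)
    return xs, ys, cs


def center(v):
    """Subtract the mean, which is equivalent to fitting an intercept."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ols_one(x, y):
    """Slope and residual sum of squares for y ~ x (with intercept)."""
    x, y = center(x), center(y)
    b = dot(x, y) / dot(x, x)
    rss = dot(y, y) - b * dot(x, y)
    return b, rss


def ols_two(x1, x2, y):
    """Slopes and RSS for y ~ x1 + x2 (with intercept), via 2x2 normal equations."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = dot(y, y) - b1 * s1y - b2 * s2y
    return (b1, b2), rss


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts estimated coefficients."""
    return n * math.log(rss / n) + 2 * k


if __name__ == "__main__":
    n = 5000
    x, y, c = simulate(n)
    b_small, rss_small = ols_one(x, y)
    (b_x, b_c), rss_big = ols_two(x, c, y)
    print(f"y ~ x     : coef on x = {b_small:.2f}, AIC = {aic(rss_small, n, 2):.1f}")
    print(f"y ~ x + c : coef on x = {b_x:.2f}, AIC = {aic(rss_big, n, 3):.1f}")
```

Under these assumptions, the population values work out as follows: the model without c recovers a coefficient of 1.0 on x with R² of 0.5, while adding the collider raises R² to 0.75, so AIC strongly prefers it, yet the coefficient on x in that model is 0. Better fit, wrong causal answer.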

Suggested Citation

  • Harold Kincaid, 2025. "Statistical variable selection and causality in the social and behavioral sciences," Quality & Quantity: International Journal of Methodology, Springer, vol. 59(2), pages 1383-1404, April.
  • Handle: RePEc:spr:qualqt:v:59:y:2025:i:2:d:10.1007_s11135-024-02013-6
    DOI: 10.1007/s11135-024-02013-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11135-024-02013-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Beck, Krzysztof & Wyszyński, Mateusz & Dubel, Marcin, 2025. "Bayesian dynamic systems modelling. Bayesian model averaging for dynamic panels with weakly exogenous regressors," MPRA Paper 124689, University Library of Munich, Germany.
    2. Ahmed, Walid M.A., 2022. "Robust drivers of Bitcoin price movements: An extreme bounds analysis," The North American Journal of Economics and Finance, Elsevier, vol. 62(C).
    3. Yin‐Wong Cheung & Shi He, 2022. "RMB misalignment: What does a meta‐analysis tell us?," Review of International Economics, Wiley Blackwell, vol. 30(4), pages 1038-1086, September.
    4. Neil R. Ericsson, 2008. "The Fragility of Sensitivity Analysis: An Encompassing Perspective," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 895-914, December.
    5. Ulaşan, Bülent, 2011. "Cross-country growth empirics and model uncertainty: An overview," Economics Discussion Papers 2011-37, Kiel Institute for the World Economy (IfW Kiel).
    6. Heckman, James & Pinto, Rodrigo, 2024. "Econometric causality: The central role of thought experiments," Journal of Econometrics, Elsevier, vol. 243(1).
    7. Huihang Liu & Xinyu Zhang, 2023. "Frequentist model averaging for undirected Gaussian graphical models," Biometrics, The International Biometric Society, vol. 79(3), pages 2050-2062, September.
    8. Ulaşan, Bülent, 2012. "Cross-country growth empirics and model uncertainty: An overview," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 6, pages 1-69.
    9. Grover,Arti Goswami & Lall,Somik V. & Timmis,Jonathan David, 2021. "Agglomeration Economies in Developing Countries : A Meta-Analysis," Policy Research Working Paper Series 9730, The World Bank.
    10. Tong Zeng, 2024. "Frequentist model averaging in the generalized multinomial logit model," Computational Statistics, Springer, vol. 39(2), pages 605-627, April.
    11. Enrique Labrada & Luis Huesca, "undated". "Data management in household income and expenditure surveys: Working with extended families using Stata," Mexican Stata Conference 2023 19, Stata Users Group.
    12. Guillaume Coqueret, 2023. "Forking paths in financial economics," Papers 2401.08606, arXiv.org.
    13. Guo, Li-Yang & Feng, Chao & Yang, Jun, 2022. "Can energy predict the regional prices of carbon emission allowances in China?," International Review of Financial Analysis, Elsevier, vol. 82(C).
    14. David F. Hendry & Hans‐Martin Krolzig, 2004. "We Ran One Regression," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 66(5), pages 799-810, December.
    15. Paul Hunermund & Elias Bareinboim, 2019. "Causal Inference and Data Fusion in Econometrics," Papers 1912.09104, arXiv.org, revised Mar 2023.
    16. Christian Gische & Manuel C. Voelkle, 2022. "Beyond the Mean: A Flexible Framework for Studying Causal Effects Using Linear Models," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 868-901, September.
    17. Grover, Arti & Lall, Somik & Timmis, Jonathan, 2023. "Agglomeration economies in developing countries: A meta-analysis," Regional Science and Urban Economics, Elsevier, vol. 101(C).
    18. John Aldrich, 2006. "When are inferences too fragile to be believed?," Journal of Economic Methodology, Taylor & Francis Journals, vol. 13(2), pages 161-177.
    19. Mansour Zarra-Nejad & Fatimah Hosseinpour & Seyed Aziz Arman, 2014. "Trade-Growth Nexus in Developing and Developed Countries: An Application of Extreme Bounds Analysis," Asian Economic and Financial Review, Asian Economic and Social Society, vol. 4(7), pages 915-929, July.
    20. Drachal, Krzysztof, 2021. "Forecasting selected energy commodities prices with Bayesian dynamic finite mixtures," Energy Economics, Elsevier, vol. 99(C).
