Yule–Simpson’s paradox: the probabilistic versus the empirical conundrum

Author

  • Aris Spanos (Virginia Tech)

Abstract

The current literature views Simpson’s paradox as a probabilistic conundrum by taking the premises (probabilities/parameters/frequencies) as known. In such a context, it is shown that the paradox arises within a very small subset of the relevant parameter space, rendering the paradox unlikely to occur in real data. The problem, however, is that the probabilistic perspective ignores certain crucial empirical (data, statistical) issues raised by the original Pearson and Yule papers on ‘spurious’ association reversals. Placing the paradox in a broader empirical framework that begins with the raw data $\mathbf{z}_{0}$ and an appropriately selected statistical model $\mathcal{M}_{\boldsymbol{\theta}}(\mathbf{x})$, the discussion elucidates the original Yule–Pearson conundrum by formalizing its notion of ‘spurious or fictitious associations’ into ‘statistically untrustworthy associations’ stemming from a misspecified $\mathcal{M}_{\boldsymbol{\theta}}(\mathbf{x})$, i.e., from invalid probabilistic assumptions imposed on $\mathbf{z}_{0}$. It is shown that several empirical examples used to illustrate Simpson’s paradox in the current literature constitute examples of the Yule–Pearson untrustworthy association reversals. The empirical perspective is used to revisit the causal explanation of the paradox and to make the case that several widely accepted causal claims are questionable on statistical adequacy grounds. It is also used to propose a procedure to detect and account for the ‘third entity’ in the paradox, as well as to (reliably) select among different potential causal explanations, such as collider, mediator or confounder, on empirical grounds.
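
To make the probabilistic framing concrete, the following Python sketch (an illustration, not the article's procedure) checks a 2×2×2 table (stratum × treatment × outcome) for an association reversal and estimates by Monte Carlo how often a reversal occurs when the eight cell probabilities are drawn uniformly from the probability simplex. The flat Dirichlet prior is an assumption chosen for illustration, in the spirit of calculations such as Pavlides and Perlman (2009); the function names and the example counts are hypothetical.

```python
import random


def success_rate(cells):
    """Pooled success proportion over (failure_weight, success_weight) pairs."""
    failures = sum(c[0] for c in cells)
    successes = sum(c[1] for c in cells)
    return successes / (failures + successes)


def is_reversal(w):
    """Return True if the stratum-level and pooled associations have opposite signs.

    w[g][t] = (failure_weight, success_weight) for stratum g and treatment t,
    both in {0, 1}. Weights may be counts or joint probabilities.
    """
    # Difference in success rate (treatment 0 minus treatment 1) within each stratum.
    within = [success_rate([w[g][0]]) - success_rate([w[g][1]]) for g in range(2)]
    # The same difference after collapsing (pooling) over the strata.
    pooled = (success_rate([w[g][0] for g in range(2)])
              - success_rate([w[g][1] for g in range(2)]))
    return ((all(d > 0 for d in within) and pooled < 0)
            or (all(d < 0 for d in within) and pooled > 0))


def random_table(rng):
    """Draw 8 cell probabilities uniformly from the simplex (flat Dirichlet)."""
    # Normalizing independent Exp(1) draws gives a Dirichlet(1, ..., 1) sample.
    x = [rng.expovariate(1.0) for _ in range(8)]
    total = sum(x)
    p = [v / total for v in x]
    # Cell index 4*g + 2*t + y, where y = 0 is failure and y = 1 is success.
    return [[(p[4 * g + 2 * t], p[4 * g + 2 * t + 1]) for t in range(2)]
            for g in range(2)]


def reversal_frequency(n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(reversal) under the flat-Dirichlet assumption."""
    rng = random.Random(seed)
    hits = sum(is_reversal(random_table(rng)) for _ in range(n_draws))
    return hits / n_draws


if __name__ == "__main__":
    # Hypothetical counts, not taken from the article:
    # treatment 0 wins in both strata (0.9 vs 0.8, 0.3 vs 0.2)
    # but loses after pooling (39/110 vs 82/110).
    example = [[(1, 9), (20, 80)], [(70, 30), (8, 2)]]
    print(is_reversal(example))  # True
    print(reversal_frequency())
```

Because only ratios enter the comparison, the same check applies to raw counts and to cell probabilities. The simulated frequency is small, which is in line with the claim that the paradox occupies a small region of the parameter space; note that this sketch captures only the probabilistic view and says nothing about whether a statistical model is adequate for the data, which is the empirical issue raised above.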

Suggested Citation

  • Aris Spanos, 2021. "Yule–Simpson’s paradox: the probabilistic versus the empirical conundrum," Statistical Methods & Applications, Springer; Società Italiana di Statistica, vol. 30(2), pages 605-635, June.
  • Handle: RePEc:spr:stmapp:v:30:y:2021:i:2:d:10.1007_s10260-020-00536-4
    DOI: 10.1007/s10260-020-00536-4

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10260-020-00536-4
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Spanos, Aris, 2010. "Statistical adequacy and the trustworthiness of empirical evidence: Statistical vs. substantive information," Economic Modelling, Elsevier, vol. 27(6), pages 1436-1452, November.
    2. Judea Pearl, 2014. "Comment: Understanding Simpson's Paradox," The American Statistician, Taylor & Francis Journals, vol. 68(1), pages 8-13, February.
    3. Aldrich, J., 1995. "Correlations genuine and spurious in Pearson and Yule," Discussion Paper Series In Economics And Econometrics 9502, Economics Division, School of Social Sciences, University of Southampton.
    4. Spanos, Aris, 2019. "Probability Theory and Statistical Inference," Cambridge Books, Cambridge University Press, number 9781316636374.
    5. Pavlides, Marios G. & Perlman, Michael D., 2009. "How Likely Is Simpson’s Paradox?," The American Statistician, American Statistical Association, vol. 63(3), pages 226-233.
    6. Timothy W. Armistead, 2014. "Resurrecting the Third Variable: A Critique of Pearl's Causal Analysis of Simpson's Paradox," The American Statistician, Taylor & Francis Journals, vol. 68(1), pages 1-7, February.
    7. Aris Spanos & Anya McGuirk, 2001. "The Model Specification Problem from a Probabilistic Reduction Perspective," American Journal of Agricultural Economics, Agricultural and Applied Economics Association, vol. 83(5), pages 1168-1176.
    8. Aris Spanos, 2006. "Revisiting the omitted variables argument: Substantive vs. statistical adequacy," Journal of Economic Methodology, Taylor & Francis Journals, vol. 13(2), pages 179-218.
    9. Keli Liu & Xiao-Li Meng, 2014. "Comment: A Fruitful Resolution to Simpson's Paradox via Multiresolution Inference," The American Statistician, Taylor & Francis Journals, vol. 68(1), pages 17-29, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aris Spanos, 2016. "Transforming structural econometrics: substantive vs. statistical premises of inference," Review of Political Economy, Taylor & Francis Journals, vol. 28(3), pages 426-437, July.
    2. Swamy, P.A.V.B. & Mehta, J.S. & Tavlas, G.S. & Hall, S.G., 2015. "Two applications of the random coefficient procedure: Correcting for misspecifications in a small area level model and resolving Simpson's paradox," Economic Modelling, Elsevier, vol. 45(C), pages 93-98.
    3. Aris Spanos, 2022. "Frequentist Model-based Statistical Induction and the Replication Crisis," Journal of Quantitative Economics, Springer; The Indian Econometric Society (TIES), vol. 20(1), pages 133-159, September.
    4. Spanos, Aris, 2010. "Statistical adequacy and the trustworthiness of empirical evidence: Statistical vs. substantive information," Economic Modelling, Elsevier, vol. 27(6), pages 1436-1452, November.
    5. Spanos, Aris, 2010. "Akaike-type criteria and the reliability of inference: Model selection versus statistical model specification," Journal of Econometrics, Elsevier, vol. 158(2), pages 204-220, October.
    6. Aris Spanos, 2022. "Statistical modeling and inference in the era of Data Science and Graphical Causal modeling," Journal of Economic Surveys, Wiley Blackwell, vol. 36(5), pages 1251-1287, December.
    7. Y. Ma, 2015. "Simpson’s paradox in GDP and per capita GDP growths," Empirical Economics, Springer, vol. 49(4), pages 1301-1315, December.
    8. Francisco Estrada & Víctor Guerrero & Carlos Gay-García & Benjamín Martínez-López, 2013. "A cautionary note on automated statistical downscaling methods for climate change," Climatic Change, Springer, vol. 120(1), pages 263-276, September.
    9. Aris Spanos, 2018. "Mis-Specification Testing In Retrospect," Journal of Economic Surveys, Wiley Blackwell, vol. 32(2), pages 541-577, April.
    10. Niraj Poudyal & Aris Spanos, 2022. "Model Validation and DSGE Modeling," Econometrics, MDPI, vol. 10(2), pages 1-25, April.
    11. Anya McGuirk & Aris Spanos, 2009. "Revisiting Error‐Autocorrelation Correction: Common Factor Restrictions and Granger Non‐Causality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 71(2), pages 273-294, April.
    12. repec:rdg:wpaper:em-dp2013-03 is not listed on IDEAS
    13. Tobias Wand & Oliver Kamps & Hiroshi Iyetomi, 2024. "Causal Hierarchy in the Financial Market Network -- Uncovered by the Helmholtz-Hodge-Kodaira Decomposition," Papers 2408.12839, arXiv.org.
    14. Fort, Margherita & Ichino, Andrea & Rettore, Enrico & Zanella, Giulio, 2022. "Multi-cutoff RD designs with observations located at each cutoff: problems and solutions," CEPR Discussion Papers 16974, C.E.P.R. Discussion Papers.
    15. Geri Skenderi & Christian Joppi & Matteo Denitto & Marco Cristani, 2024. "Well googled is half done: Multimodal forecasting of new fashion product sales with image‐based google trends," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(6), pages 1982-1997, September.
    16. Hanck, Christoph, 2011. "Now, whose schools are really better (or weaker) than Germany's? A multiple testing approach," Economic Modelling, Elsevier, vol. 28(4), pages 1739-1746, July.
    17. Chatelain, Jean-Bernard & Ralf, Kirsten, 2014. "Spurious regressions and near-multicollinearity, with an application to aid, policies and growth," EconStor Open Access Articles and Book Chapters, ZBW - Leibniz Information Centre for Economics, vol. 39(A), pages 85-96.
    18. Marek Hanusch, 2013. "Islam and democracy: a response," Public Choice, Springer, vol. 154(3), pages 315-321, March.
    19. Peter C. B. Phillips, 2023. "Discrete Fourier Transforms of Fractional Processes with Econometric Applications," Advances in Econometrics, in: Essays in Honor of Joon Y. Park: Econometric Theory, volume 45, pages 3-71, Emerald Group Publishing Limited.
    20. Wulczyn, Fred, 2020. "Race/ethnicity and running away from foster care," Children and Youth Services Review, Elsevier, vol. 119(C).
    21. Kim, Ji-Hyun, 1999. "Spurious correlation between ratios with a common divisor," Statistics & Probability Letters, Elsevier, vol. 44(4), pages 383-386, October.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:stmapp:v:30:y:2021:i:2:d:10.1007_s10260-020-00536-4. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing. General contact details of provider: http://www.springer.com

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.