Ecological inference for 2 × 2 tables (with discussion)


  • Jon Wakefield


A fundamental problem in many disciplines, including political science, sociology and epidemiology, is the examination of the association between two binary variables across a series of 2 × 2 tables, when only the margins are observed, and one of the margins is fixed. Two unobserved fractions are of interest, with only a single response per table, and it is this non-identifiability that is the inherent difficulty lying at the heart of ecological inference. Many methods have been suggested for ecological inference, often without a probabilistic model; we clarify the form of the sampling distribution and critique previous approaches within a formal statistical framework, thus allowing clarification and examination of the assumptions that are required under all approaches. A particularly difficult problem is choosing between models with and without contextual effects. Various Bayesian hierarchical modelling approaches are proposed to allow the formal inclusion of supplementary data, and/or prior information, without which ecological inference is unreliable. Careful choice of the prior within such models is required, however, since there may be considerable sensitivity to this choice, even when the model assumed is correct and there are no contextual effects. This sensitivity is shown to be a function of the number of areas and the distribution of the proportions in the fixed margin across areas. By explicitly providing a likelihood for each table, the combination of individual level survey data and aggregate level data is straightforward and we illustrate that survey data can be highly informative, particularly if these data are from a survey of the minority population within each area. This strategy is related to designs that are used in survey sampling and in epidemiology. An approximation to the suggested likelihood is discussed, and various computational approaches are described. 
Some extensions are outlined including the consideration of multiway tables, spatial dependence and area-specific (contextual) variables. Voter registration-race data from 64 counties in the US state of Louisiana are used to illustrate the methods. Copyright 2004 Royal Statistical Society.
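
The non-identifiability described in the abstract can be made concrete with a small sketch. For one table, assume (notation here is illustrative, not taken from the paper) that `n0` and `n1` are the fixed-margin group sizes, `y` is the observed total number of responders, and `p0`, `p1` are the two unobserved fractions. Only `y = y0 + y1` is seen, so the data restrict the unobserved cell count `y1` to an interval, and one natural likelihood for the table, assuming independent binomial responses within each group, is a convolution of two binomials over that interval. The function names below are hypothetical.

```python
from math import comb


def count_bounds(y, n0, n1):
    """Range of the unobserved cell count y1 that is consistent with the margins.

    y  : observed total responders in the table
    n0 : size of group 0 (fixed margin)
    n1 : size of group 1 (fixed margin)
    """
    lo = max(0, y - n0)  # group 0 can absorb at most n0 responders
    hi = min(n1, y)      # y1 cannot exceed the group size or the total
    return lo, hi


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def convolution_likelihood(y, n0, n1, p0, p1):
    """P(Y = y | p0, p1) assuming independent binomial counts in each group.

    Sums over every split (y0, y1) with y0 + y1 = y, because the split itself
    is never observed.
    """
    lo, hi = count_bounds(y, n0, n1)
    return sum(
        binom_pmf(y1, n1, p1) * binom_pmf(y - y1, n0, p0)
        for y1 in range(lo, hi + 1)
    )


if __name__ == "__main__":
    # Made-up numbers: 70 people in group 0, 30 in group 1, 80 responders.
    n0, n1, y = 70, 30, 80
    print("bounds on y1:", count_bounds(y, n0, n1))
    # Two quite different (p0, p1) pairs with the same expected total of 80;
    # comparing their likelihoods shows how weakly one table separates them.
    for p0, p1 in [(0.8, 0.8), (0.9, 17 / 30)]:
        print(p0, round(p1, 3), convolution_likelihood(y, n0, n1, p0, p1))
```

With a single table there is one observed count and two unknown fractions, which is why supplementary data or prior information across tables is needed. The sum has up to `min(n1, y) + 1` terms, so for large tables an approximation to this likelihood is attractive.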

Suggested Citation

  • Jon Wakefield, 2004. "Ecological inference for 2 × 2 tables (with discussion)," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 167(3), pages 385-445.
  • Handle: RePEc:bla:jorssa:v:167:y:2004:i:3:p:385-445

    Cited by:

    1. Xiaohui Chang & Rasmus Waagepetersen & Herbert Yu & Xiaomei Ma & Theodore R. Holford & Rong Wang & Yongtao Guan, 2015. "Disease risk estimation by combining case–control data with aggregated information on the population at risk," Biometrics, The International Biometric Society, vol. 71(1), pages 114-121, March.
    2. Rob Eisinga, 2009. "The beta-binomial convolution model for 2×2 tables with missing cell counts," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 63(1), pages 24-42.
    3. Sebastien J.-P. A. Haneuse & Jonathan C. Wakefield, 2008. "The combination of ecological and case-control data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 73-93.
    4. Sebastien J-P. A. Haneuse & Jonathan C. Wakefield, 2007. "Hierarchical Models for Combining Ecological and Case–Control Data," Biometrics, The International Biometric Society, vol. 63(1), pages 128-136, March.
