Printed from https://ideas.repec.org/a/bla/jorssa/v167y2004i3p385-445.html

Ecological inference for 2 × 2 tables (with discussion)

Author

  • Jon Wakefield

Abstract

Summary. A fundamental problem in many disciplines, including political science, sociology and epidemiology, is the examination of the association between two binary variables across a series of 2 × 2 tables when only the margins are observed and one of the margins is fixed. Two unobserved fractions are of interest but only a single response is available per table; this non‐identifiability is the inherent difficulty at the heart of ecological inference. Many methods have been suggested for ecological inference, often without a probabilistic model; we clarify the form of the sampling distribution and critique previous approaches within a formal statistical framework, which allows the assumptions that are required under all approaches to be stated and examined. A particularly difficult problem is choosing between models with and without contextual effects. Various Bayesian hierarchical modelling approaches are proposed to allow the formal inclusion of supplementary data and/or prior information, without which ecological inference is unreliable. Careful choice of the prior within such models is required, however, since inference may be considerably sensitive to this choice, even when the assumed model is correct and there are no contextual effects. This sensitivity is shown to be a function of the number of areas and of the distribution of the proportions in the fixed margin across areas. Because a likelihood is provided explicitly for each table, combining individual-level survey data with aggregate-level data is straightforward, and we illustrate that survey data can be highly informative, particularly if they come from a survey of the minority population within each area. This strategy is related to designs that are used in survey sampling and in epidemiology. An approximation to the suggested likelihood is discussed, and various computational approaches are described.
Some extensions are outlined including the consideration of multiway tables, spatial dependence and area‐specific (contextual) variables. Voter registration–race data from 64 counties in the US state of Louisiana are used to illustrate the methods.
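The sampling distribution the abstract refers to can be sketched, under stated assumptions, as a convolution of two binomials. For one area, the fixed margin splits n0 + n1 individuals into two groups (say non-minority and minority), only the total y = y0 + y1 of the other margin is observed, and y0 ~ Binomial(n0, p0) and y1 ~ Binomial(n1, p1) are independent and unobserved, so that P(y) is the sum, over y0 from max(0, y - n1) to min(n0, y), of Bin(y0; n0, p0) × Bin(y - y0; n1, p1). The Python below is a minimal illustration, not the paper's code: the function names, the moment-matched normal approximation, and the treatment of a minority-population survey as independent of the aggregate count are assumptions made here for illustration.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log Binomial(n, p) mass at k, for 0 <= k <= n and 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def convolution_loglik(y: int, n0: int, n1: int, p0: float, p1: float) -> float:
    """Exact log-likelihood of one table: y = y0 + y1 is observed, while
    y0 ~ Bin(n0, p0) and y1 ~ Bin(n1, p1) are independent and unobserved."""
    # Feasible y0 values satisfy 0 <= y0 <= n0 and 0 <= y - y0 <= n1.
    lo, hi = max(0, y - n1), min(n0, y)
    if lo > hi:
        return float("-inf")  # y lies outside 0..n0 + n1
    terms = [log_binom_pmf(y0, n0, p0) + log_binom_pmf(y - y0, n1, p1)
             for y0 in range(lo, hi + 1)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))


def normal_approx_loglik(y: int, n0: int, n1: int, p0: float, p1: float) -> float:
    """Illustrative normal approximation matching the mean and variance of y
    (an assumption; not necessarily the approximation used in the paper)."""
    mean = n0 * p0 + n1 * p1
    var = n0 * p0 * (1 - p0) + n1 * p1 * (1 - p1)
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)


def combined_loglik(y: int, n0: int, n1: int, p0: float, p1: float,
                    survey_m1: int = 0, survey_z1: int = 0) -> float:
    """Aggregate table plus a hypothetical survey of survey_m1 minority
    individuals, survey_z1 of whom give a positive response. Adding the two
    log-likelihoods treats the survey as independent of the aggregate count,
    a simplifying assumption."""
    ll = convolution_loglik(y, n0, n1, p0, p1)
    if survey_m1 > 0:
        ll += log_binom_pmf(survey_z1, survey_m1, p1)
    return ll


if __name__ == "__main__":
    # Hypothetical table: n0 = 60, n1 = 40, observed total y = 55.
    # Both (p0, p1) pairs imply the same mean (55) and variance for y.
    for p0, p1 in [(0.6, 0.475), (0.5, 0.625)]:
        exact = convolution_loglik(55, 60, 40, p0, p1)
        approx = normal_approx_loglik(55, 60, 40, p0, p1)
        # Hypothetical survey: 20 minority individuals, 9 positive.
        both = combined_loglik(55, 60, 40, p0, p1, survey_m1=20, survey_z1=9)
        print(f"p0={p0}, p1={p1}: exact={exact:.3f}, "
              f"normal={approx:.3f}, with survey={both:.3f}")
```

The two (p0, p1) pairs in the usage example give y the same mean and variance, so the normal approximation cannot distinguish them and the exact likelihood barely does; this is the non-identifiability described in the abstract. The hypothetical survey term depends on p1 alone and therefore separates the two pairs.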

Suggested Citation

  • Jon Wakefield, 2004. "Ecological inference for 2 × 2 tables (with discussion)," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 167(3), pages 385-445, July.
  • Handle: RePEc:bla:jorssa:v:167:y:2004:i:3:p:385-445
    DOI: 10.1111/j.1467-985x.2004.02046.x

    Download full text from publisher

    File URL: https://doi.org/10.1111/j.1467-985x.2004.02046.x
    Download Restriction: no


    References listed on IDEAS

    1. Little, Roderick J A, 1985. "A Note about Models for Selectivity Bias," Econometrica, Econometric Society, vol. 53(6), pages 1469-1474, November.
    2. N. E. Breslow & N. Chatterjee, 1999. "Design and analysis of two‐phase studies with binary outcome applied to Wilms tumour prognosis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 48(4), pages 457-468.
    3. Julian Besag & Jeremy York & Annie Mollié, 1991. "Bayesian image restoration, with two applications in spatial statistics," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 43(1), pages 1-20, March.
    4. Jon Wakefield, 2003. "Sensitivity Analyses for Ecological Regression," Biometrics, The International Biometric Society, vol. 59(1), pages 9-17, March.
    5. Giles, Micheal W. & Hertz, Kaenan, 1994. "Racial Threat and Partisan Identification," American Political Science Review, Cambridge University Press, vol. 88(2), pages 317-326, June.
    6. Adolph, Christopher & King, Gary & Herron, Michael C. & Shotts, Kenneth W., 2003. "A Consensus on Second-Stage Analyses in Ecological Inference Models," Political Analysis, Cambridge University Press, vol. 11(1), pages 86-94, January.
    7. Sander Greenland, 2000. "When Should Epidemiologic Regressions Use Random Coefficients?," Biometrics, The International Biometric Society, vol. 56(3), pages 915-921, September.
    8. Jonathan Wakefield & Ruth Salway, 2001. "A statistical framework for ecological and aggregate studies," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 164(1), pages 119-137.
    9. Burden, Barry C., 2000. "Voter Turnout and the National Election Studies," Political Analysis, Cambridge University Press, vol. 8(4), pages 389-398, July.
    10. Trivellore E. Raghunathan & Paula K. Diehr & Allen D. Cheadle, 2003. "Combining Aggregate and Individual Level Data to Estimate an Individual Level Correlation Coefficient," Journal of Educational and Behavioral Statistics, vol. 28(1), pages 1-19, March.
    11. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    12. R. L. Chambers & D. G. Steel, 2001. "Simple methods for ecological inference in 2×2 tables," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 164(1), pages 175-192.
    13. Leonhard Knorr‐Held & Nicola G. Best, 2001. "A shared component model for detecting joint and selective clustering of two diseases," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 164(1), pages 73-85.
    14. Katherine A. Guthrie & Lianne Sheppard & Jon Wakefield, 2002. "A Hierarchical Aggregate Data Model with Spatially Correlated Disease Rates," Biometrics, The International Biometric Society, vol. 58(4), pages 898-905, December.
    15. Andrew Gelman & David K. Park & Stephen Ansolabehere & Phillip N. Price & Lorraine C. Minnite, 2001. "Models, assumptions and model checking in ecological regressions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 164(1), pages 101-118.
    16. Ori Rosen & Wenxin Jiang & Gary King & Martin A. Tanner, 2001. "Bayesian and Frequentist Inference for Ecological Inference: The R×C Case," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 55(2), pages 134-156, July.
    17. Gary King & Ori Rosen & Martin A. Tanner, 1999. "Binomial-Beta Hierarchical Models for Ecological Inference," Sociological Methods & Research, vol. 28(1), pages 61-90, August.
    18. David A. Freedman & Stephen P. Klein & Jerome Sacks & Charles A. Smyth & Charles G. Everett, 1991. "Ecological Regression and Voting Rights," Evaluation Review, vol. 15(6), pages 673-711, December.
    19. Alice S. Whittemore, 1997. "Multistage Sampling Designs and Estimating Equations," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 59(3), pages 589-602.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Carolina Plescia & Lorenzo De Sio, 2018. "An evaluation of the performance and suitability of R × C methods for ecological inference with known true values," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(2), pages 669-683, March.
    2. Xiaohui Chang & Rasmus Waagepetersen & Herbert Yu & Xiaomei Ma & Theodore R. Holford & Rong Wang & Yongtao Guan, 2015. "Disease risk estimation by combining case–control data with aggregated information on the population at risk," Biometrics, The International Biometric Society, vol. 71(1), pages 114-121, March.
    3. Puig, Xavier & Ginebra, Josep, 2014. "A cluster analysis of vote transitions," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 328-344.
    4. Katie Wilson & Jon Wakefield, 2022. "A probabilistic model for analyzing summary birth history data," Demographic Research, Max Planck Institute for Demographic Research, Rostock, Germany, vol. 47(11), pages 291-344.
    5. Irene L. Hudson & Linda Moore & Eric J. Beh & David G. Steel, 2010. "Ecological inference techniques: an empirical evaluation using data describing gender and voter turnout at New Zealand elections, 1893–1919," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 173(1), pages 185-213, January.
    6. Beh, Eric J., 2010. "The aggregate association index," Computational Statistics & Data Analysis, Elsevier, vol. 54(6), pages 1570-1580, June.
    7. Rob Eisinga, 2009. "The beta‐binomial convolution model for 2×2 tables with missing cell counts," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 63(1), pages 24-42, February.
    8. Arie ten Cate, 2014. "Maximum likelihood estimation of the Markov chain model with macro data and the ecological inference model," CPB Discussion Paper 284, CPB Netherlands Bureau for Economic Policy Analysis.
    9. Antonio Forcina & Davide Pellegrino, 2019. "Estimation of voter transitions and the ecological fallacy," Quality & Quantity: International Journal of Methodology, Springer, vol. 53(4), pages 1859-1874, July.
    10. Nathan Kallus & Xiaojie Mao & Angela Zhou, 2022. "Assessing Algorithmic Fairness with Unobserved Protected Class Using Data Combination," Management Science, INFORMS, vol. 68(3), pages 1959-1981, March.
    11. D. James Greiner & Kevin M. Quinn, 2009. "R×C ecological inference: bounds, correlations, flexibility and transparency of assumptions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 172(1), pages 67-81, January.
    12. Sebastien J.‐P. A. Haneuse & Jonathan C. Wakefield, 2008. "The combination of ecological and case–control data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 73-93, February.
    13. Sebastien J-P. A. Haneuse & Jonathan C. Wakefield, 2007. "Hierarchical Models for Combining Ecological and Case–Control Data," Biometrics, The International Biometric Society, vol. 63(1), pages 128-136, March.
    14. Shuai Shao & Göran Kauermann, 2020. "Understanding price elasticity for airline ancillary services," Journal of Revenue and Pricing Management, Palgrave Macmillan, vol. 19(1), pages 74-82, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Irene L. Hudson & Linda Moore & Eric J. Beh & David G. Steel, 2010. "Ecological inference techniques: an empirical evaluation using data describing gender and voter turnout at New Zealand elections, 1893–1919," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 173(1), pages 185-213, January.
    2. Gillian A. Lancaster & Mick Green & Steven Lane, 2006. "Reducing bias in ecological studies: an evaluation of different methodologies," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 169(4), pages 681-700, October.
    3. Ying C. MacNab, 2018. "Some recent work on multivariate Gaussian Markov random fields," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 27(3), pages 497-541, September.
    4. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    5. Carolina Plescia & Lorenzo De Sio, 2018. "An evaluation of the performance and suitability of R × C methods for ecological inference with known true values," Quality & Quantity: International Journal of Methodology, Springer, vol. 52(2), pages 669-683, March.
    6. D. James Greiner & Kevin M. Quinn, 2009. "R×C ecological inference: bounds, correlations, flexibility and transparency of assumptions," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 172(1), pages 67-81, January.
    7. Jon Wakefield, 2003. "Sensitivity Analyses for Ecological Regression," Biometrics, The International Biometric Society, vol. 59(1), pages 9-17, March.
    8. Olga Orlanski & Günther G. Schulze, 2017. "The Determinants of Islamophobia - An Empirical Analysis of the Swiss Minaret Referendum," CESifo Working Paper Series 6741, CESifo.
    9. Michelle Ross & Jon Wakefield, 2013. "Bayesian Inference for Two-Phase Studies with Categorical Covariates," Biometrics, The International Biometric Society, vol. 69(2), pages 469-477, June.
    10. Verbeek, M.J.C.M. & Nijman, T.E., 1992. "Incomplete panels and selection bias: A survey," Discussion Paper 1992-7, Tilburg University, Center for Economic Research.
    11. Francesca Dominici & Lianne Sheppard & Merlise Clyde, 2003. "Health Effects of Air Pollution: A Statistical Review," International Statistical Review, International Statistical Institute, vol. 71(2), pages 243-276, August.
    12. Guilhem Bascle, 2008. "Controlling for endogeneity with instrumental variables in strategic management research," Post-Print hal-00576795, HAL.
    13. Benchimol, Jonathan & El-Shagi, Makram & Saadon, Yossi, 2022. "Do expert experience and characteristics affect inflation forecasts?," Journal of Economic Behavior & Organization, Elsevier, vol. 201(C), pages 205-226.
    14. Denis Conniffe & Vanessa Gash & Philip J. O'Connell, 2000. "Evaluating State Programmes - “Natural Experiments” and Propensity Scores," The Economic and Social Review, Economic and Social Studies, vol. 31(4), pages 283-308.
    15. Filiz Garip, 2012. "An Integrated Analysis of Migration and Remittances: Modeling Migration as a Mechanism for Selection," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 31(5), pages 637-663, October.
    16. Douglas R. M. Azevedo & Marcos O. Prates & Dipankar Bandyopadhyay, 2021. "MSPOCK: Alleviating Spatial Confounding in Multivariate Disease Mapping Models," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 26(3), pages 464-491, September.
    17. Chen, Xiao & Huang, Bihong & Ye, Dezhu, 2019. "The Gender Gap in Peer-to-Peer Lending: Evidence from the People’s Republic of China," ADBI Working Papers 977, Asian Development Bank Institute.
    18. Li Xu & Qingshan Jiang & David R. Lairson, 2019. "Spatio-Temporal Variation of Gender-Specific Hypertension Risk: Evidence from China," IJERPH, MDPI, vol. 16(22), pages 1-26, November.
    19. Puig, Xavier & Ginebra, Josep, 2014. "A cluster analysis of vote transitions," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 328-344.
    20. Rob Eisinga, 2009. "The beta‐binomial convolution model for 2×2 tables with missing cell counts," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 63(1), pages 24-42, February.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:bla:jorssa:v:167:y:2004:i:3:p:385-445. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. Registering allows you to link your profile to this item and to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with the CitEc correction form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Wiley Content Delivery. General contact details of provider: https://edirc.repec.org/data/rssssea.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.