Automatic regrouping of strata in the chi-square test

Author

Listed:
  • Juan Manuel Pérez-Salamero González

    (Department of Financial Economics and Actuarial Science, University of Valencia, Spain)

  • Marta Regúlez-Castillo

    (Department of Applied Economics III, University of the Basque Country (UPV/EHU), Bilbao, Spain)

  • Manuel Ventura-Marco

    (Department of Financial Economics and Actuarial Science, University of Valencia, Spain)

  • Carlos Vidal-Meliá

    (Department of Financial Economics and Actuarial Science, University of Valencia and Research Institute of Economic Analysis (ICAE), Complutense University of Madrid.)

Abstract

Pearson's chi-square test is widely used in the social and health sciences to analyze categorical data and contingency tables and to assess sample representativeness. For the test to be valid, the sample must be large enough to provide a minimum number of expected elements per category. If the researcher chooses to regroup the strata to meet this minimum-size requirement, automatic regrouping procedures in statistical software would be very useful, especially when tests are applied sequentially. After comprehensively reviewing the software that can carry out this test, we find that, with a few exceptions, none regroups the strata automatically to meet this requirement. This paper develops functions that regroup strata automatically wherever they are located, thus enabling the test to be performed within an iterative procedure. The functions are written in Excel VBA (Visual Basic for Applications) and in Mathematica, so it would not be hard to implement them in other languages. The utility of these functions is shown using three different datasets. Finally, the functions are applied iteratively to the Continuous Sample of Working Lives, a dataset that has been used in a considerable number of studies, especially on labor economics and the Spanish public pension system.
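
The regrouping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' VBA or Mathematica code. The function names (regroup_strata, chi_square_statistic), the threshold of 5 expected elements per category, and the pooling rule (all undersized strata are merged into one group, which absorbs the smallest remaining stratum if it is still too small) are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple


def regroup_strata(
    observed: Sequence[float],
    expected: Sequence[float],
    min_expected: float = 5.0,  # assumed threshold, not taken from the paper
) -> Tuple[List[float], List[float]]:
    """Pool every stratum whose expected count is below min_expected.

    Hypothetical helper: undersized strata are pooled wherever they sit in
    the list. If the pooled group is still too small, the remaining stratum
    with the smallest expected count is merged into it, and so on.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")

    pairs = list(zip(observed, expected))
    kept = [(o, e) for o, e in pairs if e >= min_expected]
    small = [(o, e) for o, e in pairs if e < min_expected]
    if not small:
        return list(observed), list(expected)

    pooled_o = sum(o for o, _ in small)
    pooled_e = sum(e for _, e in small)

    # Absorb the smallest remaining stratum until the pooled group is valid.
    while pooled_e < min_expected and kept:
        idx = min(range(len(kept)), key=lambda i: kept[i][1])
        o, e = kept.pop(idx)
        pooled_o += o
        pooled_e += e

    new_observed = [o for o, _ in kept] + [pooled_o]
    new_expected = [e for _, e in kept] + [pooled_e]
    return new_observed, new_expected


def chi_square_statistic(
    observed: Sequence[float], expected: Sequence[float]
) -> float:
    """Pearson's statistic: sum over strata of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    # Made-up counts: strata 2 and 4 have expected counts below 5.
    obs = [30, 3, 45, 2, 20]
    exp = [28.0, 4.0, 44.0, 3.0, 21.0]
    new_obs, new_exp = regroup_strata(obs, exp)
    # new_obs is [30, 45, 20, 5]; new_exp is [28.0, 44.0, 21.0, 7.0]
    stat = chi_square_statistic(new_obs, new_exp)
    print(new_obs, new_exp, round(stat, 4), "df =", len(new_obs) - 1)
```

Because the function returns plain lists, it can be called repeatedly inside a loop, for example once per variable when tests are applied sequentially. Other pooling rules, such as merging only adjacent strata for ordinal categories, would need a different helper.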

Suggested Citation

  • Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Manuel Ventura-Marco & Carlos Vidal-Meliá, 2017. "Automatic regrouping of strata in the chi-square test," Documentos de Trabajo del ICAE 2017-24, Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, Instituto Complutense de Análisis Económico.
  • Handle: RePEc:ucm:doicae:1724
    Download full text from publisher

    File URL: https://eprints.ucm.es/id/eprint/45317/1/1724.pdf
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alberto Maydeu-Olivares & Rosa Montaño, 2013. "How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis," Psychometrika, Springer;The Psychometric Society, vol. 78(1), pages 116-133, January.
    2. Carolina Navarro & Luis Ayala & José Labeaga, 2010. "Housing deprivation and health status: evidence from Spain," Empirical Economics, Springer, vol. 38(3), pages 555-582, June.
    3. Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Carlos Vidal-Meliá, 2017. "The continuous sample of working lives: improving its representativeness," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 8(1), pages 43-95, March.
    4. Jan Klaschka & Jenő Reiczigel, 2021. "On matching confidence intervals and tests for some discrete distributions: methodological and computational aspects," Computational Statistics, Springer, vol. 36(3), pages 1775-1790, September.
    5. Raphaël Jauslin & Bardia Panahbehagh & Yves Tillé, 2022. "Sequential spatially balanced sampling," Environmetrics, John Wiley & Sons, Ltd., vol. 33(8), December.
    6. Xin Zhao & Anton Grafström, 2020. "A sample coordination method to monitor totals of environmental variables," Environmetrics, John Wiley & Sons, Ltd., vol. 31(6), September.
    7. Li Cai, 2010. "A Two-Tier Full-Information Item Factor Analysis Model with Applications," Psychometrika, Springer;The Psychometric Society, vol. 75(4), pages 581-612, December.
    8. Omer Ozturk & Olena Kravchuk & Raymond Correll, 2022. "Row–Column Sampling Design Using Auxiliary Ranking Variables," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(4), pages 652-673, December.
    9. Albert Maydeu-Olivares & Harry Joe, 2006. "Limited Information Goodness-of-fit Testing in Multidimensional Contingency Tables," Psychometrika, Springer;The Psychometric Society, vol. 71(4), pages 713-732, December.
    10. Jennifer Proper & Thomas A. Murray, 2023. "An alternative metric for evaluating the potential patient benefit of response‐adaptive randomization procedures," Biometrics, The International Biometric Society, vol. 79(2), pages 1433-1445, June.
    11. B. L. Robertson & O. Ozturk & O. Kravchuk & J. A. Brown, 2022. "Spatially Balanced Sampling with Local Ranking," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(4), pages 622-639, December.
    12. Habiger, Joshua D. & McCann, Melinda H. & Tebbs, Joshua M., 2013. "On optimal confidence sets for parameters in discrete distributions," Statistics & Probability Letters, Elsevier, vol. 83(1), pages 297-303.
    13. Yuqi Gu & Jingchen Liu & Gongjun Xu & Zhiliang Ying, 2018. "Hypothesis Testing of the Q-matrix," Psychometrika, Springer;The Psychometric Society, vol. 83(3), pages 515-537, September.
    14. Anastasios Evgenidis & Apostolos Fasianos, 2021. "Unconventional Monetary Policy and Wealth Inequalities in Great Britain," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 83(1), pages 115-175, February.
    15. Katherine von Stackelberg & Pamela R.D. Williams & Ernesto Sánchez-Triana, 2021. "A Systematic Framework for Collecting Site-Specific Sampling and Survey Data to Support Analyses of Health Impacts from Land-Based Pollution in Low- and Middle-Income Countries," IJERPH, MDPI, vol. 18(9), pages 1-24, April.
    16. Vicente Núñez-Antón & Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Carlos Vidal-Meliá, 2020. "Improving the Representativeness of a Simple Random Sample: An Optimization Model and Its Application to the Continuous Sample of Working Lives," Mathematics, MDPI, vol. 8(8), pages 1-27, July.
    17. Moustaki, Irini & Papageorgiou, Ioulia, 2005. "Latent class models for mixed variables with applications in Archaeometry," Computational Statistics & Data Analysis, Elsevier, vol. 48(3), pages 659-675, March.
    18. Xin Zhao & Anton Grafström, 2024. "Estimation of change with partially overlapping and spatially balanced samples," Environmetrics, John Wiley & Sons, Ltd., vol. 35(1), February.
    19. Isabella Morlini, 2012. "A latent variables approach for clustering mixed binary and continuous variables within a Gaussian mixture model," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(1), pages 5-28, April.
    20. P. M. Kroonenberg & Albert Verbeek, 2018. "The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do?," The American Statistician, Taylor & Francis Journals, vol. 72(2), pages 175-183, April.

    More about this item

    Keywords

    Chi-square test; statistical software; VBA; Mathematica; Continuous Sample of Working Lives

    JEL classification:

    • C46 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Specific Distributions
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • H55 - Public Economics - - National Government Expenditures and Related Policies - - - Social Security and Public Pensions
