Correlation and regression in contingency tables. A measure of association or correlation in nominal data (contingency tables), using determinants
Abstract: Nominal data currently lack a correlation coefficient such as has already been defined for real data. A measure is possible using the determinant, with the useful interpretation that the determinant gives the ratio between volumes. With M an m × n contingency table and n ≤ m, the suggested measure is r = Sqrt[det[A'A]] with A = Normalized[M]. With M an n1 × n2 × ... × nk contingency matrix, we can construct a matrix of pairwise correlations R. A matrix of such pairwise correlations is called an association matrix. If that matrix is also positive semi-definite (PSD), then it is a proper correlation matrix. The overall correlation then is R = f[R], where f can be chosen to impose PSD-ness. An option is to use f[R] = Sqrt[1 - det[R]]. However, for both nominal and cardinal data the advisable choice is to take the maximal multiple correlation within R. The resulting measure of “nominal correlation” measures the distance between a main diagonal and the off-diagonal elements, and thus is a measure of strong correlation. Cramer’s V measure for pairwise correlation can be generalized in this manner too. It measures the distance between all diagonals (including cross-diagonals and subdiagonals) and statistical independence, and thus is a measure of weaker correlation. Finally, when variances are also defined, regression coefficients can be determined from the variance-covariance matrix. The volume ratio measure can be related to the regression coefficients, not of the variables, but of the categories in the contingency matrix, using the conditional probabilities given the row and column sums.
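The formulas quoted in the abstract can be illustrated with a short Python sketch. This is not the paper's implementation. The abstract does not say what Normalized[M] does, so the sketch assumes it scales each column of M to unit Euclidean length, which makes det[A'A] the squared volume spanned by the column directions. The function names (`volume_ratio`, `cramers_v`, `overall_from_matrix`) are hypothetical, and Cramer's V is computed with the standard chi-square formula. All row and column sums are assumed to be positive.

```python
import numpy as np


def volume_ratio(table):
    """Sqrt[det[A'A]], with A the table after an ASSUMED normalization:
    each column is scaled to unit Euclidean length."""
    m = np.asarray(table, dtype=float)
    if m.shape[1] > m.shape[0]:
        m = m.T  # the abstract requires n <= m (no more columns than rows)
    a = m / np.linalg.norm(m, axis=0)  # assumes no column is all zeros
    gram = a.T @ a
    # Clip tiny negative determinants caused by floating-point rounding.
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))


def cramers_v(table):
    """Standard Cramer's V: sqrt(chi2 / (N * (min(rows, cols) - 1)))."""
    m = np.asarray(table, dtype=float)
    total = m.sum()
    expected = np.outer(m.sum(axis=1), m.sum(axis=0)) / total
    chi2 = ((m - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (total * (min(m.shape) - 1))))


def overall_from_matrix(r):
    """The option f[R] = Sqrt[1 - det[R]] for a matrix R of pairwise correlations."""
    r = np.asarray(r, dtype=float)
    return float(np.sqrt(max(1.0 - np.linalg.det(r), 0.0)))


if __name__ == "__main__":
    diagonal = [[10, 0], [0, 20]]       # all mass on the main diagonal
    independent = [[10, 20], [20, 40]]  # columns are proportional
    for name, tab in (("diagonal", diagonal), ("independent", independent)):
        print(name, volume_ratio(tab), cramers_v(tab))
    print(overall_from_matrix([[1.0, 0.6], [0.6, 1.0]]))
```

Under these assumptions, a table with all mass on the main diagonal gives 1 for both measures, and a table with proportional columns (statistical independence) gives values at or near 0 for both. For a 2 × 2 matrix R with off-diagonal entry ρ, Sqrt[1 - det[R]] reduces to |ρ|.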
Bibliographic Info: Paper provided by University Library of Munich, Germany in its series MPRA Paper with number 3394.
Date of creation: 15 Mar 2007
Date of revision: 07 Jun 2007
Keywords: association; correlation; contingency table; volume ratio; determinant; nonparametric methods; nominal data; nominal scale; categorical data; Fisher’s exact test; odds ratio; tetrachoric correlation coefficient; phi; Cramer’s V; Pearson; contingency coefficient; uncertainty coefficient; Theil’s U; eta; meta-analysis; Simpson’s paradox; causality; statistical independence; regression
Find related papers by JEL classification:
- C10 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - General
Related papers by the same author:
- Colignatus, Thomas, 2007. "A measure of association (correlation) in nominal data (contingency tables), using determinants," MPRA Paper 2662, University Library of Munich, Germany, revised 10 Apr 2007.
- Colignatus, Thomas, 2007. "A comparison of nominal regression and logistic regression for contingency tables, including the 2 × 2 × 2 case in causality," MPRA Paper 3615, University Library of Munich, Germany, revised 19 Jun 2007.
- Colignatus, Thomas, 2007. "The 2 x 2 x 2 case in causality, of an effect, a cause and a confounder. A cross-over’s guide to the 2 x 2 x 2 contingency table," MPRA Paper 3351, University Library of Munich, Germany, revised 14 May 2007.