
Case-control study power and sample size calculations using Stata


  • Katie Saunders

    (Cancer Research UK, Genetic Epidemiology Division, University of Leeds)

  • Tim Bishop

    (Cancer Research UK, Genetic Epidemiology Division, University of Leeds)

  • Jenny Barrett

    (Cancer Research UK, Genetic Epidemiology Division, University of Leeds)


We use Stata's npnchi2 and nchi2 functions to calculate power and required sample size for case-control studies. Following the method described by Self et al. (1992), we create a large exemplary data set that contains the expected risk-factor frequencies among cases and controls under a chosen alternative hypothesis. Under that alternative, the likelihood ratio test statistic for the hypothesis of interest follows a non-central chi-squared distribution, and the likelihood ratio statistic obtained from analyzing the exemplary data set approximates the non-centrality parameter of that distribution. We apply these methods to power and sample-size calculations for case-control studies of gene-gene and gene-environment interactions. Because case-control studies have low power to detect interactions, a wide range of strategies has been proposed to improve it. Required sample size depends on several design parameters, and because these methods are simple, the efficiency of many designs can be compared across a range of parameter values, which is valuable at the planning stage of a study. We present results for population-based, family-based and matched designs that have been proposed to improve power, and we compare the power of these designs. Stata programs for these comparisons are available.
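The exemplary-data-set idea can be illustrated with a minimal Python sketch (standard library only) for the simplest possible case: one binary exposure, an unmatched design, and a 1-degree-of-freedom test. This is not the authors' Stata code, and all function names are hypothetical. The interaction designs in the paper need larger exemplary data sets, logistic models and more degrees of freedom. Here the df = 1 identity (a non-central chi-squared variable with one degree of freedom is the square of a shifted standard normal) stands in for Stata's nchi2, and a bisection stands in for npnchi2, which both handle general df. For a single binary exposure, the likelihood ratio (G^2) statistic of the 2x2 table equals the logistic-regression LR test for the exposure.

```python
import math
from statistics import NormalDist

# Hypothetical helpers: a simplified single-exposure analogue of the
# exemplary-data-set approach, not the authors' Stata programs.
STD_NORMAL = NormalDist()


def case_prevalence(p0, odds_ratio):
    """Exposure prevalence among cases implied by control prevalence p0 and the odds ratio."""
    case_odds = odds_ratio * p0 / (1.0 - p0)
    return case_odds / (1.0 + case_odds)


def exemplary_lr_stat(n_cases, n_controls, p0, odds_ratio):
    """Likelihood ratio (G^2) statistic for exposure, computed on an 'exemplary'
    2x2 table whose cells hold the expected counts under the alternative."""
    p1 = case_prevalence(p0, odds_ratio)
    table = [
        [n_cases * p1, n_cases * (1.0 - p1)],        # cases: exposed, unexposed
        [n_controls * p0, n_controls * (1.0 - p0)],  # controls: exposed, unexposed
    ]
    row_totals = [n_cases, n_controls]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = n_cases + n_controls
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            fitted_null = row_totals[i] * col_totals[j] / total  # fit under H0: no association
            if observed > 0:
                g2 += 2.0 * observed * math.log(observed / fitted_null)
    return g2


def power_1df(ncp, alpha=0.05):
    """P(chi2_1(ncp) > critical value), using chi2_1(ncp) = (Z + sqrt(ncp))^2, Z ~ N(0, 1).
    Assumed Stata analogue: 1 - nchi2(1, ncp, invchi2tail(1, alpha))."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    root = math.sqrt(ncp)
    return STD_NORMAL.cdf(root - z) + STD_NORMAL.cdf(-root - z)


def required_ncp_1df(power, alpha=0.05):
    """Non-centrality parameter reaching the target power (df = 1), by bisection.
    Assumed Stata analogue: npnchi2(1, invchi2tail(1, alpha), 1 - power)."""
    lo, hi = 0.0, 1.0
    while power_1df(hi, alpha) < power:  # bracket the root
        hi *= 2.0
    for _ in range(100):  # invariant: power_1df(hi) >= power
        mid = 0.5 * (lo + hi)
        if power_1df(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


def required_cases(p0, odds_ratio, controls_per_case=1.0, power=0.8, alpha=0.05):
    """Number of cases needed: the exemplary LR statistic grows linearly with
    study size, so compute it for one case (plus its controls) and rescale."""
    ncp_per_case = exemplary_lr_stat(1.0, controls_per_case, p0, odds_ratio)
    return math.ceil(required_ncp_1df(power, alpha) / ncp_per_case)


if __name__ == "__main__":
    n = required_cases(p0=0.3, odds_ratio=2.0)
    print(n, "cases and", n, "controls")
    print("power:", round(power_1df(exemplary_lr_stat(n, n, 0.3, 2.0)), 3))
```

Because the exemplary table holds expected rather than sampled counts, its LR statistic serves directly as the approximate non-centrality parameter. Multiplying every cell by k multiplies that statistic by k, so a single small table suffices to solve for the required number of cases.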

Suggested Citation

  • Katie Saunders & Tim Bishop & Jenny Barrett, 2003. "Case-control study power and sample size calculations using Stata," North American Stata Users' Group Meetings 2003 07, Stata Users Group.
  • Handle: RePEc:boc:asug03:07





    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:boc:asug03:07. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Christopher F Baum.


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.