
A note on bias due to fitting prospective multivariate generalized linear models to categorical outcomes ignoring retrospective sampling schemes

Listed author(s):
  • Mukherjee, Bhramar
  • Liu, Ivy

    Outcome-dependent sampling designs are commonly used in economics, market research and epidemiological studies. The case-control design is a classic example of outcome-dependent sampling, in which exposure information is collected on subjects conditional on their disease status. In many situations, the outcome under consideration has multiple categories rather than a simple dichotomy. For example, in a case-control study, the "cases" may be sub-classified by progression of the disease, or by other histological and morphological characteristics of the disease. In this note, we investigate the issue of fitting prospective multivariate generalized linear models to such multiple-category outcome data while ignoring the retrospective nature of the sampling design. We first provide a set of necessary and sufficient conditions on the link functions under which prospective and retrospective inference are equivalent for the parameters of interest. We show that for categorical outcomes, prospective-retrospective equivalence does not hold beyond the generalized multinomial logit link. We then derive an approximate expression for the bias incurred when link functions outside this class are used. Most popular models for ordinal response fall outside the multiplicative intercept class, so a naive prospective analysis of such data calls for caution, because the bias can be substantial. We illustrate the extent of the bias through a real data example based on the ongoing Prostate, Lung, Colorectal and Ovarian (PLCO) cancer screening trial by the National Cancer Institute. Simulations based on the real study show that the bias approximations work well in practice.
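
    The contrast described in the abstract can be checked numerically with a small sketch. It is not taken from the paper: the function names, parameter values and the selection probabilities in SELECT are all hypothetical, and it assumes that the chance of being sampled depends only on the outcome category. Under that assumption the sampled distribution is the population distribution reweighted by category. For a baseline-category (multinomial) logit model the reweighting only shifts the intercepts, so the slopes survive. For a cumulative logit (proportional odds) model the induced cumulative logits are no longer linear in the covariate with the original slope.

```python
import math

# Hypothetical selection probabilities by outcome category:
# category 0 ("controls") is heavily undersampled relative to categories 1 and 2.
SELECT = [0.05, 1.0, 1.0]
NO_SELECTION = [1.0, 1.0, 1.0]

# Hypothetical population parameters, chosen only for illustration.
ALPHAS, BETAS = [-1.0, -2.0], [0.5, 1.2]   # baseline-category logit, category 0 is baseline
CUTPOINTS, BETA = [-1.0, 1.0], 1.0         # cumulative logit: logit P(Y <= k | x) = c_k - BETA * x


def multinomial_logit_probs(x, alphas, betas):
    """Population P(Y = k | x) under a baseline-category logit model."""
    scores = [0.0] + [a + b * x for a, b in zip(alphas, betas)]
    weights = [math.exp(s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def cumulative_logit_probs(x, cutpoints, beta):
    """Population P(Y = k | x) under a proportional odds model."""
    cdf = [1.0 / (1.0 + math.exp(-(c - beta * x))) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


def sampled_probs(probs, select):
    """Distribution of Y given x among sampled subjects (category reweighting)."""
    weighted = [p * s for p, s in zip(probs, select)]
    total = sum(weighted)
    return [w / total for w in weighted]


def mlogit_slope(k, x0, x1, select=SELECT):
    """Slope in x of log P*(Y = k | x) / P*(Y = 0 | x) in the sampled population."""
    def log_odds(x):
        p = sampled_probs(multinomial_logit_probs(x, ALPHAS, BETAS), select)
        return math.log(p[k] / p[0])
    return (log_odds(x1) - log_odds(x0)) / (x1 - x0)


def cumlogit_slope(k, x0, x1, select=SELECT):
    """Slope in x of logit P*(Y <= k | x) in the sampled population."""
    def cum_logit(x):
        p = sampled_probs(cumulative_logit_probs(x, CUTPOINTS, BETA), select)
        c = sum(p[: k + 1])
        return math.log(c / (1.0 - c))
    return (cum_logit(x1) - cum_logit(x0)) / (x1 - x0)


if __name__ == "__main__":
    # Multinomial logit: sampled slopes match BETAS on every interval.
    print("mlogit slopes:", mlogit_slope(1, 0.0, 1.0), mlogit_slope(2, 1.0, 2.0))
    # Cumulative logit: sampled slope varies with x and differs from -BETA.
    print("cumulative logit slopes:", cumlogit_slope(1, 0.0, 1.0), cumlogit_slope(1, 1.0, 2.0))
```

    With SELECT replaced by NO_SELECTION the cumulative logit slope returns to -BETA on every interval, which isolates the sampling scheme as the source of the distortion. The sketch only compares true and induced probabilities; it does not fit a model, so it says nothing about the size of the estimation bias that the paper approximates.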

    Download Restriction: Full text for ScienceDirect subscribers only


    Article provided by Elsevier in its journal Journal of Multivariate Analysis.

    Volume (Year): 100 (2009)
    Issue (Month): 3 (March)
    Pages: 459-472


    Handle: RePEc:eee:jmvana:v:100:y:2009:i:3:p:459-472

