Printed from https://ideas.repec.org/a/bla/jorssb/v70y2008i4p643-677.html

Sampling bias and logistic models

Author

  • Peter McCullagh

Abstract

In a regression model, the joint distribution for each finite sample of units is determined by a function $p_{\mathbf{x}}(\mathbf{y})$ depending only on the list of covariate values $\mathbf{x} = (x(u_1), \ldots, x(u_n))$ on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution $p(\mathbf{y}, \mathbf{x})$ depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution $p(\mathbf{y} \mid \mathbf{x})$ for random samples agrees with $p_{\mathbf{x}}(\mathbf{y})$ as determined by the regression model for a fixed sample having a non-random configuration $\mathbf{x}$. The paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration $\mathbf{x}$, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans, such as sequential or simple random sampling, the conditional distribution $p(\mathbf{y} \mid \mathbf{x})$ is not the same as the regression distribution unless $p_{\mathbf{x}}(\mathbf{y})$ has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored. Copyright (c) 2008 Royal Statistical Society.
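The abstract's contrast between subject-specific and population-averaged quantities, and its remark that binary random-effects models have correlated components, can be illustrated numerically. The sketch below is not the construction developed in the paper. It assumes a generic logistic-normal random-intercept model in which units in a cluster share a Gaussian effect $\eta \sim N(0, \sigma^2)$, responses are conditionally independent given $\eta$, and $P(Y_i = 1 \mid \eta) = \operatorname{expit}(\beta_0 + \beta_1 x_i + \eta)$. The parameter values and the helper names (`normal_expectation`, `marginal_prob`, `joint_prob`) are hypothetical, and the integrals over $\eta$ use a plain trapezoid rule rather than any method from the paper.

```python
import math
from typing import Callable


def expit(t: float) -> float:
    """Inverse logit 1 / (1 + exp(-t)), arranged to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def normal_expectation(f: Callable[[float], float], sigma: float,
                       n: int = 2001, width: float = 8.0) -> float:
    """Approximate E[f(eta)] for eta ~ N(0, sigma^2).

    Uses the trapezoid rule on the symmetric grid [-width*sigma, width*sigma]
    and divides by the total quadrature weight, so the weights sum to one.
    """
    if sigma == 0.0:
        return f(0.0)  # degenerate random effect: no averaging needed
    h = 2.0 * width * sigma / (n - 1)
    num = den = 0.0
    for k in range(n):
        eta = -width * sigma + k * h
        w = 0.5 if k in (0, n - 1) else 1.0             # trapezoid end-point weights
        dens = w * math.exp(-0.5 * (eta / sigma) ** 2)  # unnormalised normal density
        num += dens * f(eta)
        den += dens
    return num / den


def marginal_prob(a: float, sigma: float) -> float:
    """Population-averaged P(Y = 1): subject-specific expit(a + eta) averaged over eta."""
    return normal_expectation(lambda eta: expit(a + eta), sigma)


def joint_prob(a1: float, a2: float, sigma: float) -> float:
    """P(Y1 = 1, Y2 = 1) for two units sharing the same random effect eta,
    assuming the two responses are conditionally independent given eta."""
    return normal_expectation(lambda eta: expit(a1 + eta) * expit(a2 + eta), sigma)


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = -1.0, 2.0, 1.5
a0 = beta0 + beta1 * 0.0   # linear predictor at x = 0
a1 = beta0 + beta1 * 1.0   # linear predictor at x = 1

p0 = marginal_prob(a0, sigma)
p1 = marginal_prob(a1, sigma)
cov = joint_prob(a0, a1, sigma) - p0 * p1   # covariance of the two binary responses

print(f"subject-specific P(Y=1 | eta=0, x=1): {expit(a1):.4f}")
print(f"population-averaged P(Y=1 | x=1):     {p1:.4f}")
print(f"cov(Y1, Y2) within a cluster:          {cov:.4f}")
```

Under these assumed values the population-averaged probability `p1` lies below the subject-specific probability `expit(a1)`, the familiar attenuation of marginal logistic effects, and the covariance between the two binary responses is positive because both conditional probabilities increase with the shared effect.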

Suggested Citation

  • Peter McCullagh, 2008. "Sampling bias and logistic models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(4), pages 643-677.
  • Handle: RePEc:bla:jorssb:v:70:y:2008:i:4:p:643-677

    Download full text from publisher

    File URL: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1467-9868.2007.00660.x
    Download Restriction: Access to full text is restricted to subscribers.

    References listed on IDEAS

    1. Jonathan S. Schildcrout & Patrick J. Heagerty, 2007. "Marginalized Models for Moderate to Long Series of Longitudinal Binary Response Data," Biometrics, The International Biometric Society, vol. 63(2), pages 322-331, June.
    2. Sun, Jianguo & Sun, Liuquan & Liu, Dandan, 2007. "Regression Analysis of Longitudinal Data in the Presence of Informative Observation and Censoring Times," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1397-1406, December.
    3. Haiqun Lin & Daniel O. Scharfstein & Robert A. Rosenheck, 2004. "Analysis of longitudinal data with irregular, outcome-dependent follow-up," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(3), pages 791-813.
    4. Patrick J. Heagerty, 1999. "Marginally Specified Logistic-Normal Models for Longitudinal Binary Data," Biometrics, The International Biometric Society, vol. 55(3), pages 688-698, September.
    5. Youngjo Lee & John A. Nelder, 2006. "Double hierarchical generalized linear models (with discussion)," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 55(2), pages 139-185.
    6. Donald B. Rubin, 2005. "Causal Inference Using Potential Outcomes: Design, Modeling, Decisions," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 322-331, March.
    7. Stuart R. Lipsitz & Garrett M. Fitzmaurice & Joseph G. Ibrahim & Richard Gelber & Steven Lipshultz, 2002. "Parameter Estimation in Longitudinal Studies with Outcome-Dependent Follow-Up," Biometrics, The International Biometric Society, vol. 58(3), pages 621-630, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Jing Ning & Jing Qin & Yu Shen, 2010. "Non-parametric tests for right-censored data with biased sampling," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(5), pages 609-630.
    2. Peter J. Diggle & Raquel Menezes & Ting-li Su, 2010. "Geostatistical inference under preferential sampling," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(2), pages 191-232.


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.