Sampling bias and logistic models

Author

Listed:
  • Peter McCullagh

Abstract

Summary. In a regression model, the joint distribution for each finite sample of units is determined by a function p_x(y) depending only on the list of covariate values x = (x(u_1), …, x(u_n)) on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution p(y, x) depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution p(y | x) for random samples agrees with p_x(y) as determined by the regression model for a fixed sample having a non-random configuration x. The paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration x, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans such as sequential or simple random sampling, the conditional distribution p(y | x) is not the same as the regression distribution unless p_x(y) has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored.
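
The distinction between subject-specific and population-averaged quantities mentioned at the end of the summary can be illustrated numerically. The sketch below is not the paper's model or method. It assumes a simple logistic-normal random-intercept model, logit P(Y = 1 | x, z) = beta0 + beta1*x + sigma*z with z ~ N(0, 1), and every name and value in it (marginal_prob, beta0, beta1, sigma) is hypothetical and chosen only for illustration. It shows the well-known attenuation effect: averaging the subject-specific probability over the random effect gives a marginal probability whose logit is closer to zero than the subject-specific linear predictor. It does not reproduce the sampling-bias result itself.

```python
import math


def expit(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n_grid=4001, width=8.0):
    """Approximate E[expit(eta + sigma*Z)] for Z ~ N(0, 1).

    Uses the trapezoid rule on [-width, width]; the normal mass
    outside that interval is negligible for width = 8.
    """
    if sigma == 0.0:
        return expit(eta)  # no random effect: marginal equals conditional
    h = 2.0 * width / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = -width + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * expit(eta + sigma * z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


# Hypothetical parameter values, for illustration only.
beta0, beta1, sigma = 0.0, 1.0, 2.0

for x in (0.0, 1.0, 2.0):
    eta = beta0 + beta1 * x          # subject-specific linear predictor
    p_cond = expit(eta)              # probability for a unit with z = 0
    p_marg = marginal_prob(eta, sigma)  # population-averaged probability
    print(f"x={x:.1f}  conditional logit={eta:.3f}  "
          f"marginal logit={logit(p_marg):.3f}  "
          f"p_cond={p_cond:.3f}  p_marg={p_marg:.3f}")
```

With sigma set to 0 the two columns coincide, which mirrors the independent-components case in which the summary says no discrepancy arises.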

Suggested Citation

  • Peter McCullagh, 2008. "Sampling bias and logistic models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(4), pages 643-677, September.
  • Handle: RePEc:bla:jorssb:v:70:y:2008:i:4:p:643-677
    DOI: 10.1111/j.1467-9868.2007.00660.x



