
The effect of missing data on covariates in survival analysis

Author

Listed:
  • Irit Aitkin

    (Department of Psychology, University of Melbourne)

Abstract

We examine the effect of missing data on covariates in survival analysis, using as our application the factors affecting the duration of breastfeeding in Western Australia. Duration was studied in 556 women delivering at two maternity hospitals in Perth, Australia. The study was carried out from September 1992 to April 1993. Of these women, 466 were breastfeeding when they left the hospital.

In a previous analysis, a Cox proportional hazards model was fitted to determine the factors affecting the duration of breastfeeding. However, smoking, a covariate known to be important, could not be used because its missing values would have resulted in a loss of almost 50% of the available sample. In this analysis, we incorporate the incomplete data on smoking omitted from the previous analysis.

We deal with the missing covariate data in two ways: by maximum likelihood (ML) and by multiple imputation. Direct maximization of the likelihood with missing data is complicated, and most methods that perform maximum likelihood estimation (for example, the EM algorithm) use some form of data augmentation, which augments the observed data with latent (unobserved) data so that very complicated calculations are replaced by much simpler ones given the "complete data". For cases with smoking missing, the distribution of response time is no longer a single Cox model but a mixture of two such models, in proportions given by the population proportions of smokers and non-smokers. The likelihood function therefore differs between complete and incomplete cases, and maximizing it is more complicated because it must allow for this difference. We carried out the ML analysis in Stata using the GLLAMM (Generalized Linear Latent And Mixed Models) routines (Rabe-Hesketh, Pickles, and Skrondal 2001).
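The mixture likelihood for cases with smoking missing, and the way data augmentation simplifies its maximization, can be illustrated with a small sketch. This is not the GLLAMM analysis described here: as a simplifying assumption it replaces the Cox model with a constant (exponential) hazard for each smoking group, uses no other covariates, and treats the missingness as ignorable. All function and variable names are hypothetical.

```python
import math


def case_likelihood(t, d, rate):
    """Exponential survival contribution: density if d == 1 (event),
    survivor function if d == 0 (censored)."""
    return (rate ** d) * math.exp(-rate * t)


def observed_loglik(times, events, smoking, pi, rate0, rate1):
    """Observed-data log-likelihood. smoking is 1, 0, or None (missing)."""
    total = 0.0
    for t, d, s in zip(times, events, smoking):
        l1 = pi * case_likelihood(t, d, rate1)           # smoker component
        l0 = (1.0 - pi) * case_likelihood(t, d, rate0)   # non-smoker component
        if s is None:
            total += math.log(l1 + l0)  # mixture over the unknown status
        else:
            total += math.log(l1 if s == 1 else l0)
    return total


def em_fit(times, events, smoking, n_iter=100):
    """EM for the two-group exponential model with smoking partly missing.
    Returns (pi, rate0, rate1): smoker proportion and the two hazards."""
    n = len(times)
    pi = 0.5
    rate0 = rate1 = sum(events) / sum(times)  # pooled starting value
    for _ in range(n_iter):
        # E-step: posterior probability of smoking for each case
        w = []
        for t, d, s in zip(times, events, smoking):
            if s is None:
                l1 = pi * case_likelihood(t, d, rate1)
                l0 = (1.0 - pi) * case_likelihood(t, d, rate0)
                w.append(l1 / (l1 + l0))
            else:
                w.append(float(s))  # observed status is used as is
        # M-step: weighted closed-form updates (events / exposure time)
        pi = sum(w) / n
        rate1 = (sum(wi * d for wi, d in zip(w, events))
                 / sum(wi * t for wi, t in zip(w, times)))
        rate0 = (sum((1.0 - wi) * d for wi, d in zip(w, events))
                 / sum((1.0 - wi) * t for wi, t in zip(w, times)))
    return pi, rate0, rate1
```

Calling `em_fit(times, events, smoking)` with `None` entries in `smoking` returns the estimated smoker proportion and the two hazard rates. With the smoking status filled in by its posterior weight, each M-step is an ordinary complete-data calculation, which is the simplification that data augmentation provides.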
In the GLLAMM procedure, a latent smoking variable is defined for the cases with smoking missing, and the breastfeeding durations are regressed on the explanatory variables and on smoking: the observed covariate where it is available and the latent variable where it is not. The model for the smoking covariate is a "measurement model" when the covariate is observed and a "structural model" when it is not. We compared ML using GLLAMM with multiple imputation using the program written by J. L. Schafer, mainly for S-Plus/R, which is based on the data augmentation algorithm (Tanner and Wong 1987).
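The abstract does not say how the analyses of the imputed data sets were combined. A standard choice, assumed here for illustration only, is Rubin's rules: average the per-imputation estimates, then add the average within-imputation variance to the between-imputation variance inflated by a factor of 1 + 1/M. The function name and arguments below are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine M per-imputation estimates of one coefficient.

    estimates: point estimates, one per imputed data set (M >= 2)
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)          # average of the M estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total
```

The square root of the total variance is the pooled standard error, which is the quantity one would set beside the ML standard error when comparing the two approaches.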

Suggested Citation

  • Irit Aitkin, "undated". "The effect of missing data on covariates in survival analysis," Australasian Stata Users' Group Meetings 2004 6, Stata Users Group.
  • Handle: RePEc:boc:osug04:6

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Powers, Daniel A. & Yun, Myeong-Su, 2009. "Multivariate Decomposition for Hazard Rate Models," IZA Discussion Papers 3971, Institute of Labor Economics (IZA).
    2. Ryo Kato & Takahiro Hoshino, 2020. "Semiparametric Bayesian multiple imputation for regression models with missing mixed continuous–discrete covariates," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(3), pages 803-825, June.
    3. Aubry, Philippe & Francesiaz, Charlotte & Guillemain, Matthieu, 2024. "On the impact of preferential sampling on ecological status and trend assessment," Ecological Modelling, Elsevier, vol. 492(C).
    4. A. Adam Ding & Natalie DelRocco & Samuel S. Wu, 2024. "Statistical Methods for Selective Biomarker Testing," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 16(3), pages 693-722, December.
    5. Jonathan S. Schildcrout & Shawn P. Garbett & Patrick J. Heagerty, 2013. "Outcome Vector Dependent Sampling with Longitudinal Continuous Response Data: Stratified Sampling Based on Summary Statistics," Biometrics, The International Biometric Society, vol. 69(2), pages 405-416, June.
    6. J. F. Lawless, 2018. "Two-phase outcome-dependent studies for failure times and testing for effects of expensive covariates," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 24(1), pages 28-44, January.
    7. Takahiro Hoshino & Hiroshi Kurata & Kazuo Shigemasu, 2006. "A Propensity Score Adjustment for Multiple Group Structural Equation Modeling," Psychometrika, Springer;The Psychometric Society, vol. 71(4), pages 691-712, December.
    8. Sasaki, Yuya & Ura, Takuya, 2023. "Estimation and inference for policy relevant treatment effects," Journal of Econometrics, Elsevier, vol. 234(2), pages 394-450.
    9. Fatema Shafie Khorassani & Jeremy M. G. Taylor & Niko Kaciroti & Michael R. Elliott, 2023. "Incorporating Covariates into Measures of Surrogate Paradox Risk," Stats, MDPI, vol. 6(1), pages 1-23, February.
    10. A. J. Scallan, 1999. "Regression modelling of interval-censored failure time data using the Weibull distribution," Journal of Applied Statistics, Taylor & Francis Journals, vol. 26(5), pages 613-618.
    11. Trond Petersen, 1986. "Estimating Fully Parametric Hazard Rate Models with Time-Dependent Covariates," Sociological Methods & Research, vol. 14(3), pages 219-246, February.
    12. Xue Yuan & Wang Jinjuan & Ding Juan & Zhang Sanguo & Li Qizhai, 2019. "A powerful test for ordinal trait genetic association analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 18(2), pages 1-9, April.
    13. Sebastien J.‐P. A. Haneuse & Jonathan C. Wakefield, 2008. "The combination of ecological and case–control data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 73-93, February.
    14. Donglin Zeng & Qingxia Chen, 2010. "Adjustment for Missingness Using Auxiliary Information in Semiparametric Regression," Biometrics, The International Biometric Society, vol. 66(1), pages 115-122, March.
    15. Göran Kauermann & Mehboob Ali, 2021. "Semi-parametric regression when some (expensive) covariates are missing by design," Statistical Papers, Springer, vol. 62(4), pages 1675-1696, August.
    16. Yuichi Hirose, 2011. "Efficiency of profile likelihood in semi-parametric models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 63(6), pages 1247-1275, December.
    17. James Y. Dai & Michael LeBlanc & Charles Kooperberg, 2009. "Semiparametric Estimation Exploiting Covariate Independence in Two-Phase Randomized Trials," Biometrics, The International Biometric Society, vol. 65(1), pages 178-187, March.
    18. S. Haneuse & J. Chen, 2011. "A Multiphase Design Strategy for Dealing with Participation Bias," Biometrics, The International Biometric Society, vol. 67(1), pages 309-318, March.
    19. Leilei Zeng & Richard J. Cook & Theodore E. Warkentin, 2010. "Regression Analysis with a Misclassified Covariate from a Current Status Observation Scheme," Biometrics, The International Biometric Society, vol. 66(2), pages 415-425, June.
