This file is part of IDEAS, which uses RePEc data



The effect of missing data on covariates in survival analysis

Author Info
Irit Aitkin (Department of Psychology, University of Melbourne)
Abstract

We address the problem of missing covariate data in survival analysis. Specifically, we examine the factors affecting the duration of breastfeeding in Western Australia. Duration was studied in 556 women who delivered at two maternity hospitals in Perth, Australia, in a study carried out from September 1992 to April 1993. Of these, 466 women were breastfeeding when they left hospital. In a previous analysis, a Cox proportional hazards model was fitted to determine the factors affecting duration of breastfeeding. However, smoking, a covariate known to be important, could not be used because missing data would have caused the loss of almost 50% of the available sample. In this analysis, we incorporate the incomplete smoking data omitted from the previous analysis.

We handle the missing covariate data in two ways: maximum likelihood (ML) and multiple imputation. Direct maximization of the likelihood with missing data is complicated, and most methods for maximum likelihood estimation (for example, the EM algorithm) use some form of data augmentation: the observed data are augmented with latent (unobserved) data so that very complicated calculations are replaced by much simpler ones given the "complete data". For cases with smoking missing, the distribution of response time is no longer a Cox model but a mixture of two such models, in proportions given by the population proportions of smokers and non-smokers. The likelihood function therefore differs between complete and incomplete cases, and maximizing it must allow for this difference. We carried out the ML analysis in Stata using the GLLAMM (Generalized Linear Latent And Mixed Models) routines (Rabe-Hesketh, Pickles, and Skrondal 2001).
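The mixture form of the likelihood for cases with smoking missing can be illustrated with a minimal sketch. It is not the GLLAMM implementation: it assumes an exponential proportional hazards model as a parametric stand-in for the semiparametric Cox model, and the function names, parameters, and numbers are hypothetical and chosen only for illustration.

```python
import math


def ph_likelihood(time, event, hazard_ratio, baseline_hazard):
    """Likelihood of one duration under an exponential proportional
    hazards model (an assumed stand-in for the Cox model)."""
    hazard = baseline_hazard * hazard_ratio
    survival = math.exp(-hazard * time)
    # An observed event contributes the density h * S(t);
    # a censored duration contributes only S(t).
    return hazard * survival if event else survival


def contribution(time, event, smoker, p_smoker, hr_smoking, baseline_hazard):
    """Joint likelihood contribution of one woman.

    smoker is True, False, or None when smoking status is missing.
    p_smoker is the (assumed) population proportion of smokers.
    """
    lik_smoker = p_smoker * ph_likelihood(
        time, event, hr_smoking, baseline_hazard)
    lik_nonsmoker = (1.0 - p_smoker) * ph_likelihood(
        time, event, 1.0, baseline_hazard)
    if smoker is None:
        # Smoking missing: mixture of the two models, weighted by
        # the population proportions of smokers and non-smokers.
        return lik_smoker + lik_nonsmoker
    return lik_smoker if smoker else lik_nonsmoker


# Illustrative values only, not taken from the study.
missing = contribution(10.0, True, None, 0.3, 1.5, 0.05)
observed = contribution(10.0, True, True, 0.3, 1.5, 0.05)
```

Under these assumptions a complete case contributes a single term, while an incomplete case contributes the sum over both possible smoking states, which is why the likelihood differs between the two kinds of case.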
In the GLLAMM procedure, a latent smoking variable is defined for the cases with smoking missing, and the breastfeeding durations are regressed on the explanatory variables and smoking: the covariate itself when it is observed and the latent variable when it is not. The model for the smoking covariate is a "measurement model" when the covariate is observed and a "structural model" when it is not. We compared ML using GLLAMM with multiple imputation using the program written by J. L. Schafer, mainly for S-Plus/R, which is based on the data augmentation algorithm (Tanner and Wong 1987).
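The abstract does not say how the analyses of the imputed data sets were combined. A common approach is Rubin's rules, sketched below as an assumption rather than as the authors' procedure. The function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets with Rubin's rules.

    estimates: the coefficient estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative log hazard ratios for smoking, not results from the study.
pooled, total = pool_rubin([0.4, 0.5, 0.6], [0.01, 0.01, 0.01])
```

The between-imputation term carries the extra uncertainty due to the missing smoking values, so the pooled variance is larger than the average variance from any single completed data set.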

Download Info

File URL: http://www.mas.ncl.ac.uk/~nia3/sug_anz1.pdf
File Format: application/pdf
Download Restriction: no

Publisher Info
Paper provided by Stata Users Group in its series Australasian Stata Users' Group Meetings 2004 with number 6.

Handle: RePEc:boc:osug04:6

Contact details of provider:
Postal: Administration Building, 140 Commonwealth Avenue, Chestnut Hill MA 02467
Phone: 617-552-3670
Fax: 617-552-2308
Web page: http://www.stata.com/meeting/1australia

For technical questions regarding this item, or to correct its listing, contact: (Christopher F Baum).



This page was last updated on 2009-10-23.