Testing the performance of the two fold FCS algorithm for multiple imputation of longitudinal clinical records
Abstract
Multiple imputation is increasingly regarded as the standard method to account for partially observed data, but most methods have been based on cross-sectional imputation algorithms. Recently, a new multiple-imputation method, the two fold fully conditional specification (FCS) method, was developed to impute missing data in longitudinal datasets with nonmonotone missing data. (See Nevalainen J., Kenward M.G., and Virtanen S.M. 2009. Missing values in longitudinal dietary data: A multiple imputation approach based on a fully conditional specification. Statistics in Medicine 28: 3657-3669.) This method imputes missing data at a given time point based on measurements recorded at the previous and next time points. Until now, the method has been tested only on a relatively small dataset and under very specific conditions. We have implemented the two fold FCS algorithm in Stata, and in this study we further challenge and evaluate the performance of the algorithm under different scenarios. In simulation studies, we generated 1,000 datasets similar in structure to the longitudinal clinical records (The Health Improvement Network primary care database) to which we will apply the two fold FCS algorithm. Initially, these generated datasets included complete records. We then introduced different levels and patterns of partially observed data and applied the algorithm to generate multiply imputed datasets. Our initial multiple imputations demonstrated that the algorithm provided acceptable results when a linear substantive model was used and data were imputed over a limited time period for continuous variables such as weight and blood pressure. Using an exponential substantive model introduced some bias, but estimates were still within acceptable ranges.
We will present results for simulation studies that include situations where categorical and continuous variables change over a 10-year period (for example, smokers become ex-smokers, weight increases or decreases) and large proportions of data are unobserved. We also explore how the algorithm deals with interactions and whether initiating the algorithm to run forward or backward in time has any impact on the final data distribution.
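The core idea described above, imputing a value at one time point from the measurements at the previous and next time points, can be illustrated with a minimal Python sketch. This is not the Stata implementation evaluated in the paper. It is a simplified stand-in under stated assumptions: a single continuous variable, stochastic linear regression on the adjacent time points only, and no draw of the regression parameters from their posterior, so it is not a proper multiple-imputation procedure. With one variable, the within-time iterations of the two fold FCS method reduce to a single step, so only the sweeps across time points are shown. The function name `two_fold_sketch`, its parameters, and the simulated weight-like data are hypothetical.

```python
import numpy as np


def two_fold_sketch(y, n_among=5, seed=0):
    """Fill NaNs in y (subjects x time points) using adjacent time points.

    Simplified illustration only: one continuous variable, linear
    regression on times t-1 and t+1, residual noise added to predictions.
    Assumes every time point has at least some observed values.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    filled = y.copy()

    # Starting values: replace each missing entry with its column mean.
    col_means = np.nanmean(y, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    n_times = y.shape[1]
    for _ in range(n_among):              # repeated sweeps across time
        for t in range(n_times):          # visit each time point in turn
            miss_t = missing[:, t]
            if not miss_t.any():
                continue
            # Predictors: current values at the previous and next time points.
            neighbours = [s for s in (t - 1, t + 1) if 0 <= s < n_times]
            X = np.column_stack(
                [np.ones(len(filled))] + [filled[:, s] for s in neighbours]
            )
            obs = ~miss_t
            # Fit on subjects whose value at time t was actually observed.
            beta, *_ = np.linalg.lstsq(X[obs], y[obs, t], rcond=None)
            resid = y[obs, t] - X[obs] @ beta
            dof = max(int(obs.sum()) - X.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Impute: prediction plus a random residual draw.
            noise = rng.normal(0.0, sigma, int(miss_t.sum()))
            filled[miss_t, t] = X[miss_t] @ beta + noise
    return filled


# Simulated example (hypothetical values): complete data first, then
# values removed completely at random, mirroring the simulation design.
rng = np.random.default_rng(1)
n_subjects, n_times = 200, 5
baseline = rng.normal(70, 10, size=(n_subjects, 1))
complete = baseline + 0.5 * np.arange(n_times) + rng.normal(0, 2, size=(n_subjects, n_times))
with_gaps = complete.copy()
with_gaps[rng.random((n_subjects, n_times)) < 0.3] = np.nan

imputed = two_fold_sketch(with_gaps)
```

Because the complete data are known in a simulation like this, estimates from a substantive model fitted to the imputed data can be compared with those from the complete data to gauge bias. Generating several imputed datasets (for example, by varying `seed`) and combining the estimates would be the next step in a real analysis.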
Bibliographic Info
Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2011 with number 11.
Date of creation: 26 Sep 2011