
Using pattern mixture modeling to account for informative attrition in the Whitehall II study: A simulation study


  • Catherine Welch

    (Research Department of Epidemiology and Public Health, UCL)

  • Martin Shipley

    (Research Department of Epidemiology and Public Health, UCL)

  • Séverine Sabia

    (INSERM U1018, Centre for Research in Epidemiology and Population Health, Villejuif, France)

  • Eric Brunner

    (Research Department of Epidemiology and Public Health, UCL)

  • Mika Kivim

    (Research Department of Epidemiology and Public Health, UCL)


Attrition is a potential source of bias in longitudinal studies. It occurs when participants drop out, and it is informative when the reason for dropping out is associated with the study outcome. Informative attrition is impossible to verify, however, because the data needed to confirm it are missing. When data are missing at random (MAR), meaning that the probability of missingness is not associated with the missing values conditional on the observed data, one appropriate approach for handling missing data is multiple imputation (MI). When attrition instead results in data that are missing not at random (MNAR), the probability of missingness is associated with the missing values themselves, so we cannot use MI directly. An alternative approach is pattern mixture modeling, which specifies separate distributions for the observed data, which we know, and the missing data, which we don't know. We can estimate the missing-data model, using observations about the data, and average the estimates of the two models using MI. Many longitudinal clinical trials have a monotone missing pattern (once participants drop out, they do not return), which simplifies MI, so pattern mixture modeling can be used there as a sensitivity analysis. In observational studies, however, data are missing because of both nonresponse and attrition, which makes handling attrition more complex than in clinical trials. For this study, we used data from the Whitehall II study. Data were first collected on over 10,000 civil servants in 1985, and data collection phases are repeated every 2-3 years. Participants complete a health and lifestyle questionnaire and, at alternate, odd-numbered phases, attend a screening clinic. Over 30 years, many epidemiological studies have used these data. One study investigated how smoking status at baseline (Phase 5) was associated with 10-year cognitive decline, using a mixed model with random intercept and slope. In these analyses, the authors replaced missing values in non-responders with last observed values.
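As a rough illustration of the pattern mixture idea (not the authors' Stata code), the Python sketch below estimates an overall mean by mixing the mean of the observed pattern with an assumed mean for the missing pattern. The function name pattern_mixture_mean, the offset parameter delta, and the example scores are all hypothetical. The delta offset is one common way to express an untestable assumption about how dropouts differ from completers; the abstract does not say which assumption the authors used.

```python
def pattern_mixture_mean(observed, n_missing, delta):
    """Overall mean as a mixture of the observed and missing patterns.

    delta is an ASSUMED, untestable difference between the mean of the
    missing values and the mean of the observed values.
    """
    n_obs = len(observed)
    mean_obs = sum(observed) / n_obs
    mean_mis = mean_obs + delta          # assumed model for the missing pattern
    p_mis = n_missing / (n_obs + n_missing)
    # Weight each pattern's mean by the proportion of participants in it.
    return (1 - p_mis) * mean_obs + p_mis * mean_mis


if __name__ == "__main__":
    scores = [52.0, 48.0, 55.0, 45.0]    # made-up observed cognitive scores
    for delta in (0.0, -2.0, -5.0):      # sensitivity analysis over delta
        print(delta, pattern_mixture_mean(scores, n_missing=4, delta=delta))
```

Setting delta to 0 corresponds to assuming that dropouts resemble completers; varying delta shows how sensitive the estimate is to departures from that assumption.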
However, participants with reduced cognitive function may be unable to continue participation in the Whitehall II study, which may bias the statistical analysis. Using Stata, we will simulate 1,000 datasets with the same distributions and associations as Whitehall II to perform the statistical analysis described above. First, we will develop a MAR missingness mechanism (conditional on previously observed values) and change cognitive function values to missing. Next, for attrition, we will use a MNAR missingness mechanism (conditional on measurements at the same phase). For both MAR and MNAR missingness mechanisms, we will compare the bias and precision of a complete case analysis and of an analysis of data imputed using MI against an analysis of the simulated datasets without any missing data; additionally, for the MNAR missingness mechanism, we will use pattern mixture modeling. We will use the twofold fully conditional specification (FCS) algorithm to impute missing values for non-responders and to average estimates when using pattern mixture modeling. The twofold FCS algorithm imputes each phase sequentially, conditional on observed information at adjacent phases, so it is a suitable approach for imputing missing values in longitudinal data. The user-written package for this approach, twofold, is available on the Statistical Software Components (SSC) archive. We will present the methods used to perform the study and results from these comparisons.
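The two missingness mechanisms described above can be sketched in a few lines of Python. This is a simplified stand-in under stated assumptions, not the authors' Stata simulation: it uses two time points instead of the Whitehall II phases, made-up coefficients (0.7 and 1.5), hypothetical function names, and a single regression imputation in place of MI or the twofold FCS algorithm. Under MAR, missingness of the follow-up score depends on the earlier, observed score; under MNAR, it depends on the follow-up score itself.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def simulate(n, mechanism, rng):
    """Return (baseline, follow_up, observed) lists for one simulated dataset."""
    baseline, follow_up, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                 # earlier score, always observed
        y = 0.7 * x + rng.gauss(0.0, 1.0)       # follow-up score, true mean 0
        # MAR: dropout depends on the earlier score; MNAR: on y itself.
        driver = x if mechanism == "MAR" else y
        p_missing = 1.0 / (1.0 + math.exp(1.5 * driver))  # low scores drop out more
        baseline.append(x)
        follow_up.append(y)
        observed.append(rng.random() >= p_missing)
    return baseline, follow_up, observed


def complete_case_mean(follow_up, observed):
    """Mean of the follow-up score among participants who stayed in."""
    return mean([y for y, o in zip(follow_up, observed) if o])


def regression_mean(baseline, follow_up, observed):
    """Fill missing follow-up scores from a complete-case regression on baseline."""
    xs = [x for x, o in zip(baseline, observed) if o]
    ys = [y for y, o in zip(follow_up, observed) if o]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    filled = [y if o else intercept + slope * x
              for x, y, o in zip(baseline, follow_up, observed)]
    return mean(filled)


if __name__ == "__main__":
    rng = random.Random(2016)
    for mechanism in ("MAR", "MNAR"):
        x, y, o = simulate(20000, mechanism, rng)
        print(mechanism,
              "full data:", round(mean(y), 3),
              "complete case:", round(complete_case_mean(y, o), 3),
              "regression-filled:", round(regression_mean(x, y, o), 3))
```

In this toy setting, conditioning on the earlier score should largely remove the bias under MAR but not under MNAR, which is the motivation for adding a pattern mixture analysis in the MNAR case.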

Suggested Citation

  • Catherine Welch & Martin Shipley & Séverine Sabia & Eric Brunner & Mika Kivim, 2016. "Using pattern mixture modeling to account for informative attrition in the Whitehall II study: A simulation study," United Kingdom Stata Users' Group Meetings 2016 11, Stata Users Group.
  • Handle: RePEc:boc:usug16:11
