
Prediction of Latent Variables in a Mixture of Structural Equation Models, with an Application to the Discrepancy Between Survey and Register Data


  • Erik Meijer
  • Susann Rohwedder
  • Tom Wansbeek


The authors study the prediction of latent variables in a finite mixture of linear structural equation models. The latent variables can be viewed as well-defined variables measured with error or as theoretical constructs that cannot be measured objectively, but for which proxies are observed. The finite mixture component may serve different purposes: it can denote an unobserved segmentation in subpopulations such as market segments, or it can be used as a nonparametric way to estimate an unknown distribution. In the first interpretation, it forms an additional discrete latent variable in an otherwise continuous latent variable model. Different criteria can be employed to derive “optimal” predictors of the latent variables, leading to a taxonomy of possible predictors. The authors derive the theoretical properties of these predictors. Special attention is given to a mixture that includes components with degenerate distributions. They then apply the theory to the optimal estimation of individual earnings when two independent observations are available: one from survey data and one from register data. The discrete components of the model represent observations with or without measurement error, and with either a correct match or a mismatch between the two data sources.
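To make the idea of predicting a latent variable in a finite mixture concrete, here is a minimal sketch in Python. It is not the authors' model. It assumes a deliberately simplified setup: a scalar latent variable that is normal within each mixture component, a single observed proxy equal to the latent variable plus independent normal error, and known parameters. All function and parameter names are hypothetical. Under these assumptions, the conditional-expectation predictor is the average of the within-component linear predictors, weighted by the posterior component probabilities. A component with zero error variance stands in for observations without measurement error.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_class_probs(y, weights, means, latent_vars, error_vars):
    """Posterior probability of each mixture component given the proxy y.

    Assumption: within component j, y ~ N(means[j], latent_vars[j] + error_vars[j]).
    """
    dens = [
        w * normal_pdf(y, m, t + s)
        for w, m, t, s in zip(weights, means, latent_vars, error_vars)
    ]
    total = sum(dens)
    return [d / total for d in dens]


def within_class_predictor(y, mean, latent_var, error_var):
    """Conditional mean of the latent variable given y within one component."""
    reliability = latent_var / (latent_var + error_var)
    return mean + reliability * (y - mean)


def mixture_predictor(y, weights, means, latent_vars, error_vars):
    """Posterior-weighted average of the within-component predictors."""
    probs = posterior_class_probs(y, weights, means, latent_vars, error_vars)
    preds = [
        within_class_predictor(y, m, t, s)
        for m, t, s in zip(means, latent_vars, error_vars)
    ]
    return sum(p * f for p, f in zip(probs, preds))


# Hypothetical parameters: the first component has measurement error,
# the second reports the latent variable exactly (error variance 0).
weights = [0.7, 0.3]
means = [8.0, 8.0]
latent_vars = [4.0, 4.0]
error_vars = [4.0, 0.0]

prediction = mixture_predictor(10.0, weights, means, latent_vars, error_vars)
print(prediction)
```

In this sketch the prediction shrinks the observed value toward the component mean only to the extent that the observation is likely to come from the error-prone component. Other optimality criteria would lead to different predictors.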

Suggested Citation

  • Erik Meijer & Susann Rohwedder & Tom Wansbeek, 2008. "Prediction of Latent Variables in a Mixture of Structural Equation Models, with an Application to the Discrepancy Between Survey and Register Data," Working Papers 584, RAND Corporation.
  • Handle: RePEc:ran:wpaper:584



    More about this item


    Keywords:

    factor scores; measurement error; finite mixture; validation study

    JEL classification:

    • J39 - Labor and Demographic Economics - - Wages, Compensation, and Labor Costs - - - Other
    • C39 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Other
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
