IDEAS home Printed from https://ideas.repec.org/p/mil/wpdepa/2015-04.html

A Comprehensive Simulation Study on the Forward Imputation

Author

Listed:
  • Nadia SOLARO
  • Alessandro BARBIERO
  • Giancarlo MANZI
  • Pier Alda FERRARI

Abstract

The Nearest Neighbour Imputation (NNI) method has a long history in missing data imputation. Likewise, multivariate dimension reduction techniques allow for preserving the maximum information from the data. Recently, the combined use of these methodologies has been proposed to solve data imputation problems and to exploit as much information as possible from the complete part of the data. In this paper we perform an extensive simulation study to test the performance of this new imputation approach, called “Forward Imputation” (ForImp). We compare the two ForImp methods developed for missing quantitative data with two other imputation techniques regarded as benchmarks. The first ForImp method, ForImpPCA, combines the NNI method with Principal Component Analysis (PCA) as a multivariate data analysis technique; the second, ForImpMahalanobis, uses the Mahalanobis distance for NNI. The benchmarks are Stekhoven and Bühlmann’s missForest method, a nonparametric imputation technique for continuous and/or categorical data based on a random forest, and the Iterative PCA, an algorithmic-type technique that imputes missing values simultaneously by an iterative use of PCA. The simulation study is based on simulated data with different levels of kurtosis or skewness and different strengths of linear relationship among the variables, so that the performance of the four methods can be compared on various data patterns. The distributions used for these simulated data belong to the families of Multivariate Exponential Power and Multivariate Skew-Normal distributions, respectively. Results tend to favour ForImpMahalanobis, especially in the presence of skew data with small or negative correlations of the same magnitude, or with a mix of negative and positive correlations of low level, whereas ForImpPCA performs better when a slightly higher level of correlation is present in the data.
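To make the idea of nearest-neighbour imputation with a Mahalanobis distance concrete, the following is a minimal single-pass sketch in Python with NumPy. It is not the authors' ForImpMahalanobis procedure, which the abstract does not specify in detail and which is sequential in nature. The function name `mahalanobis_nn_impute` is hypothetical, and the choices made here (covariance estimated from complete rows only, one donor per incomplete row, pseudo-inverse of the covariance submatrix) are simplifying assumptions for illustration.

```python
import numpy as np


def mahalanobis_nn_impute(X):
    """Fill NaNs in X by copying values from the nearest complete row.

    Illustrative simplification (not the ForImp algorithm): distances are
    Mahalanobis distances on the variables observed in the incomplete row,
    with the covariance estimated from the complete rows only.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = ~np.isnan(X).any(axis=1)
    donors = X[complete]
    if donors.shape[0] <= X.shape[1]:
        raise ValueError("need more complete rows than variables")
    cov = np.atleast_2d(np.cov(donors, rowvar=False))

    for i in np.flatnonzero(~complete):
        obs = ~np.isnan(X[i])
        if not obs.any():
            continue  # nothing to match on; the row is left untouched
        # Inverse covariance restricted to the observed variables.
        vi = np.linalg.pinv(cov[np.ix_(obs, obs)])
        diff = donors[:, obs] - X[i, obs]
        # Squared Mahalanobis distance from row i to every donor row.
        d2 = np.einsum("ij,jk,ik->i", diff, vi, diff)
        nearest = donors[np.argmin(d2)]
        filled[i, ~obs] = nearest[~obs]
    return filled


# Toy data: five complete rows and two rows with one missing value each.
X = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 1.0, 1.5],
    [3.0, 4.0, 1.0],
    [4.0, 3.0, 3.0],
    [5.0, 6.0, 2.0],
    [4.0, np.nan, 3.0],
    [np.nan, 6.0, 2.0],
])
out = mahalanobis_nn_impute(X)
print(out)
```

In this toy example each incomplete row coincides with one complete row on its observed variables, so that row is selected as the donor. A sequential scheme would instead update the set of complete rows as imputation proceeds; that step is deliberately left out of this sketch.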

Suggested Citation

  • Nadia SOLARO & Alessandro BARBIERO & Giancarlo MANZI & Pier Alda FERRARI, 2015. "A Comprehensive Simulation Study on the Forward Imputation," Departmental Working Papers 2015-04, Department of Economics, Management and Quantitative Methods at Università degli Studi di Milano.
  • Handle: RePEc:mil:wpdepa:2015-04

    Download full text from publisher

    File URL: http://wp.demm.unimi.it/files/wp/2015/DEMM-2015_04wp.pdf
    Download Restriction: no

    File URL: http://wp.demm.unimi.it/files/wp/2015/DEMM-2015_04wp_ESM.pdf
    Download Restriction: no

    Citations

    Cited by:

    1. Nadia Solaro & Alessandro Barbiero & Giancarlo Manzi & Pier Alda Ferrari, 2017. "A sequential distance-based approach for imputing missing data: Forward Imputation," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(2), pages 395-414, June.

    More about this item

    Keywords

    Correlation; Data patterns; Kurtosis; Mahalanobis distance; MissForest; Nearest Neighbour Imputation; Principal Component Analysis; Skewness;

    JEL classification:

    • C15 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Statistical Simulation Methods: General
    • C18 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Methodological Issues: General
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
