
Imputation of missing values for compositional data using classical and robust methods


  • Hron, K.
  • Templ, M.
  • Filzmoser, P.


New imputation algorithms for estimating missing values in compositional data are introduced. A first proposal uses the k-nearest neighbor procedure based on the Aitchison distance, a distance measure especially designed for compositional data. It is important to adjust the estimated missing values to the overall size of the compositional parts of the neighbors. As a second proposal, an iterative model-based imputation technique is introduced, which starts from the result of the proposed k-nearest neighbor procedure. The method is based on iterative regressions, thereby accounting for the whole multivariate data information. The regressions have to be performed in a transformed space, and, depending on the data quality, classical or robust regression techniques can be employed. The proposed methods are tested on a real data set and on simulated data sets. The results show that the proposed methods outperform standard imputation methods. In the presence of outliers, the model-based method with robust regressions is preferable.
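
The first proposal can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `clr`, `aitchison_distance`, and `knn_impute_cell` are hypothetical, the Aitchison distance is computed as the Euclidean distance between centred log-ratio (clr) coordinates, and the size adjustment (rescaling each neighbor so that its observed parts sum to the same total as those of the incomplete observation, then taking the median) is one plausible reading of the abstract. The paper's exact formula may differ.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a composition with strictly positive parts."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between clr coordinates."""
    return float(np.linalg.norm(clr(x) - clr(y)))

def knn_impute_cell(data, row, col, k=3):
    """Impute data[row, col] (NaN) from the k nearest donor rows.

    Assumption (not taken from the paper): donors must be observed in `col`
    and in every part that is observed in the target row, and the target row
    needs at least two observed parts for the distance to be informative.
    """
    target = data[row]
    observed = ~np.isnan(target)
    donors = [
        i for i in range(len(data))
        if i != row
        and not np.isnan(data[i, col])
        and not np.isnan(data[i, observed]).any()
    ]
    # Distances use only the parts observed in the target row.
    dists = [aitchison_distance(target[observed], data[i, observed]) for i in donors]
    nearest = [donors[j] for j in np.argsort(dists)[:k]]
    # Size adjustment: rescale each neighbor to the target's overall size
    # on the jointly observed parts before aggregating.
    scale = target[observed].sum() / data[nearest][:, observed].sum(axis=1)
    return float(np.median(scale * data[nearest, col]))

# Toy data: every row is a multiple of (1, 2, 3, 4); the first row lacks part 3.
toy = np.array([
    [2.0, 4.0, np.nan, 8.0],
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 6.0, 9.0, 12.0],
    [0.5, 1.0, 1.5, 2.0],
])
imputed = knn_impute_cell(toy, row=0, col=2, k=2)
```

Because every row of the toy data is a rescaled copy of the same composition, all donors lie at Aitchison distance zero from the incomplete row, and the size adjustment maps each donor's third part to 6 (up to floating-point error). Without the adjustment, the median of the raw donor values would depend on the donors' arbitrary scale.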

Suggested Citation

  • Hron, K. & Templ, M. & Filzmoser, P., 2010. "Imputation of missing values for compositional data using classical and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3095-3107, December.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:12:p:3095-3107


    Cited by:

    1. Item not listed on IDEAS (handle: repec:spr:lsprsc:v:11:y:2018:i:1:d:10.1007_s12076-017-0199-5).
    2. Matthias Templ & Andreas Alfons & Peter Filzmoser, 2012. "Exploring incomplete data using visualization techniques," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 6(1), pages 29-47, April.
    3. Hazen, Benjamin T. & Overstreet, Robert E. & Jones-Farmer, L. Allison & Field, Hubert S., 2012. "The role of ambiguity tolerance in consumer perception of remanufactured products," International Journal of Production Economics, Elsevier, vol. 135(2), pages 781-790.
    4. Martín-Fernández, J.A. & Hron, K. & Templ, M. & Filzmoser, P. & Palarea-Albaladejo, J., 2012. "Model-based replacement of rounded zeros in compositional data: Classical and robust approaches," Computational Statistics & Data Analysis, Elsevier, vol. 56(9), pages 2688-2704.
    5. Elena Catanese, 2016. "Data Editing for Complex Surveys in Presence Of Administrative Data: An Application to Fss 2013 Livestock Survey Data Based on The Joint Sequential Use Of Different R Packages," Romanian Statistical Review, Romanian Statistical Review, vol. 64(2), pages 101-117, June.
    6. Peter Filzmoser & Karel Hron & Matthias Templ, 2012. "Discriminant analysis for compositional data and robust parameter estimation," Computational Statistics, Springer, vol. 27(4), pages 585-604, December.
    7. Tutz, Gerhard & Ramzan, Shahla, 2015. "Improved methods for the imputation of missing data by nearest neighbor methods," Computational Statistics & Data Analysis, Elsevier, vol. 90(C), pages 84-99.
    8. Templ, Matthias & Kowarik, Alexander & Filzmoser, Peter, 2011. "Iterative stepwise regression imputation using standard and robust methods," Computational Statistics & Data Analysis, Elsevier, vol. 55(10), pages 2793-2806, October.
    9. Garrido-Vega, Pedro & Ortega Jimenez, Cesar H. & de los Ríos, José Luis Díez Pérez & Morita, Michiya, 2015. "Implementation of technology and production strategy practices: Relationship levels in different industries," International Journal of Production Economics, Elsevier, vol. 161(C), pages 201-216.
