Printed from https://ideas.repec.org/p/crs/wpaper/2017-72.html

Nonparametric imputation by data depth

Author

Listed:
  • Pavlo Mozharovskyi

    (CREST; ENSAI; Université Bretagne Loire)

  • Julie Josse

    (CMAP; Ecole polytechnique)

  • François Husson

    (IRMAR; Applied Mathematics Unit; Agrocampus Ouest)

Abstract

The presented methodology for single imputation of missing values borrows its idea from data depth, a measure of centrality defined for an arbitrary point of the space with respect to a probability distribution or a data cloud. It consists of iteratively maximizing the depth of each observation with missing values, and can be employed with any properly defined statistical depth function. In each iteration, imputation reduces to the optimization of a quadratic, linear, or quasiconcave function, solved analytically, by linear programming, or by the Nelder-Mead method, respectively. Because it captures the underlying data topology, the procedure is distribution-free, imputes close to the data, preserves prediction possibilities, unlike local imputation methods (k-nearest neighbors, random forest), and has attractive robustness and asymptotic properties under elliptical symmetry. Its particular case based on Mahalanobis depth is shown to be directly connected to well-known treatments of the multivariate normal model, such as iterated regression or regularized PCA. The methodology is extended to multiple imputation for data stemming from an elliptically symmetric distribution. Simulation and real-data studies compare the procedure favorably with popular existing alternatives. The method has been implemented as an R package.
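
The Mahalanobis-depth special case mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R package: the function names are hypothetical, and it assumes the usual definition D(x) = 1 / (1 + (x - mu)' Sigma^{-1} (x - mu)) with plain (non-robust) sample mean and covariance. Under that assumption, maximizing the depth over the missing coordinates of a row is a quadratic problem whose closed-form solution is the regression of the missing coordinates on the observed ones, which is why iterating it resembles iterated regression imputation.

```python
import numpy as np


def mahalanobis_depth(x, mu, sigma):
    """Mahalanobis depth of point x w.r.t. location mu and scatter sigma."""
    d = x - mu
    return 1.0 / (1.0 + d @ np.linalg.solve(sigma, d))


def impute_mahalanobis_depth(X, max_iter=100, tol=1e-8):
    """Single imputation by iteratively maximizing Mahalanobis depth.

    X is an (n, d) array with NaN marking missing entries.
    Assumption: mean and covariance are re-estimated from the completed
    data at every iteration (no robust estimators, no regularization).
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    Y = X.copy()
    # Start from column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    Y[miss] = np.take(col_means, np.nonzero(miss)[1])

    rows = np.nonzero(miss.any(axis=1))[0]
    for _ in range(max_iter):
        mu = Y.mean(axis=0)
        sigma = np.cov(Y, rowvar=False)
        Y_new = Y.copy()
        for i in rows:
            m, o = miss[i], ~miss[i]
            if not o.any():
                Y_new[i] = mu  # nothing observed: the deepest point is mu
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            # Closed-form maximizer of the depth over the missing coordinates.
            Y_new[i, m] = mu[m] + s_mo @ np.linalg.solve(s_oo, Y[i, o] - mu[o])
        delta = np.max(np.abs(Y_new - Y))
        Y = Y_new
        if delta < tol:
            break
    return Y
```

For the Tukey or zonoid depth named in the keywords, the inner closed-form step would be replaced by a numerical optimizer (the abstract mentions Nelder-Mead and linear programming); that part is not sketched here.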

Suggested Citation

  • Pavlo Mozharovskyi & Julie Josse & François Husson, 2017. "Nonparametric imputation by data depth," Working Papers 2017-72, Center for Research in Economics and Statistics.
  • Handle: RePEc:crs:wpaper:2017-72

    Download full text from publisher

    File URL: http://crest.science/RePEc/wpstorage/2017-72.pdf
    File Function: CREST working paper version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    2. Nikola Štefelová & Andreas Alfons & Javier Palarea-Albaladejo & Peter Filzmoser & Karel Hron, 2021. "Robust regression with compositional covariates including cellwise outliers," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(4), pages 869-909, December.
    3. Hemant Kulkarni & Jayabrata Biswas & Kiranmoy Das, 2019. "A joint quantile regression model for multiple longitudinal outcomes," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 103(4), pages 453-473, December.
    4. María Edo & Walter Sosa Escudero & Marcela Svarc, 2021. "A multidimensional approach to measuring the middle class," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 19(1), pages 139-162, March.
    5. Paola Stolfi & Mauro Bernardi & Lea Petrella, 2016. "Multivariate Method Of Simulated Quantiles," Departmental Working Papers of Economics - University 'Roma Tre' 0212, Department of Economics - University Roma Tre.
    6. Victor Chernozhukov & Alfred Galichon & Marc Hallin & Marc Henry, 2014. "Monge-Kantorovich Depth, Quantiles, Ranks, and Signs," Papers 1412.8434, arXiv.org, revised Sep 2015.
    7. Gerko Vink & Laurence E. Frank & Jeroen Pannekoek & Stef Buuren, 2014. "Predictive mean matching imputation of semicontinuous variables," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 68(1), pages 61-90, February.
    8. Sarno, Lucio & Schneider, Paul & Wagner, Christian, 2012. "Properties of foreign exchange risk premiums," Journal of Financial Economics, Elsevier, vol. 105(2), pages 279-310.
    9. Daouia, Abdelaati & Paindaveine, Davy, 2019. "Multivariate Expectiles, Expectile Depth and Multiple-Output Expectile Regression," TSE Working Papers 19-1022, Toulouse School of Economics (TSE), revised Feb 2023.
    10. Dyckerhoff, Rainer & Mozharovskyi, Pavlo, 2016. "Exact computation of the halfspace depth," Computational Statistics & Data Analysis, Elsevier, vol. 98(C), pages 19-30.
    11. L. Jeff Hong & Zhiyuan Huang & Henry Lam, 2021. "Learning-Based Robust Optimization: Procedures and Statistical Guarantees," Management Science, INFORMS, vol. 67(6), pages 3447-3467, June.
    12. Hallin, Marc & Šiman, Miroslav, 2016. "Elliptical multiple-output quantile regression and convex optimization," Statistics & Probability Letters, Elsevier, vol. 109(C), pages 232-237.
    13. Arthur Stepchenko & Jurij Chizhov & Ludmila Aleksejeva, 2018. "Transfer of the data preprocessing parameters and forecasting models," Journal of Advances in Technology and Engineering Research, A/Professor Akbar A. Khatibi, vol. 4(6), pages 214-221.
    14. Yi He & John H. J. Einmahl, 2017. "Estimation of extreme depth-based quantile regions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(2), pages 449-461, March.
    15. Feng, Sanying & Lian, Heng & Zhu, Fukang, 2016. "Reduced rank regression with possibly non-smooth criterion functions: An empirical likelihood approach," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 139-150.
    16. Dette, Holger & Hoderlein, Stefan & Neumeyer, Natalie, 2016. "Testing multivariate economic restrictions using quantiles: The example of Slutsky negative semidefiniteness," Journal of Econometrics, Elsevier, vol. 191(1), pages 129-144.
    17. Bazovkin, Pavel, 2014. "Geometrical framework for robust portfolio optimization," Discussion Papers in Econometrics and Statistics 01/14, University of Cologne, Institute of Econometrics and Statistics.
    18. Einmahl, J.H.J. & Li, Jun & Liu, Regina, 2015. "Bridging Centrality and Extremity : Refining Empirical Data Depth using Extreme Value Statistics," Discussion Paper 2015-020, Tilburg University, Center for Economic Research.
    19. Xiaohui Liu & Karl Mosler & Pavlo Mozharovskyi, 2017. "Fast computation of Tukey trimmed regions and median in dimension p > 2," Working Papers 2017-71, Center for Research in Economics and Statistics.

    More about this item

    Keywords

    Elliptical symmetry; Outliers; Tukey depth; Zonoid depth; Nonparametric imputation; Convex optimization;

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics
