
Assessing variable importance in clustering: a new method based on unsupervised binary decision trees

Authors

  • Ghattas Badih (Aix Marseille Université, CNRS, Centrale Marseille)

  • Michel Pierre (Aix Marseille Université, CNRS, Centrale Marseille; Aix Marseille Université)

  • Boyer Laurent (Aix Marseille Université)

Abstract

We consider different approaches for assessing variable importance in clustering. We focus on clustering using binary decision trees (CUBT), which is a non-parametric top-down hierarchical clustering method designed for both continuous and nominal data. We suggest a measure of variable importance for this method similar to the one used in Breiman’s classification and regression trees. This score is useful to rank the variables in a dataset, to determine which variables are the most important or to detect the irrelevant ones. We analyze both stability and efficiency of this score on different data simulation models in the presence of noise, and compare it to other classical variable importance measures. Our experiments show that variable importance based on CUBT is much more efficient than other approaches in a large variety of situations.
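To make the idea of a split-based importance score concrete, the following is a minimal sketch in Python. It is not the authors' measure: it only scores each variable by the largest deviance reduction that a single binary split on that variable can achieve at the root node, whereas a CART-style score accumulates contributions over the nodes of a fitted tree. The deviance used here (within-node sum of squares over all variables), the function names `deviance`, `best_reduction`, `root_importance`, and the simulated data from `make_data` are all assumptions made for illustration.

```python
import random


def deviance(rows):
    """Within-node sum of squares, summed over all variables (assumed heterogeneity measure)."""
    n = len(rows)
    if n == 0:
        return 0.0
    total = 0.0
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col)
    return total


def best_reduction(rows, j):
    """Largest deviance reduction obtainable by one binary split on variable j."""
    parent = deviance(rows)
    values = sorted(set(r[j] for r in rows))
    best = 0.0
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        a = (lo + hi) / 2
        left = [r for r in rows if r[j] <= a]
        right = [r for r in rows if r[j] > a]
        best = max(best, parent - deviance(left) - deviance(right))
    return best


def root_importance(rows):
    """One score per variable, computed at the root node only (a simplification)."""
    return [best_reduction(rows, j) for j in range(len(rows[0]))]


def make_data(n=60, seed=0):
    """Hypothetical data: variable 0 separates two groups, variable 1 is pure noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        center = 0.0 if i < n // 2 else 5.0
        rows.append([center + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
    return rows


scores = root_importance(make_data())
print(scores)
```

On this simulated data the informative variable should receive a much larger score than the noise variable, which is the kind of ranking the abstract describes.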

Suggested Citation

  • Ghattas Badih & Michel Pierre & Boyer Laurent, 2019. "Assessing variable importance in clustering: a new method based on unsupervised binary decision trees," Computational Statistics, Springer, vol. 34(1), pages 301-321, March.
  • Handle: RePEc:spr:compst:v:34:y:2019:i:1:d:10.1007_s00180-018-0857-0
    DOI: 10.1007/s00180-018-0857-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-018-0857-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. R. Darrell Bock, 1972. "Estimating item parameters and latent ability when responses are scored in two or more nominal categories," Psychometrika, Springer;The Psychometric Society, vol. 37(1), pages 29-51, March.
    2. Ricardo Fraiman & Badih Ghattas & Marcela Svarc, 2013. "Interpretable clustering using unsupervised binary trees," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 7(2), pages 125-145, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michelle M. LaMar, 2018. "Markov Decision Process Measurement Model," Psychometrika, Springer;The Psychometric Society, vol. 83(1), pages 67-88, March.
    2. Bas Hemker & Klaas Sijtsma & Ivo Molenaar & Brian Junker, 1996. "Polytomous IRT models and monotone likelihood ratio of the total score," Psychometrika, Springer;The Psychometric Society, vol. 61(4), pages 679-693, December.
    3. Björn Andersson & Tao Xin, 2021. "Estimation of Latent Regression Item Response Theory Models Using a Second-Order Laplace Approximation," Journal of Educational and Behavioral Statistics, vol. 46(2), pages 244-265, April.
    4. Yang Liu & Jan Hannig & Abhishek Pal Majumder, 2019. "Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models," Psychometrika, Springer;The Psychometric Society, vol. 84(3), pages 701-718, September.
    5. Hsiao, Cheng & Sun, Bao-Hong, 1998. "Modeling survey response bias - with an analysis of the demand for an advanced electronic device," Journal of Econometrics, Elsevier, vol. 89(1-2), pages 15-39, November.
    6. Golovkine, Steven & Klutchnikoff, Nicolas & Patilea, Valentin, 2022. "Clustering multivariate functional data using unsupervised binary trees," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    7. Roderick McDonald, 1986. "Describing the elephant: Structure and function in multivariate data," Psychometrika, Springer;The Psychometric Society, vol. 51(4), pages 513-534, December.
    8. Jouni Kuha & Myrsini Katsikatsou & Irini Moustaki, 2018. "Latent variable modelling with non‐ignorable item non‐response: multigroup response propensity models for cross‐national analysis," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1169-1192, October.
    9. Laine Bradshaw & Jonathan Templin, 2014. "Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions," Psychometrika, Springer;The Psychometric Society, vol. 79(3), pages 403-425, July.
    10. Javier Revuelta, 2004. "Analysis of distractor difficulty in multiple-choice items," Psychometrika, Springer;The Psychometric Society, vol. 69(2), pages 217-234, June.
    11. Peida Zhan & Wen-Chung Wang & Xiaomin Li, 2020. "A Partial Mastery, Higher-Order Latent Structural Model for Polytomous Attributes in Cognitive Diagnostic Assessments," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 328-351, July.
    12. Gerhard Tutz & Moritz Berger, 2016. "Response Styles in Rating Scales," Journal of Educational and Behavioral Statistics, vol. 41(3), pages 239-268, June.
    13. Ulf Böckenholt, 2012. "The Cognitive-Miser Response Model: Testing for Intuitive and Deliberate Reasoning," Psychometrika, Springer;The Psychometric Society, vol. 77(2), pages 388-399, April.
    14. Albert Yu & Jeffrey A. Douglas, 2023. "IRT Models for Learning With Item-Specific Learning Parameters," Journal of Educational and Behavioral Statistics, vol. 48(6), pages 866-888, December.
    15. Jochen Ranger & Kay Brauer, 2022. "On the Generalized S-X² Test of Item Fit: Some Variants, Residuals, and a Graphical Visualization," Journal of Educational and Behavioral Statistics, vol. 47(2), pages 202-230, April.
    16. Albert Maydeu-Olivares & Harry Joe, 2006. "Limited Information Goodness-of-fit Testing in Multidimensional Contingency Tables," Psychometrika, Springer;The Psychometric Society, vol. 71(4), pages 713-732, December.
    17. César Martinelli & Susan W. Parker & Ana Cristina Pérez-Gea & Rodimiro Rodrigo, 2018. "Cheating and Incentives: Learning from a Policy Experiment," American Economic Journal: Economic Policy, American Economic Association, vol. 10(1), pages 298-325, February.
    18. John Hsu & Tom Leonard & Kam-Wah Tsui, 1991. "Statistical inference for multiple choice tests," Psychometrika, Springer;The Psychometric Society, vol. 56(2), pages 327-348, June.
    19. Brooke E. Magnus & David Thissen, 2017. "Item Response Modeling of Multivariate Count Data With Zero Inflation, Maximum Inflation, and Heaping," Journal of Educational and Behavioral Statistics, vol. 42(5), pages 531-558, October.
    20. Zachary F. Fisher & Kenneth A. Bollen, 2020. "An Instrumental Variable Estimator for Mixed Indicators: Analytic Derivatives and Alternative Parameterizations," Psychometrika, Springer;The Psychometric Society, vol. 85(3), pages 660-683, September.
