rCOSA: A Software Package for Clustering Objects on Subsets of Attributes

Authors

Maarten M. Kampert (Leiden University)
Jacqueline J. Meulman (Leiden University; Stanford University)
Jerome H. Friedman (Stanford University)

Abstract

rCOSA is a software package, interfaced to the R language, that implements statistical techniques for clustering objects on subsets of attributes in multivariate data. The main output of COSA is a dissimilarity matrix that can subsequently be analyzed with a variety of proximity analysis methods. Our package extends the original COSA software (Friedman and Meulman, 2004) with functions for hierarchical clustering, least squares multidimensional scaling, partitional clustering, and data visualization. Although the COSA paper by Friedman and Meulman (2004) is cited in many publications, the COSA program itself is used in only a few of them. This can be attributed to the original implementation being difficult to install and use; moreover, the available software is out of date. Here, we introduce an up-to-date software package together with clear guidance for applying this advanced technique. The software package and related links are freely available at https://github.com/mkampert/rCOSA.
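
The pipeline described above (a dissimilarity matrix passed on to proximity analysis methods) can be illustrated with a minimal Python sketch, assuming numpy is available. This is not rCOSA's API and not the COSA algorithm. COSA estimates attribute weights from the data, whereas the hypothetical helper weighted_l1_dissimilarity below takes hand-picked weights and uses a simple per-attribute scaling chosen only for this sketch. The second helper, classical_mds, is classical (Torgerson) scaling, swapped in for the least squares MDS that the package provides.

```python
import numpy as np


def weighted_l1_dissimilarity(X, weights):
    """Hypothetical helper: attribute-weighted L1 dissimilarity between rows of X.

    Each attribute's absolute differences are divided by that attribute's mean
    absolute pairwise difference (a normalization chosen for this sketch only),
    then combined with fixed, user-supplied non-negative weights.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(weights, dtype=float)
    diffs = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, p)
    scale = diffs.mean(axis=(0, 1))                 # one scale per attribute
    scale[scale == 0] = 1.0                         # constant attributes: avoid division by zero
    return (diffs / scale * w).sum(axis=2)          # shape (n, n)


def classical_mds(D, n_components=2):
    """Classical (Torgerson) scaling of a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered squared dissimilarities
    eigvals, eigvecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)       # discard negative (non-Euclidean) parts
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical toy data: two groups of 10 objects that differ only on the
# first two of five attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
X[:10, :2] += 4.0

# Hypothetical weights that concentrate on the informative attributes.
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
D = weighted_l1_dissimilarity(X, w)
coords = classical_mds(D, n_components=2)   # 2-D configuration for plotting
```

With equal weights the helper reduces to an ordinary scaled city-block dissimilarity. Concentrating weight on a subset of attributes is, loosely, the idea COSA pursues by learning the weights itself. The matrix D could equally be handed to a hierarchical or partitional clustering routine instead of MDS.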

Suggested Citation

Maarten M. Kampert & Jacqueline J. Meulman & Jerome H. Friedman, 2017. "rCOSA: A Software Package for Clustering Objects on Subsets of Attributes," Journal of Classification, Springer;The Classification Society, vol. 34(3), pages 514-547, October.

Handle: RePEc:spr:jclass:v:34:y:2017:i:3:d:10.1007_s00357-017-9240-z
DOI: 10.1007/s00357-017-9240-z

References listed on IDEAS

Geert De Soete & Wayne DeSarbo & J. Carroll, 1985. "Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 173-192, December.
Gale Young & A. Householder, 1938. "Discussion of a set of points in terms of their mutual distances," Psychometrika, Springer;The Psychometric Society, vol. 3(1), pages 19-22, March.
Warren Torgerson, 1952. "Multidimensional scaling: I. Theory and method," Psychometrika, Springer;The Psychometric Society, vol. 17(4), pages 401-419, December.
Jacqueline Meulman, 1992. "The integration of multidimensional scaling and multivariate analysis with optimal transformations," Psychometrika, Springer;The Psychometric Society, vol. 57(4), pages 539-565, December.
Witten, Daniela M. & Tibshirani, Robert, 2010. "A Framework for Feature Selection in Clustering," Journal of the American Statistical Association, American Statistical Association, vol. 105(490), pages 713-726.
Wayne DeSarbo & J. Carroll & Linda Clark & Paul Green, 1984. "Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables," Psychometrika, Springer;The Psychometric Society, vol. 49(1), pages 57-78, March.
Renato Amorim, 2015. "Feature Relevance in Ward’s Hierarchical Clustering Using the Lp Norm," Journal of Classification, Springer;The Classification Society, vol. 32(1), pages 46-62, April.
Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
Jeffrey Andrews & Paul McNicholas, 2014. "Variable Selection for Clustering and Classification," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 136-153, July.
Jerome H. Friedman & Jacqueline J. Meulman, 2004. "Clustering objects on subsets of attributes (with discussion)," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 66(4), pages 815-849, November.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Malone, Samuel W. & Tarazaga, Pablo & Trosset, Michael W., 2002. "Better initial configurations for metric multidimensional scaling," Computational Statistics & Data Analysis, Elsevier, vol. 41(1), pages 143-156, November.
Susan Brudvig & Michael J. Brusco & J. Dennis Cradit, 2019. "Joint selection of variables and clusters: recovering the underlying structure of marketing data," Journal of Marketing Analytics, Palgrave Macmillan, vol. 7(1), pages 1-12, March.
Daniel B. McArtor & Gitta H. Lubke & C. S. Bergeman, 2017. "Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic," Psychometrika, Springer;The Psychometric Society, vol. 82(4), pages 1052-1077, December.
J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
Panpan Yu & Qingna Li, 2018. "Ordinal Distance Metric Learning with MDS for Image Ranking," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 35(01), pages 1-19, February.
Zha, Hongyuan & Zhang, Zhenyue, 2007. "Continuum Isomap for manifold learnings," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 184-200, September.
Wayne DeSarbo & Richard Oliver & Arvind Rangaswamy, 1989. "A simulated annealing methodology for clusterwise linear regression," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 707-736, September.
Dolnicar, Sara & Grün, Bettina & Leisch, Friedrich, 2016. "Increasing sample size compensates for data problems in segmentation studies," Journal of Business Research, Elsevier, vol. 69(2), pages 992-999.
Gao, Jinxin & Hitchcock, David B., 2010. "James-Stein shrinkage to improve k-means cluster analysis," Computational Statistics & Data Analysis, Elsevier, vol. 54(9), pages 2113-2127, September.
Si-Tong Lu & Miao Zhang & Qing-Na Li, 2020. "Feasibility and a fast algorithm for Euclidean distance matrix optimization with ordinal constraints," Computational Optimization and Applications, Springer, vol. 76(2), pages 535-569, June.
Beibei Yuan & Willem Heiser & Mark de Rooij, 2019. "The δ-Machine: Classification Based on Distances Towards Prototypes," Journal of Classification, Springer;The Classification Society, vol. 36(3), pages 442-470, October.
Ronglai Shen & Qianxing Mo & Nikolaus Schultz & Venkatraman E Seshan & Adam B Olshen & Jason Huse & Marc Ladanyi & Chris Sander, 2012. "Integrative Subtype Discovery in Glioblastoma Using iCluster," PLOS ONE, Public Library of Science, vol. 7(4), pages 1-9, April.
Arias-Castro, Ery & Pu, Xiao, 2017. "A simple approach to sparse clustering," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 217-228.
Aurea Grané & Rosario Romera, 2018. "On Visualizing Mixed-Type Data," Sociological Methods & Research, vol. 47(2), pages 207-239, March.
Tsai, Chieh-Yuan & Chiu, Chuang-Cheng, 2008. "Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4658-4672, June.
Stef van Buuren & Willem Heiser, 1989. "Clustering n objects into k groups under optimal scaling of variables," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 699-706, September.
Michael W. Trosset, 2002. "Extensions of Classical Multidimensional Scaling via Variable Reduction," Computational Statistics, Springer, vol. 17(2), pages 147-163, July.
Jacqueline Meulman & Peter Verboon, 1993. "Points of view analysis revisited: Fitting multidimensional structures to optimal distance components with cluster restrictions on the variables," Psychometrika, Springer;The Psychometric Society, vol. 58(1), pages 7-35, March.
Sheng-Shiung Wu & Sing-Jie Jong & Kai Hu & Jiann-Ming Wu, 2021. "Learning Neural Representations and Local Embedding for Nonlinear Dimensionality Reduction Mapping," Mathematics, MDPI, vol. 9(9), pages 1-18, April.
Michael Brusco & J. Cradit, 2001. "A variable-selection heuristic for K-means clustering," Psychometrika, Springer;The Psychometric Society, vol. 66(2), pages 249-270, June.

Keywords

Distance-based clustering; Subsets of variables; Feature selection; Targeted clustering; Mixtures of numeric and categorical variables; Clustering in R; Multidimensional scaling; Proximities; Dissimilarities; Omics data