This file is part of IDEAS, which uses RePEc data



Partition clustering of high dimensional low sample size data based on p-values

Author Info
von Borries, George
Wang, Haiyan
Abstract

Clustering techniques play an important role in analyzing high-dimensional data, which are common in high-throughput screening such as microarray and mass spectrometry experiments. Making effective use of the high dimensionality and of the available replications can help increase clustering accuracy and stability. In this article, a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high-dimensional, low-sample-size (HDLSS) data that contain a large number of independent variables with only a small number of replications per variable. The proposed clustering algorithm, PPCLUST, models the data as arising from a mixture distribution and uses p-values from nonparametric rank tests of distributional homogeneity as a measure of similarity to separate the mixture components. PPCLUST can efficiently cluster a large number of variables even when very few replications are available. Because it inherits the robustness of rank procedures, the new algorithm is robust to outliers and invariant to monotone transformations of the data. Numerical studies and an application to microarray gene expression data from a colorectal cancer study are discussed.

Download Info
To download:


File URL: http://www.sciencedirect.com/science/article/B6V8V-4WM74WT-2/2/a3182692514410ddb270a0bbb121a7d9
Download Restriction: Full text for ScienceDirect subscribers only


Publisher Info
Article provided by Elsevier in its journal Computational Statistics & Data Analysis.

Volume (Year): 53 (2009)
Issue (Month): 12 (October)
Pages: 3987-3998
Handle: RePEc:eee:csdana:v:53:y:2009:i:12:p:3987-3998

Contact details of provider:
Web page: http://www.elsevier.com/locate/csda

For technical questions regarding this item, or to correct its listing, contact Heidi Boesdal.


This page was last updated on 2009-12-3.


This information is provided to you by IDEAS at the Department of Economics, College of Liberal Arts and Sciences, University of Connecticut using RePEc data on a server sponsored by the Society for Economic Dynamics.