
Fast R Functions for Robust Correlations and Hierarchical Clustering

Author

Listed:
  • Langfelder, Peter
  • Horvath, Steve

Abstract

Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation handles data without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing entries. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA. The hierarchical clustering algorithm implemented in the R function hclust is an order n^3 (where n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust, which implements the original algorithm and in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
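
For readers unfamiliar with the biweight midcorrelation named in the abstract, the following is a minimal sketch of the measure as it is commonly defined: each value is centered on the median, down-weighted by (1 - u^2)^2 with u = (x - median) / (9 * MAD), and given zero weight when |u| >= 1. It is written in plain Python for illustration only and is not the WGCNA implementation. The names bicor_sketch and _biweight_terms are hypothetical, the raw (unscaled) MAD is an assumption, and missing-data handling and parallelization, which are the focus of the article, are deliberately left out.

```python
from math import sqrt
from statistics import median


def _biweight_terms(values):
    """Return normalised, biweight-weighted deviations from the median.

    Hypothetical helper for illustration; assumes the raw MAD
    (no consistency constant) and no missing values.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # The weights are undefined here; a real implementation
        # would need some fallback, which this sketch does not attempt.
        raise ValueError("MAD is zero; biweight weights are undefined")
    terms = []
    for v in values:
        u = (v - med) / (9.0 * mad)
        # Points far from the median (|u| >= 1) get zero weight.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        terms.append((v - med) * w)
    norm = sqrt(sum(t * t for t in terms))
    return [t / norm for t in terms]


def bicor_sketch(x, y):
    """Biweight midcorrelation of two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    tx = _biweight_terms(x)
    ty = _biweight_terms(y)
    return sum(a * b for a, b in zip(tx, ty))
```

Calling bicor_sketch(x, x) on a non-constant sequence gives a value of 1 up to rounding, and a single extreme outlier in one sequence receives zero weight rather than dominating the result, which is the sense in which the measure is robust.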

Suggested Citation

  • Langfelder, Peter & Horvath, Steve, 2012. "Fast R Functions for Robust Correlations and Hierarchical Clustering," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 46(i11).
  • Handle: RePEc:jss:jstsof:v:046:i11
    DOI: http://hdl.handle.net/10.18637/jss.v046.i11

    Download full text from publisher

    File URL: https://www.jstatsoft.org/index.php/jss/article/view/v046i11/v46i11.pdf
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v046i11/flashClust_1.01-1.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v046i11/WGCNA_1.19.tar.gz
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v046i11/v46i11.R
    Download Restriction: no

    File URL: https://www.jstatsoft.org/index.php/jss/article/downloadSuppFile/v046i11/v46i11-replication.zip
    Download Restriction: no


    Citations

    Cited by:

    1. Kandt, Jens & Leak, Alistair, 2019. "Examining inclusive mobility through smartcard data: What shall we make of senior citizens' declining bus patronage in the West Midlands?," Journal of Transport Geography, Elsevier, vol. 79(C), pages 1-1.
    2. Tingting Bo & Jie Li & Ganlu Hu & Ge Zhang & Wei Wang & Qian Lv & Shaoling Zhao & Junjie Ma & Meng Qin & Xiaohui Yao & Meiyun Wang & Guang-Zhong Wang & Zheng Wang, 2023. "Brain-wide and cell-specific transcriptomic insights into MRI-derived cortical morphology in macaque monkeys," Nature Communications, Nature, vol. 14(1), pages 1-15, December.
