The cluster graphical lasso for improved estimation of Gaussian graphical models

Listed author(s):
  • Tan, Kean Ming
  • Witten, Daniela
  • Shojaie, Ali

    The task of estimating a Gaussian graphical model in the high-dimensional setting is considered. The graphical lasso, which involves maximizing the Gaussian log likelihood subject to a lasso penalty, is a well-studied approach for this task. A surprising connection between the graphical lasso and hierarchical clustering is introduced: the graphical lasso in effect performs a two-step procedure, in which (1) single linkage hierarchical clustering is performed on the variables in order to identify connected components, and then (2) a penalized log likelihood is maximized on the subset of variables within each connected component. Thus, the graphical lasso determines the connected components of the estimated network via single linkage clustering. Single linkage clustering is known to perform poorly in certain finite-sample settings. Therefore, the cluster graphical lasso is proposed: it clusters the features using an alternative to single linkage clustering and then performs the graphical lasso on the subset of variables within each cluster. Model selection consistency for this technique is established, and its improved performance relative to the graphical lasso is demonstrated in a simulation study, as well as in applications to a university webpage data set and a gene expression data set.
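
The clustering step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code. It assumes that similarity between variables is the absolute empirical covariance |S[i][j]|, that the dendrogram is cut at the lasso penalty `lam`, and that average linkage stands in for the unnamed "alternative to single linkage clustering"; the function names and the toy matrix `S` are hypothetical. Step (2), fitting the graphical lasso within each block, is omitted.

```python
from itertools import combinations


def threshold_components(S, lam):
    """Connected components of the graph with an edge i-j whenever |S[i][j]| > lam."""
    p = len(S)
    parent = list(range(p))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(p), 2):
        if abs(S[i][j]) > lam:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def agglomerate(S, lam, linkage):
    """Greedy agglomerative clustering on similarities |S[i][j]|.

    Clusters are merged while the best between-cluster similarity exceeds lam.
    `linkage` reduces the list of pairwise similarities between two clusters:
    max gives single linkage, a mean gives average linkage.
    """
    clusters = [[i] for i in range(len(S))]
    while len(clusters) > 1:
        best, pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            sims = [abs(S[i][j]) for i in clusters[a] for j in clusters[b]]
            sim = linkage(sims)
            if best is None or sim > best:
                best, pair = sim, (a, b)
        if best <= lam:
            break
        a, b = pair
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy covariance: a "chain" 0-1-2-3 with weak off-chain entries.
S = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.6, 0.1],
    [0.1, 0.6, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
lam = 0.5

components = threshold_components(S, lam)   # [[0, 1, 2, 3]]
single = agglomerate(S, lam, max)           # [[0, 1, 2, 3]]
average = agglomerate(S, lam, mean)         # [[0, 1], [2, 3]]
```

On this toy matrix, single linkage reproduces the thresholded connected components and chains all four variables into one block, whereas average linkage with the same cutoff splits them into two blocks. Under the assumptions above, the cluster graphical lasso would then fit a graphical lasso separately within each of those blocks.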

    Article provided by Elsevier in its journal Computational Statistics & Data Analysis.

    Volume (Year): 85 (2015)
    Issue (Month): C
    Pages: 23-36

    Handle: RePEc:eee:csdana:v:85:y:2015:i:c:p:23-36
    DOI: 10.1016/j.csda.2014.11.015

