IDEAS home Printed from https://ideas.repec.org/a/plo/pcbi00/1006105.html

A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data

Author

Listed:
  • Aaditya V Rangan
  • Caroline C McGrouther
  • John Kelsoe
  • Nicholas Schork
  • Eli Stahl
  • Qian Zhu
  • Arjun Krishnan
  • Vicky Yao
  • Olga Troyanskaya
  • Seda Bilaloglu
  • Preeti Raghavan
  • Sarah Bergen
  • Anders Jureus
  • Mikael Landen
  • Bipolar Disorders Working Group of the Psychiatric Genomics Consortium

Abstract

A common goal in data-analysis is to sift through a large data-matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., ‘loops’) within the data-matrix, and focuses on rows and columns of the data-matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical-experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design which commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical- and continuous-covariates, as well as sparsity within the data. We demonstrate these practical features with two examples; the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide-association-study (GWAS).

Author summary

An important problem in genomics is how to detect the genetic signatures associated with disease. When a disease is caused by a single well-defined biological mechanism, the genetic signature often involves a handful of genes present across the majority of the diseased patients. On the other hand, when a disease is complicated or poorly understood there may be many possible biological mechanisms at play. Any genetic signatures associated with such a disease may involve multiple genes, and each signature might only manifest across a subset of the diseased patients. Finding the signatures responsible for this heterogeneity requires searching through subsets of genes and subsets of patients, a problem referred to as ‘biclustering’.

In this paper we present a new biclustering method which can scale up efficiently to handle large genomic data sets, such as GWAS-data. Our method is quite accurate, outperforming current ‘spectral’ biclustering methods for many problems of interest. Perhaps most importantly, our method can be corrected for many features of experimental design, such as controls, covariates and sparsity, all of which are especially important when analyzing real data sets.
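
The loop-counting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not spell out the details, so the sketch assumes that each entry is first reduced to its sign, that a loop counts as +1 when its binarized 2-by-2 submatrix has rank 1 and as -1 otherwise, and that the lowest-scoring row and column are discarded one at a time until a target size is reached. The function names (`binarize`, `row_scores`, `find_bicluster`) and the target-size parameters are hypothetical, introduced only for this illustration.

```python
import random

def binarize(D):
    """Assumption: keep only the sign of each entry (+1 or -1); zeros map to +1."""
    return [[1 if x >= 0 else -1 for x in row] for row in D]

def transpose(B):
    return [list(col) for col in zip(*B)]

def row_scores(B, rows, cols):
    """For each row j, count rank-1 loops minus rank-2 loops through j.

    A loop uses rows j, j2 and columns k, k2. For +/-1 entries the product
    B[j][k] * B[j2][k] * B[j][k2] * B[j2][k2] is +1 exactly when the 2-by-2
    submatrix has rank 1. Summing over ordered pairs k != k2 gives g*g - n,
    where g is the dot product of the two rows and n = len(cols).
    """
    n = len(cols)
    scores = {}
    for j in rows:
        total = 0
        for j2 in rows:
            if j2 == j:
                continue
            g = sum(B[j][k] * B[j2][k] for k in cols)
            total += g * g - n
        scores[j] = total
    return scores

def find_bicluster(D, n_rows, n_cols):
    """Drop the lowest-scoring row and column until the target size remains.

    The target size is a hypothetical input chosen to keep the sketch short.
    """
    B = binarize(D)
    Bt = transpose(B)
    rows = list(range(len(B)))
    cols = list(range(len(Bt)))
    while len(rows) > n_rows or len(cols) > n_cols:
        r = row_scores(B, rows, cols)
        c = row_scores(Bt, cols, rows)  # column scores via the transpose
        if len(rows) > n_rows:
            rows.remove(min(rows, key=r.get))
        if len(cols) > n_cols:
            cols.remove(min(cols, key=c.get))
    return rows, cols

# Synthetic data: Gaussian noise with a rank-1 block planted in the
# first m rows and first n columns.
rng = random.Random(0)
M, N, m, n = 40, 40, 20, 20
D = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(M)]
u = [rng.gauss(0, 1) for _ in range(m)]
v = [rng.gauss(0, 1) for _ in range(n)]
for j in range(m):
    for k in range(n):
        D[j][k] = u[j] * v[k]

rows, cols = find_bicluster(D, m, n)
print(sorted(rows) == list(range(m)), sorted(cols) == list(range(n)))
```

Because the score of a row reduces to squared dot products with the other rows, each pass costs on the order of (rows)² × (columns) operations rather than requiring an explicit enumeration of every 2-by-2 submatrix. The corrections for controls, covariates and sparsity mentioned in the abstract are not covered by this sketch.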

Suggested Citation

  • Aaditya V Rangan & Caroline C McGrouther & John Kelsoe & Nicholas Schork & Eli Stahl & Qian Zhu & Arjun Krishnan & Vicky Yao & Olga Troyanskaya & Seda Bilaloglu & Preeti Raghavan & Sarah Bergen & Ande, 2018. "A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data," PLOS Computational Biology, Public Library of Science, vol. 14(5), pages 1-29, May.
  • Handle: RePEc:plo:pcbi00:1006105
    DOI: 10.1371/journal.pcbi.1006105

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006105
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1006105&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eric C. Chi & Genevera I. Allen & Richard G. Baraniuk, 2017. "Convex biclustering," Biometrics, The International Biometric Society, vol. 73(1), pages 10-19, March.
    2. Xu Gao & Weining Shen & Liwen Zhang & Jianhua Hu & Norbert J. Fortin & Ron D. Frostig & Hernando Ombao, 2021. "Regularized matrix data clustering and its application to image analysis," Biometrics, The International Biometric Society, vol. 77(3), pages 890-902, September.
    3. Sumit Kunnumkal & Kalyan Talluri, 2019. "Choice Network Revenue Management Based on New Tractable Approximations," Transportation Science, INFORMS, vol. 53(6), pages 1591-1608, November.
    4. Alejandra Casado & Sergio Pérez-Peló & Jesús Sánchez-Oro & Abraham Duarte, 2022. "A GRASP algorithm with Tabu Search improvement for solving the maximum intersection of k-subsets problem," Journal of Heuristics, Springer, vol. 28(1), pages 121-146, February.
    5. GILLIS, Nicolas & GLINEUR, François, 2010. "On the geometric interpretation of the nonnegative rank," LIDAM Discussion Papers CORE 2010051, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    6. Hongtu Zhu & Dan Shen & Xuewei Peng & Leo Yufeng Liu, 2017. "MWPCR: Multiscale Weighted Principal Component Regression for High-Dimensional Prediction," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(519), pages 1009-1021, July.
    7. Hu, Jianhua & Liu, Xiaoqian & Liu, Xu & Xia, Ningning, 2022. "Some aspects of response variable selection and estimation in multivariate linear regression," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    8. Jie Chen & Joe Suzuki, 2021. "An Efficient Algorithm for Convex Biclustering," Mathematics, MDPI, vol. 9(23), pages 1-18, November.
    9. Tom Wilderjans & Dirk Depril & Iven Van Mechelen, 2013. "Additive Biclustering: A Comparison of One New and Two Existing ALS Algorithms," Journal of Classification, Springer;The Classification Society, vol. 30(1), pages 56-74, April.
    10. Chakraborty, Saptarshi & Das, Swagatam, 2021. "On uniform concentration bounds for Bi-clustering by using the Vapnik–Chervonenkis theory," Statistics & Probability Letters, Elsevier, vol. 175(C).
    11. Kun Chen & Kung-Sik Chan & Nils Chr. Stenseth, 2014. "Source-Sink Reconstruction Through Regularized Multicomponent Regression Analysis-With Application to Assessing Whether North Sea Cod Larvae Contributed to Local Fjord Cod in Skagerrak," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 560-573, June.
    12. Shen, Dan & Shen, Haipeng & Marron, J.S., 2013. "Consistency of sparse PCA in High Dimension, Low Sample Size contexts," Journal of Multivariate Analysis, Elsevier, vol. 115(C), pages 317-333.
    13. Pi, J. & Wang, Honggang & Pardalos, Panos M., 2021. "A dual reformulation and solution framework for regularized convex clustering problems," European Journal of Operational Research, Elsevier, vol. 290(3), pages 844-856.
    14. Derval, Guillaume & Schaus, Pierre, 2022. "Maximal-Sum submatrix search using a hybrid contraint programming/linear programming approach," European Journal of Operational Research, Elsevier, vol. 297(3), pages 853-865.
    15. GILLIS, Nicolas & GLINEUR, François, 2008. "Nonnegative factorization and the maximum edge biclique problem," LIDAM Discussion Papers CORE 2008064, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    16. Sumit Kunnumkal & Kalyan Talluri, 2012. "A New Compact Linear Programming Formulation for Choice Network Revenue Management," Working Papers 677, Barcelona School of Economics.
    17. Sumit Kunnumkal & Kalyan Talluri, 2012. "A new compact linear programming formulation for choice network revenue management," Economics Working Papers 1349, Department of Economics and Business, Universitat Pompeu Fabra.
    18. Jian Zhang, 2010. "A Bayesian model for biclustering with applications," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 59(4), pages 635-656, August.
    19. Hung-Chia Chen & Wen Zou & Tzu-Pin Lu & James J Chen, 2014. "A Composite Model for Subgroup Identification and Prediction via Bicluster Analysis," PLOS ONE, Public Library of Science, vol. 9(10), pages 1-14, October.
    20. Nicolas Gillis & François Glineur, 2014. "A continuous characterization of the maximum-edge biclique problem," Journal of Global Optimization, Springer, vol. 58(3), pages 439-464, March.
