Sparse factor model for co-expression networks with an application using prior biological knowledge

Author

Listed:
  • Blum Yuna

    (Department of Medicine, David Geffen School of Medicine, A2-237 Center for Health Sciences, University of California, 10833 Le Conte Avenue, Los Angeles, CA 90095-1679, USA)

  • Houée-Bigot Magalie
  • Causeur David

    (Agrocampus Ouest, IRMAR, UMR 6625 CNRS, 65 rue de St-Brieuc CS84215, 35042 Rennes Cedex, France)

Abstract

Inferring gene regulatory networks from high-throughput expression data is one of the main current challenges in systems biology. Such networks can provide insight into the interactions between genes. Because gene-gene interactions are often viewed as joint contributions to known biological mechanisms, inference on the dependence among gene expressions is expected to be consistent, to some extent, with the functional characterization of genes that can be derived from ontologies (GO, KEGG, …). The present paper introduces a sparse factor model as a general framework either to account for prior knowledge on the joint contributions of modules of genes to latent biological processes or to infer the corresponding co-expression network. We propose an ℓ1-regularized EM algorithm to fit a sparse factor model for correlation. We demonstrate how it helps extract modules of genes and, more generally, improves gene clustering performance. The method is compared to alternative estimation procedures for sparse factor models of relevance networks in a simulation study. The integration of biological knowledge based on the Gene Ontology (GO) is also illustrated on a liver expression data set generated to understand adiposity variability in chicken.
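
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an ℓ1-penalized EM fit of a factor model can look like, not the authors' implementation. It assumes the standard factor model x = Bz + e with z ~ N(0, I) and diagonal noise variances psi, uses the classical EM sufficient statistics for factor analysis, and replaces the usual least-squares update of the loadings with a lasso update solved by coordinate descent. The names `sparse_factor_em`, `soft_threshold`, `lam` and `cd_sweeps` are hypothetical and chosen for illustration.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_factor_em(S, q, lam, n_iter=200, cd_sweeps=5):
    """Illustrative l1-penalized EM for S ~ B B' + diag(psi).

    S   : (p, p) sample covariance or correlation matrix
    q   : number of latent factors (assumed known here)
    lam : l1 penalty weight on the loadings (assumption: one global value)
    """
    # Initialise loadings from the q leading eigenvectors of S.
    vals, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    B = vecs[:, -q:] * np.sqrt(np.maximum(vals[-q:], 1e-8))
    psi = np.maximum(np.diag(S) - np.sum(B**2, axis=1), 1e-3)

    for _ in range(n_iter):
        # E-step: expected moments of the latent factors given the data.
        Bp = B / psi[:, None]                      # Psi^{-1} B, shape (p, q)
        G = np.linalg.inv(np.eye(q) + B.T @ Bp)    # posterior covariance of z
        W = G @ Bp.T                               # maps x to E[z | x]
        Cxz = S @ W.T                              # average E[x z'], (p, q)
        Czz = G + W @ S @ W.T                      # average E[z z'], (q, q)

        # M-step for B: each row solves a lasso problem; rows are
        # independent, so one coordinate update is vectorised over rows.
        for _ in range(cd_sweeps):
            for k in range(q):
                partial = Cxz[:, k] - B @ Czz[:, k] + B[:, k] * Czz[k, k]
                B[:, k] = soft_threshold(partial, lam * psi) / Czz[k, k]

        # M-step for psi: residual variance of each variable.
        resid = (np.diag(S) - 2.0 * np.sum(B * Cxz, axis=1)
                 + np.sum((B @ Czz) * B, axis=1))
        psi = np.maximum(resid, 1e-3)

    return B, psi
```

In this sketch, loadings shrunk exactly to zero indicate that a gene does not contribute to a factor, so the non-zero pattern of each column of B can be read as a candidate module. Setting `lam=0` reduces the loop to the unpenalized EM updates for factor analysis.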

Suggested Citation

  • Blum Yuna & Houée-Bigot Magalie & Causeur David, 2016. "Sparse factor model for co-expression networks with an application using prior biological knowledge," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(3), pages 253-272, June.
  • Handle: RePEc:bpj:sagmbi:v:15:y:2016:i:3:p:253-272:n:2
    DOI: 10.1515/sagmb-2015-0002

    Download full text from publisher

    File URL: https://doi.org/10.1515/sagmb-2015-0002
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Chuan Gao & Ian C McDowell & Shiwen Zhao & Christopher D Brown & Barbara E Engelhardt, 2016. "Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-39, July.
    2. Bodnar, Taras & Reiß, Markus, 2016. "Exact and asymptotic tests on a factor model in low and large dimensions with applications," Journal of Multivariate Analysis, Elsevier, vol. 150(C), pages 125-151.
    3. Friguet, Chloé & Causeur, David, 2011. "Estimation of the proportion of true null hypotheses in high-dimensional data under dependence," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2665-2676, September.
    4. Kei Hirose & Miyuki Imada, 2018. "Sparse factor regression via penalized maximum likelihood estimation," Statistical Papers, Springer, vol. 59(2), pages 633-662, June.
    5. Jianqing Fan & Xu Han, 2017. "Estimation of the false discovery proportion with unknown dependence," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1143-1164, September.
    6. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2012. "The directional identification problem in Bayesian factor analysis: An ex-post approach," Kiel Working Papers 1799, Kiel Institute for the World Economy (IfW Kiel).
    7. repec:jss:jstsof:40:i14 is not listed on IDEAS
    8. Jin, Shaobo & Moustaki, Irini & Yang-Wallentin, Fan, 2018. "Approximated penalized maximum likelihood for exploratory factor analysis: an orthogonal case," LSE Research Online Documents on Economics 88118, London School of Economics and Political Science, LSE Library.
    9. Aßmann, Christian & Boysen-Hogrefe, Jens & Pape, Markus, 2014. "Bayesian analysis of dynamic factor models: An ex-post approach towards the rotation problem," Kiel Working Papers 1902, Kiel Institute for the World Economy (IfW Kiel).
    10. Aderhold Andrej & Husmeier Dirk & Grzegorczyk Marco, 2014. "Statistical inference of regulatory networks for circadian regulation," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(3), pages 1-47, June.
    11. Shaobo Jin & Irini Moustaki & Fan Yang-Wallentin, 2018. "Approximated Penalized Maximum Likelihood for Exploratory Factor Analysis: An Orthogonal Case," Psychometrika, Springer;The Psychometric Society, vol. 83(3), pages 628-649, September.
    12. Lingxue Zhang & Seyoung Kim, 2014. "Learning Gene Networks under SNP Perturbations Using eQTL Datasets," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-20, February.
    13. Sahra Uygun & Cheng Peng & Melissa D Lehti-Shiu & Robert L Last & Shin-Han Shiu, 2016. "Utility and Limitations of Using Gene Expression Data to Identify Functional Associations," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-27, December.
    14. Seungchul Baek & Yen‐Yi Ho & Yanyuan Ma, 2020. "Using sufficient direction factor model to analyze latent activities associated with breast cancer survival," Biometrics, The International Biometric Society, vol. 76(4), pages 1340-1350, December.
    15. Xiaodong Cai & Juan Andrés Bazerque & Georgios B Giannakis, 2013. "Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-13, May.
    16. Michimasa Fujiogi & Yoshihiko Raita & Marcos Pérez-Losada & Robert J. Freishtat & Juan C. Celedón & Jonathan M. Mansbach & Pedro A. Piedra & Zhaozhong Zhu & Carlos A. Camargo & Kohei Hasegawa, 2022. "Integrated relationship of nasopharyngeal airway host response and microbiome associates with bronchiolitis severity," Nature Communications, Nature, vol. 13(1), pages 1-12, December.
    17. Leek Jeffrey T & Storey John D., 2011. "The Joint Null Criterion for Multiple Hypothesis Tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 10(1), pages 1-22, June.
    18. Siwei Xia & Yuehan Yang & Hu Yang, 2022. "Sparse Laplacian Shrinkage with the Graphical Lasso Estimator for Regression Problems," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 255-277, March.
    19. Eric P Xing & Ross E Curtis & Georg Schoenherr & Seunghak Lee & Junming Yin & Kriti Puniyani & Wei Wu & Peter Kinnaird, 2014. "GWAS in a Box: Statistical and Visual Analytics of Structured Associations via GenAMap," PLOS ONE, Public Library of Science, vol. 9(6), pages 1-19, June.
    20. Guibert, Quentin & Lopez, Olivier & Piette, Pierrick, 2019. "Forecasting mortality rate improvements with a high-dimensional VAR," Insurance: Mathematics and Economics, Elsevier, vol. 88(C), pages 255-272.
    21. Daniel Bartz & Kerr Hatrick & Christian W. Hesse & Klaus-Robert Muller & Steven Lemm, 2011. "Directional Variance Adjustment: improving covariance estimates for high-dimensional portfolio optimization," Papers 1109.3069, arXiv.org, revised Mar 2012.
