Printed from https://ideas.repec.org/a/plo/pbio00/3001723.html

Multivariate phenotype analysis enables genome-wide inference of mammalian gene function

Author

Listed:
  • George Nicholson
  • Hugh Morgan
  • Habib Ganjgahi
  • Steve D M Brown
  • Ann-Marie Mallon
  • Chris Holmes

Abstract

The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype–gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene–phenotype map with good sensitivity. A factor analysis of the MV model’s fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases. Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.

The function of the majority of genes in the human and mouse genomes is unknown, and illuminating this "dark genome" is a major challenge for the biomedical sciences. This study shows that multi-dimensional phenotypes from single-gene knockout mouse lines can be analysed at a genome-wide scale both to increase power and infer missing phenotypes.
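The hit rates quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the three counts stated above; the variable and function names are illustrative, and the simple ratio of hit counts is an assumption about how the "7.5-fold" figure might be read, since the abstract describes it as an estimate and the paper may derive it differently.

```python
# Figures quoted in the abstract; names are illustrative, not from the paper.
total_measurements = 302_997  # observed data measurements
uv_hits = 4_256               # hits called by the univariate (UV) model
mv_hits = 31_843              # hits called by the multivariate (MV) model


def hit_rate(hits: int, total: int) -> float:
    """Return hits as a percentage of the total number of measurements."""
    return 100.0 * hits / total


uv_rate = hit_rate(uv_hits, total_measurements)
mv_rate = hit_rate(mv_hits, total_measurements)

# Assumption: a plain ratio of hit counts as a rough proxy for the fold increase.
fold_increase = mv_hits / uv_hits

print(f"UV hit rate: {uv_rate:.1f}%")
print(f"MV hit rate: {mv_rate:.1f}%")
print(f"MV/UV hit ratio: {fold_increase:.1f}")
```

Rounded to one decimal place, the two rates and the ratio agree with the 1.4%, 10.5% and 7.5-fold figures reported in the abstract.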

Suggested Citation

  • George Nicholson & Hugh Morgan & Habib Ganjgahi & Steve D M Brown & Ann-Marie Mallon & Chris Holmes, 2022. "Multivariate phenotype analysis enables genome-wide inference of mammalian gene function," PLOS Biology, Public Library of Science, vol. 20(8), pages 1-41, August.
  • Handle: RePEc:plo:pbio00:3001723
    DOI: 10.1371/journal.pbio.3001723

    Download full text from publisher

    File URL: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001723
    Download Restriction: no

    File URL: https://journals.plos.org/plosbiology/article/file?id=10.1371/journal.pbio.3001723&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pbio.3001723?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Timothée Flutre & Xiaoquan Wen & Jonathan Pritchard & Matthew Stephens, 2013. "A Statistical Framework for Joint eQTL Analysis in Multiple Tissues," PLOS Genetics, Public Library of Science, vol. 9(5), pages 1-13, May.
    2. Michael R. Bowl & Michelle M. Simon & Neil J. Ingham & Simon Greenaway & Luis Santos & Heather Cater & Sarah Taylor & Jeremy Mason & Natalja Kurbatova & Selina Pearson & Lynette R. Bower & Dave A. Cla, 2017. "A large scale hearing loss screen reveals an extensive unexplored genetic landscape for auditory dysfunction," Nature Communications, Nature, vol. 8(1), pages 1-11, December.
    3. Qiong Yang & Yuanjia Wang, 2012. "Methods for Analyzing Multivariate Phenotypes in Genetic Association Studies," Journal of Probability and Statistics, Hindawi, vol. 2012, pages 1-13, July.
    4. Jan Rozman & Birgit Rathkolb & Manuela A. Oestereicher & Christine Schütt & Aakash Chavan Ravindranath & Stefanie Leuchtenberger & Sapna Sharma & Martin Kistler & Monja Willershäuser & Robert Brommage, 2018. "Identification of genetic elements in metabolism by high-throughput mouse phenotyping," Nature Communications, Nature, vol. 9(1), pages 1-16, December.
    5. Bengt Muthén, 1989. "Latent variable modeling in heterogeneous populations," Psychometrika, Springer;The Psychometric Society, vol. 54(4), pages 557-585, September.
    6. N. Longford & B. Muthén, 1992. "Factor analysis for clustered observations," Psychometrika, Springer;The Psychometric Society, vol. 57(4), pages 581-597, December.
    7. Enrico Petretto & Leonardo Bottolo & Sarah R Langley & Matthias Heinig & Chris McDermott-Roe & Rizwan Sarwar & Michal Pravenec & Norbert Hübner & Timothy J Aitman & Stuart A Cook & Sylvia Richardson, 2010. "New Insights into the Genetic Control of Gene Expression using a Bayesian Multi-tissue Approach," PLOS Computational Biology, Public Library of Science, vol. 6(4), pages 1-13, April.
    8. Asim Ansari & Kamel Jedidi, 2000. "Bayesian factor analysis for multilevel binary observations," Psychometrika, Springer;The Psychometric Society, vol. 65(4), pages 475-496, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Asim Ansari & Kamel Jedidi & Laurette Dube, 2002. "Heterogeneous factor analysis models: A bayesian approach," Psychometrika, Springer;The Psychometric Society, vol. 67(1), pages 49-77, March.
    2. Sophia Rabe-Hesketh & Anders Skrondal & Andrew Pickles, 2004. "Generalized multilevel structural equation modeling," Psychometrika, Springer;The Psychometric Society, vol. 69(2), pages 167-190, June.
    3. Asim Ansari & Kamel Jedidi & Sharan Jagpal, 2000. "A Hierarchical Bayesian Methodology for Treating Heterogeneity in Structural Equation Models," Marketing Science, INFORMS, vol. 19(4), pages 328-347, August.
    4. David B. Dunson & Zhen Chen & Jean Harry, 2003. "A Bayesian Approach for Joint Modeling of Cluster Size and Subunit-Specific Outcomes," Biometrics, The International Biometric Society, vol. 59(3), pages 521-530, September.
    5. Sik-Yum Lee & Sin-Yu Tsang, 1999. "Constrained maximum likelihood estimation of two-level covariance structure model via EM type algorithms," Psychometrika, Springer;The Psychometric Society, vol. 64(4), pages 435-450, December.
    6. Anders Skrondal & Sophia Rabe‐Hesketh, 2007. "Latent Variable Modelling: A Survey," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 34(4), pages 712-745, December.
    7. Chia-Huei Wu, 2008. "The Role of Perceived Discrepancy in Satisfaction Evaluation," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 88(3), pages 423-436, September.
    8. N. Longford & B. Muthén, 1992. "Factor analysis for clustered observations," Psychometrika, Springer;The Psychometric Society, vol. 57(4), pages 581-597, December.
    9. Tuck Siong Chung & Roland T. Rust & Michel Wedel, 2009. "My Mobile Music: An Adaptive Personalization System for Digital Audio Players," Marketing Science, INFORMS, vol. 28(1), pages 52-68, January-February.
    10. Coenders, Germà & Espinet, Josep Maria & Saez, Marc, 2001. "Predicting random level and seasonality of hotel prices. A structural equation growth curve approach," Working Papers of the Department of Economics, University of Girona 1, Department of Economics, University of Girona.
    11. Xiaoquan Wen, 2017. "Robust Bayesian FDR Control Using Bayes Factors, with Applications to Multi-tissue eQTL Discovery," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 28-49, June.
    12. Martijn G. de Jong & Donald R. Lehmann & Oded Netzer, 2012. "State-Dependence Effects in Surveys," Marketing Science, INFORMS, vol. 31(5), pages 838-854, September.
    13. Bacci, Silvia & Bartolucci, Francesco & Pieroni, Luca, 2012. "A causal analysis of mother’s education on birth inequalities," MPRA Paper 38754, University Library of Munich, Germany.
    14. Jonathan Schweig, 2014. "Multilevel Factor Analysis by Model Segregation," Journal of Educational and Behavioral Statistics, vol. 39(5), pages 394-422, October.
    15. Matthew Stephens, 2013. "A Unified Framework for Association Analysis with Multiple Related Phenotypes," PLOS ONE, Public Library of Science, vol. 8(7), pages 1-19, July.
    16. Steven Boker & Michael Neale & Hermine Maes & Michael Wilde & Michael Spiegel & Timothy Brick & Jeffrey Spies & Ryne Estabrook & Sarah Kenny & Timothy Bates & Paras Mehta & John Fox, 2011. "OpenMx: An Open Source Extended Structural Equation Modeling Framework," Psychometrika, Springer;The Psychometric Society, vol. 76(2), pages 306-317, April.
    17. Yiu-Fai Yung, 1997. "Finite mixtures in confirmatory factor-analysis models," Psychometrika, Springer;The Psychometric Society, vol. 62(3), pages 297-330, September.
    18. Ryan, Joseph P. & Garnier, Philip & Zyphur, Michael & Zhai, Fuhua, 2006. "Investigating the effects of caseworker characteristics in child welfare," Children and Youth Services Review, Elsevier, vol. 28(9), pages 993-1006, September.
    19. Bengt O. Muthén, 1994. "Multilevel Covariance Structure Analysis," Sociological Methods & Research, vol. 22(3), pages 376-398, February.
    20. Cécile Proust & Hélène Jacqmin-Gadda & Jeremy M. G. Taylor & Julien Ganiayre & Daniel Commenges, 2006. "A Nonlinear Model with Latent Process for Cognitive Evolution Using Multivariate Longitudinal Data," Biometrics, The International Biometric Society, vol. 62(4), pages 1014-1024, December.
