
Multivariate phenotype analysis enables genome-wide inference of mammalian gene function

Author

Listed:
  • George Nicholson
  • Hugh Morgan
  • Habib Ganjgahi
  • Steve D M Brown
  • Ann-Marie Mallon
  • Chris Holmes

Abstract

The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines.

There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype–gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene–phenotype map with good sensitivity.

A factor analysis of the MV model’s fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases. Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.

The function of the majority of genes in the human and mouse genomes is unknown, and illuminating this "dark genome" is a major challenge for the biomedical sciences. This study shows that multi-dimensional phenotypes from single-gene knockout mouse lines can be analysed at a genome-wide scale both to increase power and infer missing phenotypes.
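The percentages quoted in the abstract can be cross-checked against its counts. The short Python sketch below is not part of the article. It assumes that the complete gene–phenotype map has one cell per phenotype and KO line (148 × 4,548), and it uses the plain ratio of hit counts as a rough stand-in for the reported fold increase in power, which the authors may have estimated differently. All variable names are illustrative.

```python
# Counts quoted in the abstract.
n_phenotypes = 148
n_ko_lines = 4_548
n_observed = 302_997   # observed data measurements
uv_hits = 4_256        # hits called by the univariate (UV) model
mv_hits = 31_843       # hits called by the multivariate (MV) model

# Assumption: the full map has one cell per (phenotype, KO line) pair.
n_cells = n_phenotypes * n_ko_lines

uv_rate = uv_hits / n_observed           # share of observed cells called by UV
mv_rate = mv_hits / n_observed           # share of observed cells called by MV
hit_ratio = mv_hits / uv_hits            # crude proxy for the fold increase
missing_rate = 1 - n_observed / n_cells  # share of cells with no data

print(f"UV hit rate:  {uv_rate:.1%}")        # 1.4%
print(f"MV hit rate:  {mv_rate:.1%}")        # 10.5%
print(f"MV/UV hits:   {hit_ratio:.1f}")      # 7.5
print(f"Missingness:  {missing_rate:.1%}")   # 55.0%
```

Under these assumptions the four derived values round to the figures stated in the abstract (1.4%, 10.5%, 7.5-fold and 55.0%).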

Suggested Citation

  • George Nicholson & Hugh Morgan & Habib Ganjgahi & Steve D M Brown & Ann-Marie Mallon & Chris Holmes, 2022. "Multivariate phenotype analysis enables genome-wide inference of mammalian gene function," PLOS Biology, Public Library of Science, vol. 20(8), pages 1-41, August.
  • Handle: RePEc:plo:pbio00:3001723
    DOI: 10.1371/journal.pbio.3001723

    Download full text from publisher

    File URL: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001723
    Download Restriction: no

    File URL: https://journals.plos.org/plosbiology/article/file?id=10.1371/journal.pbio.3001723&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pbio.3001723?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Asim Ansari & Kamel Jedidi & Laurette Dube, 2002. "Heterogeneous factor analysis models: A bayesian approach," Psychometrika, Springer;The Psychometric Society, vol. 67(1), pages 49-77, March.
    2. Sophia Rabe-Hesketh & Anders Skrondal & Andrew Pickles, 2004. "Generalized multilevel structural equation modeling," Psychometrika, Springer;The Psychometric Society, vol. 69(2), pages 167-190, June.
    3. Asim Ansari & Kamel Jedidi & Sharan Jagpal, 2000. "A Hierarchical Bayesian Methodology for Treating Heterogeneity in Structural Equation Models," Marketing Science, INFORMS, vol. 19(4), pages 328-347, August.
    4. David B. Dunson & Zhen Chen & Jean Harry, 2003. "A Bayesian Approach for Joint Modeling of Cluster Size and Subunit-Specific Outcomes," Biometrics, The International Biometric Society, vol. 59(3), pages 521-530, September.
    5. Anders Skrondal & Sophia Rabe‐Hesketh, 2007. "Latent Variable Modelling: A Survey," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 34(4), pages 712-745, December.
    6. Sik-Yum Lee & Sin-Yu Tsang, 1999. "Constrained maximum likelihood estimation of two-level covariance structure model via EM type algorithms," Psychometrika, Springer;The Psychometric Society, vol. 64(4), pages 435-450, December.
    7. Coenders, Germà & Espinet, Josep Maria & Saez, Marc, 2001. "Predicting random level and seasonality of hotel prices. A structural equation growth curve approach," Working Papers of the Department of Economics, University of Girona 1, Department of Economics, University of Girona.
    8. Bacci, Silvia & Bartolucci, Francesco & Pieroni, Luca, 2012. "A causal analysis of mother’s education on birth inequalities," MPRA Paper 38754, University Library of Munich, Germany.
    9. Jonathan Schweig, 2014. "Multilevel Factor Analysis by Model Segregation," Journal of Educational and Behavioral Statistics, , vol. 39(5), pages 394-422, October.
    10. Yiu-Fai Yung, 1997. "Finite mixtures in confirmatory factor-analysis models," Psychometrika, Springer;The Psychometric Society, vol. 62(3), pages 297-330, September.
    11. Ryan, Joseph P. & Garnier, Philip & Zyphur, Michael & Zhai, Fuhua, 2006. "Investigating the effects of caseworker characteristics in child welfare," Children and Youth Services Review, Elsevier, vol. 28(9), pages 993-1006, September.
    12. Ellen D’Haenens & Jan Van Damme & Patrick Onghena, 2012. "Constructing measures for school process variables: the potential of multilevel confirmatory factor analysis," Quality & Quantity: International Journal of Methodology, Springer, vol. 46(1), pages 155-188, January.
    13. Ke-Hai Yuan & Kentaro Hayashi, 2005. "On muthén’s maximum likelihood for two-level covariance structure models," Psychometrika, Springer;The Psychometric Society, vol. 70(1), pages 147-167, March.
    14. Holger Steinmetz & Peter Schmidt & Andrea Tina-Booh & Siegrid Wieczorek & Shalom Schwartz, 2009. "Testing measurement invariance using multigroup CFA: differences between educational groups in human values measurement," Quality & Quantity: International Journal of Methodology, Springer, vol. 43(4), pages 599-616, July.
    15. Bobby L. Jones & Daniel S. Nagin & Kathryn Roeder, 2001. "A SAS Procedure Based on Mixture Models for Estimating Developmental Trajectories," Sociological Methods & Research, , vol. 29(3), pages 374-393, February.
    16. Nadezhda Lebedeva & Peter Schmidt, 2013. "Values and Attitudes towards Innovation among Canadian, Chinese and Russian Students," HSE Working papers WP BRP 04/SOC/2013, National Research University Higher School of Economics.
    17. Hong-Tu Zhu & Sik-Yum Lee, 2001. "A Bayesian analysis of finite mixtures in the LISREL model," Psychometrika, Springer;The Psychometric Society, vol. 66(1), pages 133-152, March.
    18. Lee, Sik-Yum & Song, Xin-Yuan, 2008. "On Bayesian estimation and model comparison of an integrated structural equation model," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4814-4827, June.
    19. Pilar Rivera & Albert Satorra, 2000. "Country effects in ISSP-1993 environmental data: Comparison of SEM approaches," Economics Working Papers 458, Department of Economics and Business, Universitat Pompeu Fabra.
    20. Zhiguo Xiao, 2011. "Efficient Estimation of Moment Condition Models with Heterogenous Populations," Annals of Economics and Finance, Society for AEF, vol. 12(1), pages 89-107, May.
