Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

Author

Listed:
  • Barbara E Engelhardt
  • Matthew Stephens

Abstract

We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.

Author Summary: Two different approaches have become widely used in the analysis of population structure: admixture-based models and principal components analysis (PCA). In admixture-based models each individual is assumed to have inherited some proportion of its ancestry from one of several distinct populations. PCA projects the individuals into a low-dimensional subspace. On the face of it, these methods seem to have little in common. Here we show how in fact both of these methods can be viewed within a single unifying framework. This viewpoint should help practitioners to better interpret and contrast the results from these methods in real data applications. It also provides a springboard to the development of novel approaches to this problem. We introduce one such novel approach, based on sparse factor analysis, which has elements in common with both admixture-based models and PCA. As we illustrate here, in some settings sparse factor analysis may provide more interpretable results than either admixture-based models or PCA.
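
The matrix-factorization view described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not implement sparse factor analysis; it only shows, under assumed toy settings, how an admixture-style factorization (loadings on the simplex, factors as allele frequencies) and a PCA-style factorization (orthonormal factors from a truncated SVD) both approximate a genotype matrix by a product of two lower-rank matrices. The names `Q`, `P`, `G`, the sample sizes, and the Dirichlet and uniform simulation choices are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 60, 200, 3  # individuals, SNPs, number of factors (illustrative values)

# Admixture-style construction (assumed toy model):
# each row of Q holds ancestry proportions and sums to 1,
# each row of P holds allele frequencies for one ancestral population.
Q = rng.dirichlet(alpha=np.full(K, 0.3), size=n)   # n x K, rows on the simplex
P = rng.uniform(0.05, 0.95, size=(K, p))           # K x p, entries in (0, 1)
G = rng.binomial(2, Q @ P).astype(float)           # n x p genotypes in {0, 1, 2}

# PCA-style factorization: centre each SNP, then keep the top K singular vectors.
mu = G.mean(axis=0)
U, s, Vt = np.linalg.svd(G - mu, full_matrices=False)
loadings = U[:, :K] * s[:K]                        # n x K, unconstrained in sign
factors = Vt[:K]                                   # K x p, orthonormal rows
G_pca = mu + loadings @ factors

def rel_error(approx):
    """Relative Frobenius-norm error of an approximation to G."""
    return np.linalg.norm(G - approx) / np.linalg.norm(G)

err_admix = rel_error(2 * Q @ P)   # expected genotype under the admixture-style model
err_pca = rel_error(G_pca)

print(f"admixture-style (2 * Q @ P) relative error: {err_admix:.3f}")
print(f"PCA-style (mean + rank-{K}) relative error:  {err_pca:.3f}")
```

Both approximations have the same form, loadings times factors; they differ only in the constraints placed on the two matrices. In this toy setup the PCA-style fit cannot have larger squared error than `2 * Q @ P`, because the truncated SVD is the best rank-K approximation of the centred matrix, while the admixture-style loadings remain directly interpretable as ancestry proportions.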

Suggested Citation

  • Barbara E Engelhardt & Matthew Stephens, 2010. "Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis," PLOS Genetics, Public Library of Science, vol. 6(9), pages 1-12, September.
  • Handle: RePEc:plo:pgen00:1001117
    DOI: 10.1371/journal.pgen.1001117

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1001117
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1001117&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. David Reich & Kumarasamy Thangaraj & Nick Patterson & Alkes L. Price & Lalji Singh, 2009. "Reconstructing Indian population history," Nature, Nature, vol. 461(7263), pages 489-494, September.

    Citations



    Cited by:

    1. Duforet-Frebourg, Nicolas & Slatkin, Montgomery, 2016. "Isolation-by-distance-and-time in a stepping-stone model," Theoretical Population Biology, Elsevier, vol. 108(C), pages 24-35.
    2. Estavoyer, Maxime & François, Olivier, 2022. "Theoretical analysis of principal components in an umbrella model of intraspecific evolution," Theoretical Population Biology, Elsevier, vol. 148(C), pages 11-21.
    3. Boca, Simina M. & Rosenberg, Noah A., 2011. "Mathematical properties of Fst between admixed populations and their parental source populations," Theoretical Population Biology, Elsevier, vol. 80(3), pages 208-216.
    4. Zheng, Xiuwen & Weir, Bruce S., 2016. "Eigenanalysis of SNP data with an identity by descent interpretation," Theoretical Population Biology, Elsevier, vol. 107(C), pages 65-76.
    5. Chuan Gao & Ian C McDowell & Shiwen Zhao & Christopher D Brown & Barbara E Engelhardt, 2016. "Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering," PLOS Computational Biology, Public Library of Science, vol. 12(7), pages 1-39, July.
    6. Markus Neuditschko & Mehar S Khatkar & Herman W Raadsma, 2012. "NetView: A High-Definition Network-Visualization Approach to Detect Fine-Scale Population Structures from Genome-Wide Patterns of Variation," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-13, October.
    7. Ricardo Kanitz & Elsa G Guillot & Sylvain Antoniazza & Samuel Neuenschwander & Jérôme Goudet, 2018. "Complex genetic patterns in human arise from a simple range-expansion model over continental landmasses," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-16, February.
    8. Aman Agrawal & Alec M Chiu & Minh Le & Eran Halperin & Sriram Sankararaman, 2020. "Scalable probabilistic PCA for large-scale genetic variation data," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-19, May.
    9. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    2. Michael Bridges & Elizabeth A Heron & Colm O'Dushlaine & Ricardo Segurado & The International Schizophrenia Consortium (ISC) & Derek Morris & Aiden Corvin & Michael Gill & Carlos Pinto, 2011. "Genetic Classification of Populations Using Supervised Learning," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-12, May.
    3. Kay Young McChesney, 2015. "Teaching Diversity," SAGE Open, vol. 5(4), pages 21582440156, October.
    4. Rozaimi Mohamad Razali & Juan Rodriguez-Flores & Mohammadmersad Ghorbani & Haroon Naeem & Waleed Aamer & Elbay Aliyev & Ali Jubran & Andrew G. Clark & Khalid A. Fakhro & Younes Mokrab, 2021. "Thousands of Qatari genomes inform human migration history and improve imputation of Arab haplotypes," Nature Communications, Nature, vol. 12(1), pages 1-16, December.
    5. Mark S Hibbins & Matthew W Hahn, 2021. "The effects of introgression across thousands of quantitative traits revealed by gene expression in wild tomatoes," PLOS Genetics, Public Library of Science, vol. 17(11), pages 1-20, November.
    6. David B. Stern & Nathan W. Anderson & Juanita A. Diaz & Carol Eunmi Lee, 2022. "Genome-wide signatures of synergistic epistasis during parallel adaptation in a Baltic Sea copepod," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    7. S Justin Carlus & Saumya Sarkar & Sandeep Kumar Bansal & Vertika Singh & Kiran Singh & Rajesh Kumar Jha & Nirmala Sadasivam & Sri Revathy Sadasivam & P S Gireesha & Kumarasamy Thangaraj & Singh Rajend, 2016. "Is MTHFR 677 C>T Polymorphism Clinically Important in Polycystic Ovarian Syndrome (PCOS)? A Case-Control Study, Meta-Analysis and Trial Sequential Analysis," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-15, March.
    8. Zhijun Wu & Yuqing Lou & Wei Jin & Yan Liu & Lin Lu & Guoping Lu, 2012. "The Pro12Ala Polymorphism in the Peroxisome Proliferator-Activated Receptor Gamma-2 Gene (PPARγ2) Is Associated with Increased Risk of Coronary Artery Disease: A Meta-Analysis," PLOS ONE, Public Library of Science, vol. 7(12), pages 1-14, December.
    9. Buzbas, Erkan Ozge & Verdu, Paul, 2018. "Inference on admixture fractions in a mechanistic model of recurrent admixture," Theoretical Population Biology, Elsevier, vol. 122(C), pages 149-157.
    10. Gunjan Sharma & Rakesh Tamang & Ruchira Chaudhary & Vipin Kumar Singh & Anish M Shah & Sharath Anugula & Deepa Selvi Rani & Alla G Reddy & Muthukrishnan Eaaswarkhanth & Gyaneshwer Chaubey & Lalji Sing, 2012. "Genetic Affinities of the Central Indian Tribal Populations," PLOS ONE, Public Library of Science, vol. 7(2), pages 1-8, February.
    11. David Gordon & Shailen Nandy, 2016. "The Extent, Nature and Distribution of Child Poverty in India," Indian Journal of Human Development, vol. 10(1), pages 64-84, April.
    12. Jeffrey D. Wall & J. Fah Sathirapongsasuti & Ravi Gupta & Asif Rasheed & Radha Venkatesan & Saurabh Belsare & Ramesh Menon & Sameer Phalke & Anuradha Mittal & John Fang & Deepak Tanneeru & Manjari Des, 2023. "South Asian medical cohorts reveal strong founder effects and high rates of homozygosity," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    13. Lejla Kovacevic & Kristiina Tambets & Anne-Mai Ilumäe & Alena Kushniarevich & Bayazit Yunusbayev & Anu Solnik & Tamer Bego & Dragan Primorac & Vedrana Skaro & Andreja Leskovac & Zlatko Jakovski & Katj, 2014. "Standing at the Gateway to Europe - The Genetic Structure of Western Balkan Populations Based on Autosomal and Haploid Markers," PLOS ONE, Public Library of Science, vol. 9(8), pages 1-15, August.
    14. Jason Flannick & Joshua M Korn & Pierre Fontanillas & George B Grant & Eric Banks & Mark A Depristo & David Altshuler, 2012. "Efficiency and Power as a Function of Sequence Coverage, SNP Array Density, and Imputation," PLOS Computational Biology, Public Library of Science, vol. 8(7), pages 1-13, July.
    15. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    16. Yedael Y Waldman & Arjun Biddanda & Natalie R Davidson & Paul Billing-Ross & Maya Dubrovsky & Christopher L Campbell & Carole Oddoux & Eitan Friedman & Gil Atzmon & Eran Halperin & Harry Ostrer & Alon, 2016. "The Genetics of Bene Israel from India Reveals Both Substantial Jewish and Indian Ancestry," PLOS ONE, Public Library of Science, vol. 11(3), pages 1-28, March.
    17. Priya Moorjani & Nick Patterson & Po-Ru Loh & Mark Lipson & Péter Kisfali & Bela I Melegh & Michael Bonin & Ľudevít Kádaši & Olaf Rieß & Bonnie Berger & David Reich & Béla Melegh, 2013. "Reconstructing Roma History from Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-11, March.
    18. Soraggi, Samuele & Wiuf, Carsten, 2019. "General theory for stochastic admixture graphs and F-statistics," Theoretical Population Biology, Elsevier, vol. 125(C), pages 56-66.
    19. Daniel John Lawson & Garrett Hellenthal & Simon Myers & Daniel Falush, 2012. "Inference of Population Structure using Dense Haplotype Data," PLOS Genetics, Public Library of Science, vol. 8(1), pages 1-16, January.
    20. Ruha Benjamin, 2015. "The Emperor’s New Genes," The ANNALS of the American Academy of Political and Social Science, vol. 661(1), pages 130-142, September.
