Printed from https://ideas.repec.org/a/plo/pgen00/1007841.html

Expression reflects population structure

Author

Listed:
  • Brielin C Brown
  • Nicolas L Bray
  • Lior Pachter

Abstract

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.

Author summary

Increasingly complex, high dimensional, multi-modal genomics datasets warrant investigation into analysis techniques that can reveal structure in the data without over-fitting. Here, we show that the coupling of principal component analysis to canonical correlation analysis offers an efficient approach to exploratory analysis of this kind of data. We apply this method to the GEUVADIS dataset of genotype and gene expression values of European and Yoruba individuals, finding as-of-yet unstudied population structure in gene expression abundances. We show that this structure is not driven by known eQTLs, and explore the consequences of our results for eQTL studies involving multiple populations.
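The coupling of principal components to canonical correlation analysis described above can be illustrated with a small sketch. This is not the authors' code: it assumes NumPy, a fixed number of PCs per data type (`kg` and `ke` are illustrative choices, not values from the paper), and uses the standard fact that CCA between two orthonormal, centered PC bases reduces to an SVD of their cross-product. The per-gene R² below is a simple descriptive statistic standing in for the paper's significance procedure, and the two-population data are synthetic, made up purely for illustration.

```python
import numpy as np

def top_pc_basis(M, k):
    """Orthonormal n x k basis spanned by the top-k principal components of M."""
    Mc = M - M.mean(axis=0)                      # center each feature (column)
    U, _, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :k]

def pca_cca(G, E, kg=10, ke=10):
    """Couple genotype PCs to expression PCs with CCA (illustrative sketch).

    G: n x p genotype matrix (e.g. 0/1/2 allele counts); E: n x g expression matrix.
    Returns the canonical correlations (descending) and the n x m expression-side
    canonical coordinates, with m = min(kg, ke).
    """
    Ug = top_pc_basis(G, kg)
    Ue = top_pc_basis(E, ke)
    # Both bases are orthonormal and centered, so the canonical correlations are
    # the singular values of Ug^T Ue (cosines of the principal angles).
    _, rho, Bt = np.linalg.svd(Ug.T @ Ue, full_matrices=False)
    expr_coords = Ue @ Bt.T                      # orthonormal canonical variates
    return rho, expr_coords

def gene_variance_explained(E, coords, m=1):
    """Fraction of each gene's variance captured by the first m canonical variates.

    Assumes no constant genes (zero variance would divide by zero).
    """
    Ec = E - E.mean(axis=0)
    V = coords[:, :m]                            # columns are orthonormal
    proj = V.T @ Ec                              # m x g projections
    return (proj ** 2).sum(axis=0) / (Ec ** 2).sum(axis=0)

# Hypothetical example: two synthetic populations with different allele frequencies
rng = np.random.default_rng(0)
n_per, p, g = 50, 200, 100
pop = np.repeat([0, 1], n_per)                   # population label per individual
freqs = rng.uniform(0.1, 0.9, size=(2, p))       # population-specific allele frequencies
G = rng.binomial(2, freqs[pop])                  # n x p allele counts
E = rng.normal(size=(2 * n_per, g))
E[:, :20] += 1.5 * pop[:, None]                  # first 20 genes shifted in population 1

rho, coords = pca_cca(G, E, kg=5, ke=5)
r2 = gene_variance_explained(E, coords)
print("top canonical correlation:", rho[0])
```

In this toy setting the leading expression-side canonical coordinate should separate the two synthetic populations, and the shifted genes should carry a larger share of variance along it than the unshifted ones.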

Suggested Citation

  • Brielin C Brown & Nicolas L Bray & Lior Pachter, 2018. "Expression reflects population structure," PLOS Genetics, Public Library of Science, vol. 14(12), pages 1-15, December.
  • Handle: RePEc:plo:pgen00:1007841
    DOI: 10.1371/journal.pgen.1007841

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007841
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007841&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1007841?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7219), pages 274-274, November.
    2. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7218), pages 98-101, November.
    3. Barbara E Stranger & Stephen B Montgomery & Antigone S Dimas & Leopold Parts & Oliver Stegle & Catherine E Ingle & Magda Sekowska & George Davey Smith & David Evans & Maria Gutierrez-Arcelus & Alkes P, 2012. "Patterns of Cis Regulatory Variation in Diverse Human Populations," PLOS Genetics, Public Library of Science, vol. 8(4), pages 1-13, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    2. Beatrix Eugster & Rafael Lalive & Andreas Steinhauer & Josef Zweimüller, 2011. "The Demand for Social Insurance: Does Culture Matter?," Economic Journal, Royal Economic Society, vol. 121(556), pages 413-448, November.
    3. repec:plo:pgen00:1002078 is not listed on IDEAS
    4. Gad Abraham & Michael Inouye, 2014. "Fast Principal Component Analysis of Large-Scale Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-5, April.
    5. Beatrix Brügger & Rafael Lalive & Josef Zweimüller, 2009. "Does Culture Affect Unemployment? Evidence from the Röstigraben," NRN working papers 2009-10, The Austrian Center for Labor Economics and the Analysis of the Welfare State, Johannes Kepler University Linz, Austria.
    6. Diana Chang & Alon Keinan, 2014. "Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-14, September.
    7. Alejandro Ochoa & John D Storey, 2021. "Estimating FST and kinship for arbitrary population structures," PLOS Genetics, Public Library of Science, vol. 17(1), pages 1-36, January.
    8. Feldman, Michael J., 2023. "Spiked singular values and vectors under extreme aspect ratios," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    9. Mateus H. Gouveia & Amy R. Bentley & Thiago P. Leal & Eduardo Tarazona-Santos & Carlos D. Bustamante & Adebowale A. Adeyemo & Charles N. Rotimi & Daniel Shriner, 2023. "Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    10. Bryson, Alex & Morris, Tim & Bann, David & Wilkinson, David, 2025. "The gender wage gap across life: Effects of genetic predisposition towards higher educational attainment," Economics & Human Biology, Elsevier, vol. 56(C).
    11. Nicola Barban & Elisabetta De Cao & Sonia Oreffice & Climent Quintana-Domeque, 2016. "Assortative Mating on Education: A Genetic Assessment," Working Papers 2016-034, Human Capital and Economic Opportunity Working Group.
    12. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    13. Guang Guo & Yilan Fu & Hedwig Lee & Tianji Cai & Kathleen Mullan Harris & Yi Li, 2014. "Genetic Bio-Ancestry and Social Construction of Racial Classification in Social Surveys in the Contemporary United States," Demography, Springer;Population Association of America (PAA), vol. 51(1), pages 141-172, February.
    14. Forien, Raphaël & Ringbauer, Harald & Coop, Graham, 2024. "Demographic inference for spatially heterogeneous populations using long shared haplotypes," Theoretical Population Biology, Elsevier, vol. 159(C), pages 108-124.
    15. repec:plo:pone00:0043759 is not listed on IDEAS
    16. Panczak, Radoslaw & Moser, André & Held, Leonhard & Jones, Philip A. & Rühli, Frank J. & Staub, Kaspar, 2017. "A tall order: Small area mapping and modelling of adult height among Swiss male conscripts," Economics & Human Biology, Elsevier, vol. 26(C), pages 61-69.
    17. Athias, Laure & Wicht, Pascal, 2025. "Make or buy for public services: Culture matters for efficiency considerations," The Quarterly Review of Economics and Finance, Elsevier, vol. 99(C).
    18. The International Multiple Sclerosis Genetics Consortium, 2011. "The Genetic Association of Variants in CD6, TNFRSF1A and IRF8 to Multiple Sclerosis: A Multicenter Case-Control Study," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-6, April.
    19. Xiaodong Liu & Ke Zhang & Neslihan A. Kaya & Zhe Jia & Dafei Wu & Tingting Chen & Zhiyuan Liu & Sinan Zhu & Axel M. Hillmer & Torsten Wuestefeld & Jin Liu & Yun Shen Chan & Zheng Hu & Liang Ma & Li Ji, 2024. "Tumor phylogeography reveals block-shaped spatial heterogeneity and the mode of evolution in Hepatocellular Carcinoma," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    20. Marie-Claude Babron & Marie de Tayrac & Douglas N Rutledge & Eleftheria Zeggini & Emmanuelle Génin, 2012. "Rare and Low Frequency Variant Stratification in the UK Population: Description and Impact on Association Tests," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    21. repec:plo:pone00:0016513 is not listed on IDEAS
    22. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    23. Keith Humphreys & Alexander Grankvist & Monica Leu & Per Hall & Jianjun Liu & Samuli Ripatti & Karola Rehnström & Leif Groop & Lars Klareskog & Bo Ding & Henrik Grönberg & Jianfeng Xu & Nancy L Peders, 2011. "The Genetic Structure of the Swedish Population," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-11, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pgen00:1007841. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact plosgenetics. General contact details of provider: https://journals.plos.org/plosgenetics/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.