Printed from https://ideas.repec.org/a/plo/pgen00/1007841.html

Expression reflects population structure

Author

Listed:
  • Brielin C Brown
  • Nicolas L Bray
  • Lior Pachter

Abstract

Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.

Author summary: Increasingly complex, high dimensional, multi-modal genomics datasets warrant investigation into analysis techniques that can reveal structure in the data without over-fitting. Here, we show that the coupling of principal component analysis to canonical correlation analysis offers an efficient approach to exploratory analysis of this kind of data. We apply this method to the GEUVADIS dataset of genotype and gene expression values of European and Yoruba individuals, finding as-of-yet unstudied population structure in gene expression abundances. We show that this structure is not driven by known eQTLs, and explore the consequences of our results for eQTL studies involving multiple populations.
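
The coupling of principal components to canonical correlation analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`top_pcs`, `cca_on_pcs`), the choice of five components per data type, and the simulated two-population data are all assumptions made for illustration. The sketch relies on the fact that principal component scores of centered data are mean-zero and orthonormal, so the canonical correlations between the two sets of scores are the singular values of their cross-product.

```python
import numpy as np

def top_pcs(X, k):
    """Top-k principal component scores of X (samples x features).

    Columns are orthonormal and mean-zero (left singular vectors of centered X).
    """
    Xc = X - X.mean(axis=0)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k]

def cca_on_pcs(G, E, k_g=5, k_e=5):
    """CCA between the top PCs of genotype G and expression E.

    Returns genotype canonical variates, expression canonical variates,
    and canonical correlations (largest first). k_g and k_e are
    illustrative defaults, not values taken from the paper.
    """
    Ug, Ue = top_pcs(G, k_g), top_pcs(E, k_e)
    # Both bases are orthonormal, so CCA reduces to an SVD of the cross-product.
    A, rho, Bt = np.linalg.svd(Ug.T @ Ue)
    return Ug @ A, Ue @ Bt.T, rho

# Simulated data (hypothetical): two populations of 100 samples each.
rng = np.random.default_rng(0)
n = 200
pop = np.repeat([0, 1], n // 2)

# Genotypes: 300 SNPs whose allele frequencies differ between populations.
f0 = rng.uniform(0.1, 0.9, 300)
f1 = np.clip(f0 + rng.normal(0, 0.15, 300), 0.05, 0.95)
G = rng.binomial(2, np.where(pop[:, None] == 0, f0, f1)).astype(float)

# Expression: 100 genes dominated by a strong non-population factor,
# plus a weaker population shift in the first 30 genes.
batch = rng.normal(size=(n, 1))
E = 3.0 * batch @ rng.normal(size=(1, 100)) + rng.normal(size=(n, 100))
E[:, :30] += 1.0 * pop[:, None]

_, e_variates, rho = cca_on_pcs(G, E)

# Compare how well naive expression PC1 and the first expression
# canonical variate track the population label (signs are arbitrary).
r_pc1 = abs(np.corrcoef(top_pcs(E, 1)[:, 0], pop)[0, 1])
r_cca = abs(np.corrcoef(e_variates[:, 0], pop)[0, 1])
print("canonical correlations:", np.round(rho, 3))
print("|corr| with population: PC1 =", round(r_pc1, 3), ", CCA =", round(r_cca, 3))
```

In this simulation the leading expression principal component follows the dominant non-population factor, while the first canonical variate is the direction within the expression components that best aligns with the genotype components. Reducing both matrices to a few components before the canonical correlation step also keeps the problem small relative to the sample size, which limits over-fitting.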

Suggested Citation

  • Brielin C Brown & Nicolas L Bray & Lior Pachter, 2018. "Expression reflects population structure," PLOS Genetics, Public Library of Science, vol. 14(12), pages 1-15, December.
  • Handle: RePEc:plo:pgen00:1007841
    DOI: 10.1371/journal.pgen.1007841

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007841
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1007841&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pgen.1007841?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7219), pages 274-274, November.
    2. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7218), pages 98-101, November.
    3. Barbara E Stranger & Stephen B Montgomery & Antigone S Dimas & Leopold Parts & Oliver Stegle & Catherine E Ingle & Magda Sekowska & George Davey Smith & David Evans & Maria Gutierrez-Arcelus & Alkes P, 2012. "Patterns of Cis Regulatory Variation in Diverse Human Populations," PLOS Genetics, Public Library of Science, vol. 8(4), pages 1-13, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    2. Beatrix Eugster & Rafael Lalive & Andreas Steinhauer & Josef Zweimüller, 2011. "The Demand for Social Insurance: Does Culture Matter?," Economic Journal, Royal Economic Society, vol. 121(556), pages 413-448, November.
    3. Filippini, Massimo & Wekhof, Tobias, 2021. "The effect of culture on energy efficient vehicle ownership," Journal of Environmental Economics and Management, Elsevier, vol. 105(C).
    4. Andrey V Khrunin & Denis V Khokhrin & Irina N Filippova & Tõnu Esko & Mari Nelis & Natalia A Bebyakova & Natalia L Bolotova & Janis Klovins & Liene Nikitina-Zake & Karola Rehnström & Samuli Ripatti & , 2013. "A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-9, March.
    5. Wenhan Chen & Yang Wu & Zhili Zheng & Ting Qi & Peter M. Visscher & Zhihong Zhu & Jian Yang, 2021. "Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    6. Pierre Luisi & Angelina García & Juan Manuel Berros & Josefina M B Motti & Darío A Demarchi & Emma Alfaro & Eliana Aquilano & Carina Argüelles & Sergio Avena & Graciela Bailliet & Julieta Beltramo & C, 2020. "Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-30, July.
    7. Gad Abraham & Michael Inouye, 2014. "Fast Principal Component Analysis of Large-Scale Genome-Wide Data," PLOS ONE, Public Library of Science, vol. 9(4), pages 1-5, April.
    8. Beatrix Brügger & Rafael Lalive & Josef Zweimüller, 2009. "Does Culture Affect Unemployment? Evidence from the Röstigraben," NRN working papers 2009-10, The Austrian Center for Labor Economics and the Analysis of the Welfare State, Johannes Kepler University Linz, Austria.
    9. Diana Chang & Alon Keinan, 2014. "Principal Component Analysis Characterizes Shared Pathogenetics from Genome-Wide Association Studies," PLOS Computational Biology, Public Library of Science, vol. 10(9), pages 1-14, September.
    10. Alejandro Ochoa & John D Storey, 2021. "Estimating FST and kinship for arbitrary population structures," PLOS Genetics, Public Library of Science, vol. 17(1), pages 1-36, January.
    11. Victor Ronda & Esben Agerbo & Dorthe Bleses & Preben Bo Mortensen & Anders Børglum & Ole Mors & Michael Rosholm & David M. Hougaard & Merete Nordentoft & Thomas Werge, 2022. "Family disadvantage, gender, and the returns to genetic human capital," Scandinavian Journal of Economics, Wiley Blackwell, vol. 124(2), pages 550-578, April.
    12. Feldman, Michael J., 2023. "Spiked singular values and vectors under extreme aspect ratios," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    13. Mateus H. Gouveia & Amy R. Bentley & Thiago P. Leal & Eduardo Tarazona-Santos & Carlos D. Bustamante & Adebowale A. Adeyemo & Charles N. Rotimi & Daniel Shriner, 2023. "Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies," Nature Communications, Nature, vol. 14(1), pages 1-11, December.
    14. Nicola Barban & Elisabetta De Cao & Sonia Oreffice & Climent Quintana-Domeque, 2016. "Assortative Mating on Education: A Genetic Assessment," Working Papers 2016-034, Human Capital and Economic Opportunity Working Group.
    15. Bryc, Katarzyna & Bryc, Wlodek & Silverstein, Jack W., 2013. "Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations," Theoretical Population Biology, Elsevier, vol. 89(C), pages 34-43.
    16. Athias, Laure & Wicht, Pascal, 2014. "Cultural Biases in Public Service Delivery: Evidence from a Regression Discontinuity Approach," MPRA Paper 60639, University Library of Munich, Germany.
    17. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
    18. Elena Gentili & Giuliano Masiero & Fabrizio Mazzonna, 2016. "The Role of Culture in Long-term Care," IdEP Economic Papers 1605, USI Università della Svizzera italiana.
    19. Gil McVean, 2009. "A Genealogical Interpretation of Principal Components Analysis," PLOS Genetics, Public Library of Science, vol. 5(10), pages 1-10, October.
    20. Guang Guo & Yilan Fu & Hedwig Lee & Tianji Cai & Kathleen Mullan Harris & Yi Li, 2014. "Genetic Bio-Ancestry and Social Construction of Racial Classification in Social Surveys in the Contemporary United States," Demography, Springer;Population Association of America (PAA), vol. 51(1), pages 141-172, February.
