A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations

Author

Listed:
  • Chaolong Wang
  • Sebastian Zöllner
  • Noah A Rosenberg

Abstract

Multivariate statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize the structure of human genetic variation, often in easily visualized two-dimensional maps. Many recent studies have reported similarity between geographic maps of population locations and MDS or PCA maps of genetic variation inferred from single-nucleotide polymorphisms (SNPs). However, this similarity has been evident primarily in a qualitative sense; and, because different multivariate techniques and marker sets have been used in different studies, it has not been possible to formally compare genetic variation datasets in terms of their levels of similarity with geography. In this study, using genome-wide SNP data from 128 populations worldwide, we perform a systematic analysis to quantitatively evaluate the similarity of genes and geography in different geographic regions. For each of a series of regions, we apply a Procrustes analysis approach to find an optimal transformation that maximizes the similarity between PCA maps of genetic variation and geographic maps of population locations. We consider examples in Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, as well as in a worldwide sample, finding that significant similarity between genes and geography exists in general at different geographic levels. The similarity is highest in our examples for Asia and, once highly distinctive populations have been removed, Sub-Saharan Africa. Our results provide a quantitative assessment of the geographic structure of human genetic variation worldwide, supporting the view that geography plays a strong role in giving rise to human population structure.

Author Summary: The spatial pattern of human genetic variation provides a basis for investigating the history of human migrations. Statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been used to summarize spatial patterns of genetic variation, typically by placing individuals on a two-dimensional map in such a way that pairwise Euclidean distances between individuals on the map approximately reflect corresponding genetic relationships. Although similarity between these statistical maps of genetic variation and the geographic maps of sampling locations is often observed, it has not been assessed systematically across different parts of the world. In this study, we combine genome-wide SNP data from more than 100 populations worldwide to perform a formal comparison between genes and geography in different regions. By examining a worldwide sample and samples from Europe, Sub-Saharan Africa, Asia, East Asia, and Central/South Asia, we find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels. Surprisingly, the highest similarity is found in Asia, even though the geographic barrier of the Himalaya Mountains has created a discontinuity on the PCA map of genetic variation.
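
The Procrustes comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes two-dimensional inputs (for example, the first two principal components and planar longitude/latitude pairs), scores similarity as the square root of one minus the normalized Procrustes residual, and assesses significance with a simple label permutation test. The function names, the toy coordinates, and the choice of statistic and test are all illustrative assumptions. In two dimensions the optimal rotation or reflection has a closed form when points are written as complex numbers, so only the standard library is needed.

```python
import math
import random


def to_centered_complex(points):
    """Turn (x, y) pairs into complex numbers with the centroid moved to 0."""
    zs = [complex(x, y) for x, y in points]
    mean = sum(zs) / len(zs)
    return [z - mean for z in zs]


def procrustes_similarity(geo, gen):
    """Similarity in [0, 1] after the best translation, scaling, rotation and
    optional reflection of `gen` onto `geo` (equal-length lists of (x, y))."""
    z = to_centered_complex(geo)
    w = to_centered_complex(gen)
    norm = math.sqrt(sum(abs(a) ** 2 for a in z) * sum(abs(b) ** 2 for b in w))
    # Best pure rotation: modulus of the complex inner product sum z * conj(w).
    rotation = abs(sum(a * b.conjugate() for a, b in zip(z, w)))
    # Best rotation after mirroring `gen`: same expression with w conjugated.
    reflection = abs(sum(a * b for a, b in zip(z, w)))
    return max(rotation, reflection) / norm


def permutation_p_value(geo, gen, n_perm=1000, seed=0):
    """Share of random relabelings of `gen` that match or beat the observed
    similarity (with the usual +1 correction). Assumed test, for illustration."""
    rng = random.Random(seed)
    observed = procrustes_similarity(geo, gen)
    shuffled = list(gen)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_similarity(geo, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: six sampling locations and made-up PC1/PC2 values.
toy_geo = [(0.0, 0.0), (4.0, 1.0), (2.0, 5.0), (7.0, 3.0), (5.0, 8.0), (9.0, 6.0)]
toy_gen = [(0.1, 0.0), (0.5, 0.2), (0.1, 0.6), (0.8, 0.3), (0.4, 0.9), (1.0, 0.5)]
print(procrustes_similarity(toy_geo, toy_gen))
print(permutation_p_value(toy_geo, toy_gen, n_perm=500))
```

A score near 1 means the genetic map is close to a shifted, rescaled, rotated or mirrored copy of the geographic map; a score near 0 means no such alignment exists. The sketch does not handle degenerate inputs in which all points of a map coincide.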

Suggested Citation

  • Chaolong Wang & Sebastian Zöllner & Noah A Rosenberg, 2012. "A Quantitative Comparison of the Similarity between Genes and Geography in Worldwide Human Populations," PLOS Genetics, Public Library of Science, vol. 8(8), pages 1-16, August.
  • Handle: RePEc:plo:pgen00:1002886
    DOI: 10.1371/journal.pgen.1002886

    Download full text from publisher

    File URL: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002886
    Download Restriction: no

    File URL: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1002886&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Mattias Jakobsson & Sonja W. Scholz & Paul Scheet & J. Raphael Gibbs & Jenna M. VanLiere & Hon-Chung Fung & Zachary A. Szpiech & James H. Degnan & Kai Wang & Rita Guerreiro & Jose M. Bras & Jennifer C, 2008. "Genotype, haplotype and copy-number variation in worldwide human populations," Nature, Nature, vol. 451(7181), pages 998-1003, February.
    2. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7219), pages 274-274, November.
    3. John Novembre & Toby Johnson & Katarzyna Bryc & Zoltán Kutalik & Adam R. Boyko & Adam Auton & Amit Indap & Karen S. King & Sven Bergmann & Matthew R. Nelson & Matthew Stephens & Carlos D. Bustamante, 2008. "Genes mirror geography within Europe," Nature, Nature, vol. 456(7218), pages 98-101, November.

    Citations

    Cited by:

    1. Oscar Lao & Fan Liu & Andreas Wollstein & Manfred Kayser, 2014. "GAGA: A New Algorithm for Genomic Inference of Geographic Ancestry Reveals Fine Level Population Substructure in Europeans," PLOS Computational Biology, Public Library of Science, vol. 10(2), pages 1-11, February.
    2. Nur Hani Syazwani Bakri & Nur Aisyah Nabilah Mat Razi & Mohd Firdaus Ahmad & Nur Syazwani Zulaikha Safwan & Nur Dalilah Dahlan & Ummi Kalthum Mokhtar, 2024. "Academic Performance (CGPA) Influences Mental Health: A Study of Students at Seremban Medical Assistant College (SMCA)," Information Management and Business Review, AMH International, vol. 16(2), pages 46-52.
    3. Wirtz, Johannes & Guindon, Stéphane, 2024. "On the connections between the spatial Lambda–Fleming–Viot model and other processes for analysing geo-referenced genetic data," Theoretical Population Biology, Elsevier, vol. 158(C), pages 139-149.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang Chaolong & Szpiech Zachary A & Degnan James H & Jakobsson Mattias & Pemberton Trevor J & Hardy John A & Singleton Andrew B & Rosenberg Noah A, 2010. "Comparing Spatial Maps of Human Population-Genetic Variation Using Procrustes Analysis," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 9(1), pages 1-22, January.
    2. Ricardo Kanitz & Elsa G Guillot & Sylvain Antoniazza & Samuel Neuenschwander & Jérôme Goudet, 2018. "Complex genetic patterns in human arise from a simple range-expansion model over continental landmasses," PLOS ONE, Public Library of Science, vol. 13(2), pages 1-16, February.
    3. Nicola Barban & Elisabetta De Cao & Sonia Oreffice & Climent Quintana-Domeque, 2016. "Assortative Mating on Education: A Genetic Assessment," Working Papers 2016-034, Human Capital and Economic Opportunity Working Group.
    4. Athias, Laure & Wicht, Pascal, 2025. "Make or buy for public services: Culture matters for efficiency considerations," The Quarterly Review of Economics and Finance, Elsevier, vol. 99(C).
    5. The International Multiple Sclerosis Genetics Consortium, 2011. "The Genetic Association of Variants in CD6, TNFRSF1A and IRF8 to Multiple Sclerosis: A Multicenter Case-Control Study," PLOS ONE, Public Library of Science, vol. 6(4), pages 1-6, April.
    6. Xiaodong Liu & Ke Zhang & Neslihan A. Kaya & Zhe Jia & Dafei Wu & Tingting Chen & Zhiyuan Liu & Sinan Zhu & Axel M. Hillmer & Torsten Wuestefeld & Jin Liu & Yun Shen Chan & Zheng Hu & Liang Ma & Li Ji, 2024. "Tumor phylogeography reveals block-shaped spatial heterogeneity and the mode of evolution in Hepatocellular Carcinoma," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    7. Marco Lopez-Cruz & Fernando M. Aguate & Jacob D. Washburn & Natalia Leon & Shawn M. Kaeppler & Dayane Cristina Lima & Ruijuan Tan & Addie Thompson & Laurence Willard Bretonne & Gustavo los Campos, 2023. "Leveraging data from the Genomes-to-Fields Initiative to investigate genotype-by-environment interactions in maize in North America," Nature Communications, Nature, vol. 14(1), pages 1-14, December.
    8. Marie-Claude Babron & Marie de Tayrac & Douglas N Rutledge & Eleftheria Zeggini & Emmanuelle Génin, 2012. "Rare and Low Frequency Variant Stratification in the UK Population: Description and Impact on Association Tests," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-9, October.
    9. repec:plo:pone00:0016513 is not listed on IDEAS
    10. Priya Moorjani & Nick Patterson & Joel N Hirschhorn & Alon Keinan & Li Hao & Gil Atzmon & Edward Burns & Harry Ostrer & Alkes L Price & David Reich, 2011. "The History of African Gene Flow into Southern Europeans, Levantines, and Jews," PLOS Genetics, Public Library of Science, vol. 7(4), pages 1-13, April.
    11. repec:plo:pgen00:1002410 is not listed on IDEAS
    12. Keith Humphreys & Alexander Grankvist & Monica Leu & Per Hall & Jianjun Liu & Samuli Ripatti & Karola Rehnström & Leif Groop & Lars Klareskog & Bo Ding & Henrik Grönberg & Jianfeng Xu & Nancy L Peders, 2011. "The Genetic Structure of the Swedish Population," PLOS ONE, Public Library of Science, vol. 6(8), pages 1-11, August.
    13. Thomas Charlon & Manuel Martínez-Bueno & Lara Bossini-Castillo & F David Carmona & Alessandro Di Cara & Jérôme Wojcik & Sviatoslav Voloshynovskiy & Javier Martín & Marta E Alarcón-Riquelme, 2016. "Single Nucleotide Polymorphism Clustering in Systemic Autoimmune Diseases," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-10, August.
    14. Diana Chang & Feng Gao & Andrea Slavney & Li Ma & Yedael Y Waldman & Aaron J Sams & Paul Billing-Ross & Aviv Madar & Richard Spritz & Alon Keinan, 2014. "Accounting for eXentricities: Analysis of the X Chromosome in GWAS Reveals X-Linked Genes Implicated in Autoimmune Diseases," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-31, December.
    15. Andrey V Khrunin & Denis V Khokhrin & Irina N Filippova & Tõnu Esko & Mari Nelis & Natalia A Bebyakova & Natalia L Bolotova & Janis Klovins & Liene Nikitina-Zake & Karola Rehnström & Samuli Ripatti & , 2013. "A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-9, March.
    16. Duforet-Frebourg, Nicolas & Slatkin, Montgomery, 2016. "Isolation-by-distance-and-time in a stepping-stone model," Theoretical Population Biology, Elsevier, vol. 108(C), pages 24-35.
    17. Lap Sum Chan & Gen Li & Eric B. Fauman & Xianyong Yin & Markku Laakso & Michael Boehnke & Peter X. K. Song, 2025. "DrFARM: identification of pleiotropic genetic variants in genome-wide association studies," Nature Communications, Nature, vol. 16(1), pages 1-14, December.
    18. Diana Dunca & Sandesh Chopade & María Gordillo-Marañón & Aroon D. Hingorani & Karoline Kuchenbaecker & Chris Finan & Amand F. Schmidt, 2024. "Comparing the effects of CETP in East Asian and European ancestries: a Mendelian randomization study," Nature Communications, Nature, vol. 15(1), pages 1-10, December.
    19. repec:plo:pone00:0078511 is not listed on IDEAS
    20. Wenhan Chen & Yang Wu & Zhili Zheng & Ting Qi & Peter M. Visscher & Zhihong Zhu & Jian Yang, 2021. "Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors," Nature Communications, Nature, vol. 12(1), pages 1-10, December.
    21. repec:plo:pgen00:1002078 is not listed on IDEAS
    22. Alexander Dilthey & Stephen Leslie & Loukas Moutsianas & Judong Shen & Charles Cox & Matthew R Nelson & Gil McVean, 2013. "Multi-Population Classical HLA Type Imputation," PLOS Computational Biology, Public Library of Science, vol. 9(2), pages 1-13, February.
    23. Pierre Luisi & Angelina García & Juan Manuel Berros & Josefina M B Motti & Darío A Demarchi & Emma Alfaro & Eliana Aquilano & Carina Argüelles & Sergio Avena & Graciela Bailliet & Julieta Beltramo & C, 2020. "Fine-scale genomic analyses of admixed individuals reveal unrecognized genetic ancestry components in Argentina," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-30, July.
    24. Aman Agrawal & Alec M Chiu & Minh Le & Eran Halperin & Sriram Sankararaman, 2020. "Scalable probabilistic PCA for large-scale genetic variation data," PLOS Genetics, Public Library of Science, vol. 16(5), pages 1-19, May.
