Manifold Learning for Human Population Structure Studies

Authors

  • Hoicheong Siu
  • Li Jin
  • Momiao Xiong

Abstract

Population genetics data produced by next-generation sequencing platforms are extremely high-dimensional. However, the “intrinsic dimensionality” of sequence data, which determines the structure of populations, is much lower. This motivates us to use locally linear embedding (LLE), which projects high-dimensional genomic data into a low-dimensional, neighborhood-preserving embedding, as a general framework for population structure and historical inference. To facilitate the application of LLE to population genetic analysis, we systematically investigate several important properties of LLE and reveal the connection between LLE and principal component analysis (PCA). Identifying a set of markers and genomic regions that can be used for population structure analysis will provide invaluable information for population genetics and association studies. In addition to identifying LLE-correlated or PCA-correlated structure-informative markers, we have developed a new statistic that integrates the genomic information content of a genomic region in order to study its collective association with population structure, and a LASSO algorithm to search for such regions across the genome. We applied the developed methodologies to a low-coverage pilot dataset from the 1000 Genomes Project and a Phase III Mexico dataset from the HapMap. We observed that 25.1%, 44.9% and 21.4% of the common variants and 89.2%, 92.4% and 75.1% of the rare variants were LLE-correlated markers in CEU, YRI and ASI, respectively. This showed that rare variants, which are often private to specific populations, have much higher power to identify population substructure than common variants. These preliminary results demonstrate that next-generation sequencing offers a rich resource and that LLE provides a powerful tool for population structure analysis.
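
To make the idea of a neighborhood-preserving embedding concrete, the following is a minimal sketch of the standard LLE procedure (nearest neighbors, local reconstruction weights, bottom eigenvectors) written with NumPy. It is not the authors' implementation. The function name `lle`, the parameters `n_neighbors`, `n_components` and `reg`, and the simulated genotype matrix are all assumptions made for illustration; none of them comes from the paper.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Illustrative locally linear embedding (hypothetical helper, not from the paper)."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of individuals.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)              # an individual is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Reconstruction weights: each row of W expresses one individual as an
    # affine combination of its neighbors (weights sum to 1).
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbors centered on individual i
        C = Z @ Z.T                           # local Gram matrix
        C += reg * max(np.trace(C), 1e-12) * np.eye(n_neighbors)  # regularize
        w = np.linalg.solve(C, ones)
        W[i, nbrs[i]] = w / w.sum()

    # Embedding: eigenvectors of (I - W)^T (I - W) with the smallest
    # eigenvalues, skipping the first one.
    A = np.eye(n) - W
    M = A.T @ A
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]

# Simulated data (an assumption for illustration only): two groups of 30
# individuals genotyped at 200 biallelic markers, coded as 0/1/2 allele counts.
rng = np.random.default_rng(0)
p_a = rng.uniform(0.1, 0.9, size=200)
p_b = np.clip(p_a + rng.normal(0.0, 0.2, size=200), 0.05, 0.95)
G = np.vstack([rng.binomial(2, p_a, size=(30, 200)),
               rng.binomial(2, p_b, size=(30, 200))])

Y = lle(G.astype(float), n_neighbors=10, n_components=2)
print(Y.shape)
```

Each row of `Y` holds the low-dimensional coordinates of one individual, which can then be plotted or clustered. The choice of `n_neighbors` matters: if it is small enough that neighborhoods never cross population boundaries, the neighborhood graph becomes disconnected and the bottom eigenvectors are no longer uniquely determined, so results should be checked across several values.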

Suggested Citation

  • Hoicheong Siu & Li Jin & Momiao Xiong, 2012. "Manifold Learning for Human Population Structure Studies," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-18, January.
  • Handle: RePEc:plo:pone00:0029901
    DOI: 10.1371/journal.pone.0029901

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0029901
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0029901&type=printable
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ronald J Nowling & Krystal R Manke & Scott J Emrich, 2020. "Detecting inversions with PCA in the presence of population structure," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-20, October.
    2. Peristera Paschou & Petros Drineas & Jamey Lewis & Caroline M Nievergelt & Deborah A Nickerson & Joshua D Smith & Paul M Ridker & Daniel I Chasman & Ronald M Krauss & Elad Ziv, 2008. "Tracing Sub-Structure in the European American Population with PCA-Informative Markers," PLOS Genetics, Public Library of Science, vol. 4(7), pages 1-13, July.
    3. André X C N Valente & Joseph Zischkau & Joo Heon Shin & Yuan Gao & Abhijit Sarkar, 2012. "Genome-Wide Association Study Heterogeneous Cohort Homogenization via Subject Weight Knock-Down," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-10, October.
    4. Jason Sawler & Bruce Reisch & Mallikarjuna K Aradhya & Bernard Prins & Gan-Yuan Zhong & Heidi Schwaninger & Charles Simon & Edward Buckler & Sean Myles, 2013. "Genomics Assisted Ancestry Deconvolution in Grape," PLOS ONE, Public Library of Science, vol. 8(11), pages 1-1, November.
    5. Jun Zhang, 2010. "Ancestral Informative Marker Selection and Population Structure Visualization Using Sparse Laplacian Eigenfunctions," PLOS ONE, Public Library of Science, vol. 5(11), pages 1-12, November.
    6. Gyaneshwer Chaubey & Anurag Kadian & Saroj Bala & Vadlamudi Raghavendra Rao, 2015. "Genetic Affinity of the Bhil, Kol and Gond Mentioned in Epic Ramayana," PLOS ONE, Public Library of Science, vol. 10(6), pages 1-11, June.
    7. Daniel Svensson & Matilda Rentoft & Anna M Dahlin & Emma Lundholm & Pall I Olason & Andreas Sjödin & Carin Nylander & Beatrice S Melin & Johan Trygg & Erik Johansson, 2020. "A whole-genome sequenced control population in northern Sweden reveals subregional genetic differences," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-18, September.
    8. Estavoyer, Maxime & François, Olivier, 2022. "Theoretical analysis of principal components in an umbrella model of intraspecific evolution," Theoretical Population Biology, Elsevier, vol. 148(C), pages 11-21.
    9. Zhiqiu Hu & Rong-Cai Yang, 2013. "A New Distribution-Free Approach to Constructing the Confidence Region for Multiple Parameters," PLOS ONE, Public Library of Science, vol. 8(12), pages 1-13, December.
    10. Felsenstein, Joseph, 2015. "Covariation of gene frequencies in a stepping-stone lattice of populations," Theoretical Population Biology, Elsevier, vol. 100(C), pages 88-97.
    11. Yaron Granot & Omri Tal & Saharon Rosset & Karl Skorecki, 2016. "On the Apportionment of Population Structure," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-24, August.
    12. Daniel Z Sui, 2010. "Commentary," Environment and Planning A, vol. 42(8), pages 1775-1781, August.
    13. Hyosik Jang & Ian M Ehrenreich, 2012. "Genome-Wide Characterization of Genetic Variation in the Unicellular, Green Alga Chlamydomonas reinhardtii," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-9, July.
    14. Mathieu Gautier & Denis Laloë & Katayoun Moazami-Goudarzi, 2010. "Insights into the Genetic History of French Cattle from Dense SNP Data on 47 Worldwide Breeds," PLOS ONE, Public Library of Science, vol. 5(9), pages 1-11, September.
    15. Xiaofeng Cai & Xuepeng Sun & Chenxi Xu & Honghe Sun & Xiaoli Wang & Chenhui Ge & Zhonghua Zhang & Quanxi Wang & Zhangjun Fei & Chen Jiao & Quanhua Wang, 2021. "Genomic analyses provide insights into spinach domestication and the genetic basis of agronomic traits," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    16. Lee, Anthony J. & Hibbs, Courtney & Wright, Margaret J. & Martin, Nicholas G. & Keller, Matthew C. & Zietsch, Brendan P., 2017. "Assessing the accuracy of perceptions of intelligence based on heritable facial features," Intelligence, Elsevier, vol. 64(C), pages 1-8.
    17. Thompson Katherine L. & Linnen Catherine R. & Kubatko Laura, 2016. "Tree-based quantitative trait mapping in the presence of external covariates," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(6), pages 473-490, December.
    18. Matthieu Bouaziz & Caroline Paccard & Mickael Guedj & Christophe Ambroise, 2012. "SHIPS: Spectral Hierarchical Clustering for the Inference of Population Structure in Genetic Studies," PLOS ONE, Public Library of Science, vol. 7(10), pages 1-17, October.
    19. Jacobo Pardo-Seco & Alberto Gómez-Carballa & Jorge Amigo & Federico Martinón-Torres & Antonio Salas, 2014. "A Genome-Wide Study of Modern-Day Tuscans: Revisiting Herodotus's Theory on the Origin of the Etruscans," PLOS ONE, Public Library of Science, vol. 9(9), pages 1-11, September.
    20. Andrey V Khrunin & Denis V Khokhrin & Irina N Filippova & Tõnu Esko & Mari Nelis & Natalia A Bebyakova & Natalia L Bolotova & Janis Klovins & Liene Nikitina-Zake & Karola Rehnström & Samuli Ripatti & , 2013. "A Genome-Wide Analysis of Populations from European Russia Reveals a New Pole of Genetic Diversity in Northern Europe," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-9, March.
