
Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies

Author

Listed:
  • Charlotte Wang
  • Wen-Hsin Kao
  • Chuhsing Kate Hsiao

Abstract

The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity of statistical analyses. A marker-set approach such as SNP-set analysis can address these problems efficiently. To construct SNP-sets, we first propose a clustering algorithm that uses Hamming distance to measure the similarity between strings of SNP genotypes and to evaluate whether given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed from this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. Based on Hamming distance, the test assesses whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. The proposed methodology needs only genotype information: no inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs that exert a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, applying the clustering algorithm before testing a large data set helps narrow down the genetic regions that harbor susceptibility markers.
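
The two ideas in the abstract, clustering SNPs by Hamming distance and comparing between-status with within-status similarity, can be illustrated with a small Python sketch. This is not the authors' clustering algorithm or the HDAT statistic, whose details are not given on this page. The 0/1/2 genotype coding, the average-linkage merge rule, the distance threshold, the permutation p-value, and all function names below are assumptions made for illustration only.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length genotype sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_snps(genotypes, threshold):
    """Group SNPs (columns) by average-linkage agglomerative clustering.

    genotypes: one list per individual, coded 0/1/2 (assumed coding).
    threshold: stop merging once the closest pair of clusters is farther
               apart than this normalized Hamming distance (assumed rule).
    """
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    cols = [[row[j] for row in genotypes] for j in range(n_snp)]
    dist = {(i, j): hamming(cols[i], cols[j]) / n_ind
            for i, j in combinations(range(n_snp), 2)}
    clusters = [[j] for j in range(n_snp)]

    def linkage(c1, c2):
        # Mean pairwise distance between members of two disjoint clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


def between_minus_within(genotypes, status):
    """Mean Hamming distance of discordant-status pairs minus same-status pairs."""
    between, within = [], []
    for i, j in combinations(range(len(genotypes)), 2):
        d = hamming(genotypes[i], genotypes[j])
        (between if status[i] != status[j] else within).append(d)
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_pvalue(genotypes, status, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the statistic above (illustrative stand-in)."""
    rng = random.Random(seed)
    observed = between_minus_within(genotypes, status)
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(between_minus_within(genotypes, labels)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): 4 individuals x 4 SNPs; status 1 = case, 0 = control.
toy_genotypes = [
    [0, 0, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 1],
]
toy_status = [1, 1, 0, 0]

print(cluster_snps(toy_genotypes, threshold=0.5))
print(permutation_pvalue(toy_genotypes, toy_status, n_perm=200))
```

In this sketch, SNPs are compared across individuals for clustering, while individuals are compared across SNPs for testing, so the same distance function serves both steps. With only four individuals the permutation p-value is not meaningful; the toy data only show how the pieces fit together.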

Suggested Citation

  • Charlotte Wang & Wen-Hsin Kao & Chuhsing Kate Hsiao, 2015. "Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies," PLOS ONE, Public Library of Science, vol. 10(8), pages 1-24, August.
  • Handle: RePEc:plo:pone00:0135918
    DOI: 10.1371/journal.pone.0135918

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135918
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0135918&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0135918?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Emily Mathieu, 2016. "AGGrEGATOr: A Gene-based GEne-Gene interActTiOn test for case-control association studies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 15(2), pages 151-171, April.
    2. Diana Chang & Feng Gao & Andrea Slavney & Li Ma & Yedael Y Waldman & Aaron J Sams & Paul Billing-Ross & Aviv Madar & Richard Spritz & Alon Keinan, 2014. "Accounting for eXentricities: Analysis of the X Chromosome in GWAS Reveals X-Linked Genes Implicated in Autoimmune Diseases," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-31, December.
    3. Simone Marini & Ivan Limongelli & Ettore Rizzo & Alberto Malovini & Edoardo Errichiello & Annalisa Vetro & Tan Da & Orsetta Zuffardi & Riccardo Bellazzi, 2016. "A Data Fusion Approach to Enhance Association Study in Epilepsy," PLOS ONE, Public Library of Science, vol. 11(12), pages 1-16, December.
    4. Pallav Bhatnagar & Emily Barron-Casella & Christopher J Bean & Jacqueline N Milton & Clinton T Baldwin & Martin H Steinberg & Michael DeBaun & James F Casella & Dan E Arking, 2013. "Genome-Wide Meta-Analysis of Systolic Blood Pressure in Children with Sickle Cell Disease," PLOS ONE, Public Library of Science, vol. 8(9), pages 1-1, September.
    5. Le Zhang & Chunqiu Zheng & Tian Li & Lei Xing & Han Zeng & Tingting Li & Huan Yang & Jia Cao & Badong Chen & Ziyuan Zhou, 2017. "Building Up a Robust Risk Mathematical Platform to Predict Colorectal Cancer," Complexity, Hindawi, vol. 2017, pages 1-14, October.
    6. Zheng Xu, 2023. "Association Testing of a Group of Genetic Markers Based on Next-Generation Sequencing Data and Continuous Response Using a Linear Model Framework," Mathematics, MDPI, vol. 11(6), pages 1-32, March.
