Printed from https://ideas.repec.org/p/zbw/sfb475/200803.html

Imputing missing genotypes with weighted k nearest neighbors

Authors

  • Schwender, Holger
  • Ickstadt, Katja

Abstract

Motivation: Missing values are a common problem in genetic association studies concerned with single nucleotide polymorphisms (SNPs). Since most statistical methods cannot handle missing values, observations containing them have to be removed prior to the actual analysis. Considering only complete observations, however, often leads to an immense loss of information. Procedures are therefore needed for replacing such missing values. In this article, we propose a method based on weighted k nearest neighbors for imputing missing genotypes.

Results: In a comparison with other imputation approaches, our procedure, called KNNcatImpute, shows the lowest rates of falsely imputed genotypes when applied to the SNP data from the GENICA study, a study dedicated to the identification of genetic and gene-environment interactions associated with sporadic breast cancer. Moreover, in contrast to other imputation methods that take all variables into account when replacing missing values of a particular variable, KNNcatImpute is not restricted to association studies comprising several tens to a few hundred SNPs, but can also be applied to data from whole-genome studies, as an application to a subset of the HapMap data shows.
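
The general idea of weighted k nearest neighbor imputation for categorical genotypes can be illustrated with a minimal sketch. This is not the authors' KNNcatImpute implementation. The abstract does not specify the distance measure, the weighting scheme, or whether neighbors are sought among SNPs or among observations, so the choices below are assumptions made for illustration only: genotypes coded 0/1/2 with -1 for missing, neighbors sought among the rows of the matrix, a simple matching distance over jointly observed positions, and inverse-distance weights. The names `MISSING`, `simple_matching_distance`, and `knn_impute` are hypothetical.

```python
import numpy as np

MISSING = -1  # assumed code for a missing genotype (not from the source)


def simple_matching_distance(a, b):
    """Fraction of mismatching genotypes over positions observed in both rows."""
    both = (a != MISSING) & (b != MISSING)
    if not both.any():
        return np.inf  # no overlap, so the rows cannot be compared
    return float(np.mean(a[both] != b[both]))


def knn_impute(data, k=3, eps=1e-6):
    """Replace each missing entry by a weighted vote of the k nearest rows.

    Only rows with an observed value in the affected column are candidates.
    Each neighbor votes for its genotype with weight 1 / (distance + eps),
    an assumed weighting scheme chosen for simplicity.
    """
    data = np.asarray(data)
    imputed = data.copy()
    n_rows = data.shape[0]
    for i, j in zip(*np.where(data == MISSING)):
        candidates = [r for r in range(n_rows)
                      if r != i and data[r, j] != MISSING]
        dists = np.array([simple_matching_distance(data[i], data[r])
                          for r in candidates])
        nearest = np.argsort(dists, kind="stable")[:k]
        votes = {}
        for idx in nearest:
            if not np.isfinite(dists[idx]):
                continue  # skip rows that share no observed position
            genotype = int(data[candidates[idx], j])
            votes[genotype] = votes.get(genotype, 0.0) + 1.0 / (dists[idx] + eps)
        if votes:  # leave the entry missing if no usable neighbor exists
            imputed[i, j] = max(votes, key=votes.get)
    return imputed


# Toy example: the first row lacks its last genotype.
example = np.array([
    [0, 1, 2, 1, MISSING],
    [0, 1, 2, 1, 2],
    [0, 1, 2, 1, 2],
    [2, 0, 0, 1, 0],
    [2, 0, 0, 0, 0],
])
print(knn_impute(example, k=3)[0, 4])  # prints 2
```

In the toy example, the two rows identical to the first row on the observed positions dominate the vote, so the missing genotype is imputed as 2. Because only the k nearest rows contribute to each replacement, the cost per missing value does not depend on fitting a model over all variables, which is in the spirit of the scalability argument made in the abstract.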

Suggested Citation

  • Schwender, Holger & Ickstadt, Katja, 2008. "Imputing missing genotypes with weighted k nearest neighbors," Technical Reports 2008,03, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
  • Handle: RePEc:zbw:sfb475:200803

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/36594/1/600052389.PDF
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rocco, Claudio M. & Hernandez-Perdomo, Elvis & Mun, Johnathan, 2021. "Application of logic regression to assess the importance of interactions between components in a network," Reliability Engineering and System Safety, Elsevier, vol. 205(C).
    2. Nunkesser, Robin & Morell, Oliver, 2008. "Evolutionary algorithms for robust methods," Technical Reports 2008,29, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    3. Selinski, Silvia, 2006. "Similarity Measures for Clustering SNP and Epidemiological Data," Technical Reports 2006,25, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    4. Schwender, Holger, 2007. "A note on the simultaneous computation of thousands of Pearson's χ²-Statistics," Technical Reports 2007,19, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    5. Zhong Wang & Tian Liu & Zhenwu Lin & John Hegarty & Walter A Koltun & Rongling Wu, 2010. "A General Model for Multilocus Epistatic Interactions in Case-Control Studies," PLOS ONE, Public Library of Science, vol. 5(8), pages 1-9, August.
    6. Ickstadt, Katja & Selinski, Silvia, 2005. "Similarity Measures for Clustering SNP Data," Technical Reports 2005,27, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
    7. Nunkesser, Robin, 2008. "RFreak-An R-package for evolutionary computation," Technical Reports 2008,12, Technische Universität Dortmund, Sonderforschungsbereich 475: Komplexitätsreduktion in multivariaten Datenstrukturen.
