Printed from https://ideas.repec.org/a/plo/pone00/0135832.html

Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies

Author

Listed:
  • Florian Mittag
  • Michael Römer
  • Andreas Zell

Abstract

Various attempts have been made to predict individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies investigated only one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms to GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it yielded the best predictive performance for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models, such as the linear support vector machine (SVM), as they allow for better subsequent interpretation without significant loss of accuracy.
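
As a rough illustration of what a genotype encoding scheme does, the sketch below turns two-letter SNP genotypes into numeric features. The abstract names only the additive scheme; the one-hot variant is included purely for contrast and is an assumption, not necessarily one of the three schemes used in the study. The function names, the two-character genotype strings, and the per-SNP minor allele list are all hypothetical choices made for this example.

```python
from typing import List


def additive_encode(genotype: str, minor_allele: str) -> int:
    """Additive encoding: number of minor-allele copies (0, 1, or 2)."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == minor_allele)


def one_hot_encode(genotype: str, minor_allele: str) -> List[int]:
    """Assumed alternative for contrast: one indicator per genotype class."""
    count = additive_encode(genotype, minor_allele)
    return [int(count == k) for k in range(3)]


def encode_sample(genotypes: List[str], minor_alleles: List[str]) -> List[int]:
    """Encode one individual's SNPs additively, one feature per SNP."""
    if len(genotypes) != len(minor_alleles):
        raise ValueError("need exactly one minor allele per SNP")
    return [additive_encode(g, m) for g, m in zip(genotypes, minor_alleles)]


if __name__ == "__main__":
    # Hypothetical individual with three SNPs, each with minor allele "G".
    sample = ["AA", "AG", "GG"]
    print(encode_sample(sample, ["G", "G", "G"]))
    print([one_hot_encode(g, "G") for g in sample])
```

With the additive form, each SNP contributes a single ordered feature, so a linear model such as a linear SVM assigns one weight per SNP, which is consistent with the interpretability argument made in the abstract.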

Suggested Citation

  • Florian Mittag & Michael Römer & Andreas Zell, 2015. "Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies," PLOS ONE, Public Library of Science, vol. 10(8), pages 1-18, August.
  • Handle: RePEc:plo:pone00:0135832
    DOI: 10.1371/journal.pone.0135832

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0135832
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0135832&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0135832?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Robert Sladek & Ghislain Rocheleau & Johan Rung & Christian Dina & Lishuang Shen & David Serre & Philippe Boutin & Daniel Vincent & Alexandre Belisle & Samy Hadjadj & Beverley Balkau & Barbara Heude &, 2007. "A genome-wide association study identifies novel risk loci for type 2 diabetes," Nature, Nature, vol. 445(7130), pages 881-885, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ping Rao & Hao Wang & Honghong Fang & Qing Gao & Jie Zhang & Manshu Song & Yong Zhou & Youxin Wang & Wei Wang, 2016. "Association between IGF2BP2 Polymorphisms and Type 2 Diabetes Mellitus: A Case–Control Study and Meta-Analysis," IJERPH, MDPI, vol. 13(6), pages 1-13, June.
    2. Greve, Jane, 2008. "Obesity and labor market outcomes in Denmark," Economics & Human Biology, Elsevier, vol. 6(3), pages 350-362, December.
    3. John PA Ioannidis & Nikolaos A Patsopoulos & Evangelos Evangelou, 2007. "Heterogeneity in Meta-Analyses of Genome-Wide Association Investigations," PLOS ONE, Public Library of Science, vol. 2(9), pages 1-7, September.
    4. Paul F O’Reilly & Clive J Hoggart & Yotsawat Pomyen & Federico C F Calboli & Paul Elliott & Marjo-Riitta Jarvelin & Lachlan J M Coin, 2012. "MultiPhen: Joint Model of Multiple Phenotypes Can Increase Discovery in GWAS," PLOS ONE, Public Library of Science, vol. 7(5), pages 1-1, May.
    5. Sato Yasunori & Laird Nan & Suganami Hideki & Hamada Chikuma & Niki Naoto & Yoshimura Isao & Yoshida Teruhiko, 2009. "Statistical Screening Method for Genetic Factors Influencing Susceptibility to Common Diseases in a Two-Stage Genome-Wide Association Study," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-21, November.
    6. Jiajin Li & Brandon Jew & Lingyu Zhan & Sungoo Hwang & Giovanni Coppola & Nelson B Freimer & Jae Hoon Sul, 2019. "ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest," PLOS Computational Biology, Public Library of Science, vol. 15(12), pages 1-30, December.
    7. Guang Guo, 2008. "Introduction to the Special Issue on Society and Genetics," Sociological Methods & Research, vol. 37(2), pages 159-163, November.
    8. Peristera Paschou & Petros Drineas & Jamey Lewis & Caroline M Nievergelt & Deborah A Nickerson & Joshua D Smith & Paul M Ridker & Daniel I Chasman & Ronald M Krauss & Elad Ziv, 2008. "Tracing Sub-Structure in the European American Population with PCA-Informative Markers," PLOS Genetics, Public Library of Science, vol. 4(7), pages 1-13, July.
    9. Hongyan Mao & Qin Li & Shujun Gao, 2012. "Meta-Analysis of the Relationship between Common Type 2 Diabetes Risk Gene Variants with Gestational Diabetes Mellitus," PLOS ONE, Public Library of Science, vol. 7(9), pages 1-7, September.
    10. Ekaterina Alekseevna Sokolova & Irina Arkadievna Bondar & Olesya Yurievna Shabelnikova & Olga Vladimirovna Pyankova & Maxim Leonidovich Filipenko, 2015. "Replication of KCNJ11 (p.E23K) and ABCC8 (p.S1369A) Association in Russian Diabetes Mellitus 2 Type Cohort and Meta-Analysis," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-21, May.
    11. Xiaobo Li & Yuqiong Li & Bei Song & Shujie Guo & Shaoli Chu & Nan Jia & Wenquan Niu, 2012. "Hematopoietically-Expressed Homeobox Gene Three Widely-Evaluated Polymorphisms and Risk for Diabetes: A Meta-Analysis," PLOS ONE, Public Library of Science, vol. 7(11), pages 1-10, November.
    12. Ren Matsuba & Kensuke Sakai & Minako Imamura & Yasushi Tanaka & Minoru Iwata & Hiroshi Hirose & Kohei Kaku & Hiroshi Maegawa & Hirotaka Watada & Kazuyuki Tobe & Atsunori Kashiwagi & Ryuzo Kawamori & S, 2015. "Replication Study in a Japanese Population to Evaluate the Association between 10 SNP Loci, Identified in European Genome-Wide Association Studies, and Type 2 Diabetes," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-13, May.
    13. Shuang-Xia Zhao & Chun-Ming Pan & Huang-Ming Cao & Bing Han & Jing-Yi Shi & Jun Liang & Guan-Qi Gao & Yong-De Peng & Qing Su & Jia-Lun Chen & Jia-Jun Zhao & Huai-Dong Song, 2010. "Association of the CTLA4 Gene with Graves' Disease in the Chinese Han Population," PLOS ONE, Public Library of Science, vol. 5(3), pages 1-10, March.
    14. Ren Matsuba & Minako Imamura & Yasushi Tanaka & Minoru Iwata & Hiroshi Hirose & Kohei Kaku & Hiroshi Maegawa & Hirotaka Watada & Kazuyuki Tobe & Atsunori Kashiwagi & Ryuzo Kawamori & Shiro Maeda, 2016. "Replication Study in a Japanese Population of Six Susceptibility Loci for Type 2 Diabetes Originally Identified by a Transethnic Meta-Analysis of Genome-Wide Association Studies," PLOS ONE, Public Library of Science, vol. 11(4), pages 1-9, April.
    15. Ping Rao & Yong Zhou & Si-Qi Ge & An-Xin Wang & Xin-Wei Yu & Mohamed Ali Alzain & Andrea Katherine Veronica & Jing Qiu & Man-Shu Song & Jie Zhang & Hao Wang & Hong-Hong Fang & Qing Gao & You-Xin Wang , 2016. "Validation of Type 2 Diabetes Risk Variants Identified by Genome-Wide Association Studies in Northern Han Chinese," IJERPH, MDPI, vol. 13(9), pages 1-10, August.
    16. Nicholette D Palmer & Caitrin W McDonough & Pamela J Hicks & Bong H Roh & Maria R Wing & S Sandy An & Jessica M Hester & Jessica N Cooke & Meredith A Bostrom & Megan E Rudock & Matthew E Talbert & Jos, 2012. "A Genome-Wide Association Search for Type 2 Diabetes Genes in African Americans," PLOS ONE, Public Library of Science, vol. 7(1), pages 1-14, January.
    17. Qing Ma & Yini Xiao & Wenjun Xu & Menghan Wang & Sheng Li & Zhihao Yang & Minglu Xu & Tengjiao Zhang & Zhen-Ning Zhang & Rui Hu & Qiang Su & Fei Yuan & Tinghui Xiao & Xuan Wang & Qing He & Jiaxu Zhao , 2022. "ZnT8 loss-of-function accelerates functional maturation of hESC-derived β cells and resists metabolic stress in diabetes," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    18. Yuan Min & Tian Xin & Zheng Gang & Yang Yaning, 2009. "Adaptive Transmission Disequilibrium Test for Family Trio Design," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-20, June.
    19. Raamesh Deshpande & Shikha Sharma & Catherine M Verfaillie & Wei-Shou Hu & Chad L Myers, 2010. "A Scalable Approach for Discovering Conserved Active Subnetworks across Species," PLOS Computational Biology, Public Library of Science, vol. 6(12), pages 1-18, December.
    20. Inga Prokopenko & Wenny Poon & Reedik Mägi & Rashmi Prasad B & S Albert Salehi & Peter Almgren & Peter Osmark & Nabila Bouatia-Naji & Nils Wierup & Tove Fall & Alena Stančáková & Adam Barker & Vasilik, 2014. "A Central Role for GRB10 in Regulation of Islet Function in Man," PLOS Genetics, Public Library of Science, vol. 10(4), pages 1-13, April.
