SNPpy - Database Management for SNP Data from Genome Wide Association Studies

Author

Listed:
  • Faheem Mitha
  • Herodotos Herodotou
  • Nedyalko Borisov
  • Chen Jiang
  • Josh Yoder
  • Kouros Owzar

Abstract

Background: We describe SNPpy, a hybrid script-database system that couples the Python SQLAlchemy library with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). The system makes it possible to merge study data with HapMap data and to merge across studies for meta-analyses, with data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open-source software.

Results: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses.

Conclusions: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results in standardized GWAS file formats. It performs low-level, flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.
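The relational approach summarized in the abstract (merging across studies, filtering on phenotype and SNP values, validating patient data, exporting to flat files) can be illustrated with a minimal sketch. This is not SNPpy's code or schema: the table names, column names, sample rows, and the tab-delimited output layout are all hypothetical, and Python's built-in sqlite3 module stands in for SQLAlchemy and PostgreSQL so that the example runs without a database server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy genotype schema (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    # Enforce foreign keys so genotypes cannot reference unknown patients or SNPs.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE patient (
            patient_id TEXT PRIMARY KEY,
            study      TEXT NOT NULL,
            is_case    INTEGER NOT NULL CHECK (is_case IN (0, 1))
        );
        CREATE TABLE snp (
            rsid       TEXT PRIMARY KEY,
            chromosome TEXT NOT NULL,
            bp_pos     INTEGER NOT NULL
        );
        CREATE TABLE genotype (
            patient_id TEXT NOT NULL REFERENCES patient(patient_id),
            rsid       TEXT NOT NULL REFERENCES snp(rsid),
            alleles    TEXT NOT NULL,
            PRIMARY KEY (patient_id, rsid)
        );
    """)
    # Made-up rows from two hypothetical studies.
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?)",
        [("P1", "study_a", 1), ("P2", "study_a", 0), ("P3", "study_b", 1)],
    )
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?)",
        [("rs1", "1", 1000), ("rs2", "2", 2000)],
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [
            ("P1", "rs1", "AA"), ("P1", "rs2", "AG"),
            ("P2", "rs1", "AG"), ("P2", "rs2", "GG"),
            ("P3", "rs1", "AA"), ("P3", "rs2", "GG"),
        ],
    )
    conn.commit()
    return conn


def export_filtered(conn, is_case, chromosome):
    """Join all studies, filter on phenotype and SNP location, return text lines."""
    rows = conn.execute(
        """
        SELECT p.study, p.patient_id, s.rsid, g.alleles
        FROM genotype AS g
        JOIN patient AS p ON p.patient_id = g.patient_id
        JOIN snp AS s ON s.rsid = g.rsid
        WHERE p.is_case = ? AND s.chromosome = ?
        ORDER BY p.patient_id, s.rsid
        """,
        (is_case, chromosome),
    ).fetchall()
    # Hypothetical tab-delimited layout, not one of the standard GWAS formats.
    return ["\t".join(row) for row in rows]


if __name__ == "__main__":
    for line in export_filtered(build_demo_db(), is_case=1, chromosome="1"):
        print(line)
```

Because patients from both studies live in one table, a single query selects cases across studies, and the foreign-key constraints reject a genotype row that refers to an unregistered patient. A production system would replace the toy export with writers for the file formats expected by the downstream statistical software.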

Suggested Citation

  • Faheem Mitha & Herodotos Herodotou & Nedyalko Borisov & Chen Jiang & Josh Yoder & Kouros Owzar, 2011. "SNPpy - Database Management for SNP Data from Genome Wide Association Studies," PLOS ONE, Public Library of Science, vol. 6(10), pages 1-8, October.
  • Handle: RePEc:plo:pone00:0024982
    DOI: 10.1371/journal.pone.0024982

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0024982
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0024982&type=printable
    Download Restriction: no


    Citations


    Cited by:

    1. Yikun Zhao & Bin Jiang & Yongxue Huo & Hongmei Yi & Hongli Tian & Haotian Wu & Rui Wang & Jiuran Zhao & Fengge Wang, 2021. "A High-Performance Database Management System for Managing and Analyzing Large-Scale SNP Data in Plant Genotyping and Breeding Applications," Agriculture, MDPI, vol. 11(11), pages 1-21, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Daniel Svensson & Matilda Rentoft & Anna M Dahlin & Emma Lundholm & Pall I Olason & Andreas Sjödin & Carin Nylander & Beatrice S Melin & Johan Trygg & Erik Johansson, 2020. "A whole-genome sequenced control population in northern Sweden reveals subregional genetic differences," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-18, September.
    2. Chuan Gao & Nan Wang & Xiuqing Guo & Julie T Ziegler & Kent D Taylor & Anny H Xiang & Yang Hai & Steven J Kridel & Jerry L Nadler & Fouad Kandeel & Leslie J Raffel & Yii-Der I Chen & Jill M Norris & J, 2015. "A Comprehensive Analysis of Common and Rare Variants to Identify Adiposity Loci in Hispanic Americans: The IRAS Family Study (IRASFS)," PLOS ONE, Public Library of Science, vol. 10(11), pages 1-17, November.
    3. Paul S de Vries & Maria Sabater-Lleal & Daniel I Chasman & Stella Trompet & Tarunveer S Ahluwalia & Alexander Teumer & Marcus E Kleber & Ming-Huei Chen & Jie Jin Wang & John R Attia & Riccardo E Mario, 2017. "Comparison of HapMap and 1000 Genomes Reference Panels in a Large-Scale Genome-Wide Association Study," PLOS ONE, Public Library of Science, vol. 12(1), pages 1-22, January.
    4. Bo Jiang & Jun S. Liu, 2015. "Bayesian Partition Models for Identifying Expression Quantitative Trait Loci," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1350-1361, December.
    5. Rakesh Chettier & Lesa Nelson & James W Ogilvie & Hans M Albertsen & Kenneth Ward, 2015. "Haplotypes at LBX1 Have Distinct Inheritance Patterns with Opposite Effects in Adolescent Idiopathic Scoliosis," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-11, February.
    6. Michel S. Naslavsky & Marilia O. Scliar & Guilherme L. Yamamoto & Jaqueline Yu Ting Wang & Stepanka Zverinova & Tatiana Karp & Kelly Nunes & José Ricardo Magliocco Ceroni & Diego Lima Carvalho & Carlo, 2022. "Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil," Nature Communications, Nature, vol. 13(1), pages 1-11, December.
    7. Steinrücken, Matthias & Paul, Joshua S. & Song, Yun S., 2013. "A sequentially Markov conditional sampling distribution for structured populations with migration and recombination," Theoretical Population Biology, Elsevier, vol. 87(C), pages 51-61.
    8. Anshuman Sewda & A J Agopian & Elizabeth Goldmuntz & Hakon Hakonarson & Bernice E Morrow & Fadi Musfee & Deanne Taylor & Laura E Mitchell & on behalf of the Pediatric Cardiac Genomics Consortium, 2020. "Gene-based analyses of the maternal genome implicate maternal effect genes as risk factors for conotruncal heart defects," PLOS ONE, Public Library of Science, vol. 15(6), pages 1-15, June.
    9. Lin Yuan & Chang-An Yuan & De-Shuang Huang, 2017. "FAACOSE: A Fast Adaptive Ant Colony Optimization Algorithm for Detecting SNP Epistasis," Complexity, Hindawi, vol. 2017, pages 1-10, September.
    10. Carl Nettelblad, 2013. "Breakdown of Methods for Phasing and Imputation in the Presence of Double Genotype Sharing," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-5, March.
    11. Viinikainen, Jutta & Bryson, Alex & Böckerman, Petri & Kari, Jaana T. & Lehtimäki, Terho & Raitakari, Olli & Viikari, Jorma & Pehkonen, Jaakko, 2022. "Does better education mitigate risky health behavior? A mendelian randomization study," Economics & Human Biology, Elsevier, vol. 46(C).
    12. Cavin K Ward-Caviness & Paul S de Vries & Kerri L Wiggins & Jennifer E Huffman & Lisa R Yanek & Lawrence F Bielak & Franco Giulianini & Xiuqing Guo & Marcus E Kleber & Tim Kacprowski & Stefan Groß & A, 2019. "Mendelian randomization evaluation of causal effects of fibrinogen on incident coronary heart disease," PLOS ONE, Public Library of Science, vol. 14(5), pages 1-18, May.
    13. Ani Manichaikul & Xin-Qun Wang & Solomon K Musani & David M Herrington & Wendy S Post & James G Wilson & Stephen S Rich & Annabelle Rodriguez, 2015. "Association of the Lipoprotein Receptor SCARB1 Common Missense Variant rs4238001 with Incident Coronary Heart Disease," PLOS ONE, Public Library of Science, vol. 10(5), pages 1-16, May.
    14. Morten Dybdahl Krebs & Gonçalo Espregueira Themudo & Michael Eriksen Benros & Ole Mors & Anders D. Børglum & David Hougaard & Preben Bo Mortensen & Merete Nordentoft & Michael J. Gandal & Chun Chieh F, 2021. "Associations between patterns in comorbid diagnostic trajectories of individuals with schizophrenia and etiological factors," Nature Communications, Nature, vol. 12(1), pages 1-12, December.
    15. Heejung Shim & Daniel I Chasman & Joshua D Smith & Samia Mora & Paul M Ridker & Deborah A Nickerson & Ronald M Krauss & Matthew Stephens, 2015. "A Multivariate Genome-Wide Association Analysis of 10 LDL Subfractions, and Their Response to Statin Treatment, in 1868 Caucasians," PLOS ONE, Public Library of Science, vol. 10(4), pages 1-20, April.
    16. Mette K Andersen & Emil Jørsboe & Line Skotte & Kristian Hanghøj & Camilla H Sandholt & Ida Moltke & Niels Grarup & Timo Kern & Yuvaraj Mahendran & Bolette Søborg & Peter Bjerregaard & Christina V L L, 2020. "The derived allele of a novel intergenic variant at chromosome 11 associates with lower body mass index and a favorable metabolic phenotype in Greenlanders," PLOS Genetics, Public Library of Science, vol. 16(1), pages 1-17, January.
    17. Gianmarco Mignogna & Caitlin E. Carey & Robbee Wedow & Nikolas Baya & Mattia Cordioli & Nicola Pirastu & Rino Bellocco & Kathryn Fiuza Malerbi & Michel G. Nivard & Benjamin M. Neale & Raymond K. Walte, 2023. "Patterns of item nonresponse behaviour to survey questionnaires are systematic and associated with genetic loci," Nature Human Behaviour, Nature, vol. 7(8), pages 1371-1387, August.
    18. Xiaodong Cai & Juan Andrés Bazerque & Georgios B Giannakis, 2013. "Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations," PLOS Computational Biology, Public Library of Science, vol. 9(5), pages 1-13, May.
    19. Hans M Albertsen & Rakesh Chettier & Pamela Farrington & Kenneth Ward, 2013. "Genome-Wide Association Study Link Novel Loci to Endometriosis," PLOS ONE, Public Library of Science, vol. 8(3), pages 1-8, March.
    20. Gemma Cadby & Corey Giles & Phillip E. Melton & Kevin Huynh & Natalie A. Mellett & Thy Duong & Anh Nguyen & Michelle Cinel & Alex Smith & Gavriel Olshansky & Tingting Wang & Marta Brozynska & Mike Ino, 2022. "Comprehensive genetic analysis of the human lipidome identifies loci associated with lipid homeostasis with links to coronary artery disease," Nature Communications, Nature, vol. 13(1), pages 1-17, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.