Printed from https://ideas.repec.org/p/boc/usug04/18.html

Genome-wide linkage scans and basic bioinformatics implemented using Stata/SE

Author

Listed:
  • Toby Andrew

    (Twin & Genetic Epidemiology Research Unit, Department of Medicine, St Thomas' Hospital)

Abstract

Searches for genes using linkage analyses with genetic markers placed across the entire human genome are hypothesis-free experiments, which represent an extreme form of multiple testing. As such, the low p-values required to obtain nominal significance make accurate diagnostics essential for assessing model fit and eliminating naive, incorrect results. In hypothesis-driven single tests, researchers usually take good care to assess model fit and the validity of model assumptions, but such concerns are usually ignored when it comes to linkage analysis. This is particularly problematic because low thresholds (p < 0.0001) can result in extreme sensitivity to outlying observations and, for some models (e.g. standard variance component analysis), greater sensitivity to violations of model assumptions. Here we attempt to address these problems for genomic data based on 1300 healthy sib-pairs (dizygotic twins) using modified Haseman-Elston regression-based linkage analysis for quantitative traits, in which sib-pair phenotypic covariance is correlated with genetic marker covariance. The statistical theory underpinning the implementation of tests for linkage using generalized linear models (GLM; glm in Stata) is documented in detail elsewhere. In brief, the advantage of analysing sib-pairs using GLM is that the approach shares all of the strengths of OLS and variance components, but none of their weaknesses. Specifically, (1) unlike OLS, the residual errors are correctly specified with a gamma distribution and known heteroscedasticity is accounted for; (2) unlike standard variance components, GLM freely estimates the coefficient of variation and is therefore robust to phenotypic deviations from multivariate normality. Just as important are the practical advantages. With the release of Stata 8 Special Edition (Stata/SE) for large datasets, we have been able to store and check genetic markers for all 22 pairs of autosomal chromosomes plus the sex chromosomes.
In addition, we have generated 2-point and multipoint allele-sharing identical by descent (IBD) estimates elsewhere and imported them into Stata. Using Stata scripts with a simple loop structure that calls the glm command, we are able to perform genome-wide scans and save any summary statistics to file. We have been able to utilise the following features in Stata:
  1. correct diagnostics on a genome-wide basis that are not normally made available to users of applied linkage packages
  2. robust estimates of significance, such as Huber sandwich estimates, bootstrap routines, permutation tests, etc.
  3. probability weighting to utilise the full probability distribution of the number of alleles shared IBD
  4. computation that is fast and easy to implement
Finally, we can also perform basic but powerful bioinformatics tasks, such as:
  1. using the xpose command to summarise marker information by chromosome and sib-pair
  2. resolving marker order more accurately, which is essential for correct multipoint IBD generation, by interpolating genetic distance using the latest physical and genetic marker maps
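
The interpolation of genetic distance mentioned in the abstract can be illustrated with a minimal sketch. This is not the author's Stata code; it is a Python illustration that assumes simple linear interpolation between flanking reference markers whose physical (bp) and genetic (cM) positions are both known. The function name `interpolate_cm`, the reference map `REF_MAP`, the marker names, and all positions are hypothetical values chosen for illustration.

```python
from bisect import bisect_left


def interpolate_cm(bp, ref_map):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    ref_map is a list of (bp, cM) pairs for reference markers on one
    chromosome, sorted by bp. Positions outside the reference range
    return None rather than being extrapolated.
    """
    positions = [p for p, _ in ref_map]
    if bp < positions[0] or bp > positions[-1]:
        return None
    i = bisect_left(positions, bp)  # first reference marker at or beyond bp
    if positions[i] == bp:
        return ref_map[i][1]  # exact hit on a reference marker
    (bp_lo, cm_lo), (bp_hi, cm_hi) = ref_map[i - 1], ref_map[i]
    frac = (bp - bp_lo) / (bp_hi - bp_lo)
    return cm_lo + frac * (cm_hi - cm_lo)


# Hypothetical reference map for one chromosome: (physical bp, genetic cM).
REF_MAP = [(1_000_000, 0.0), (3_000_000, 4.0), (4_000_000, 5.0)]

# Hypothetical study markers with known physical positions only.
MARKERS_BP = {"mkA": 2_000_000, "mkB": 3_500_000, "mkC": 1_500_000}

# Assign each marker an interpolated cM position, then order markers by it.
MARKERS_CM = {name: interpolate_cm(bp, REF_MAP) for name, bp in MARKERS_BP.items()}
MARKER_ORDER = sorted(MARKERS_CM, key=MARKERS_CM.get)
```

Ordering markers by interpolated genetic position, rather than by a possibly outdated genetic map alone, is one way to obtain the marker order that multipoint IBD estimation depends on. A real analysis might prefer sex-specific maps or a smoother interpolant; the linear form is used here only for clarity.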

Suggested Citation

  • Toby Andrew, 2004. "Genome-wide linkage scans and basic bioinformatics implemented using Stata/SE," United Kingdom Stata Users' Group Meetings 2004 18, Stata Users Group.
  • Handle: RePEc:boc:usug04:18

    Download full text from publisher

    File URL: http://fmwww.bc.edu/repec/usug2004/Stata2004_ta.ppt.zip
    File Function: Lecture slides in PowerPoint format
    Download Restriction: no