Genome-wide linkage scans and basic bioinformatics implemented using Stata/SE

Author

Toby Andrew
(Twin & Genetic Epidemiology Research Unit, Department of Medicine, St Thomas' Hospital)

Abstract

Searches for genes using linkage analyses with genetic markers placed across the entire human genome are hypothesis-free experiments, which represent an extreme form of multiple testing. As such, the low p-values required to obtain nominal significance make accurate diagnostics essential to assess model fit and to eliminate naive, incorrect results. In hypothesis-driven single tests, researchers usually take good care to assess model fit and the validity of model assumptions, but such concerns are usually ignored when it comes to linkage analysis. This is particularly problematic where low thresholds (p < 0.0001) can result in extreme sensitivity to outlying observations and, for some models (e.g. standard variance component analysis), greater sensitivity to violation of model assumptions. Here we attempt to address these problems for genomic data based on 1300 healthy sib-pairs (dizygotic twins) using modified Haseman-Elston regression-based linkage analysis for quantitative traits, in which sib-pair phenotypic covariance is correlated with genetic marker covariance. The statistical theory underpinning the implementation of tests for linkage using generalized linear models (GLM) (glm in Stata) is documented in detail elsewhere. In brief, the advantage of analysing sib-pairs using GLM is that the approach shares all of the strengths of OLS and variance components, but none of their weaknesses: (1) unlike OLS, the residual errors are correctly specified with a gamma distribution and known heteroscedasticity is accounted for; (2) unlike standard variance components, by freely estimating the coefficient of variation, GLM is robust to phenotypic deviations from multivariate normality. Just as important are the practical advantages. With the release of Stata 8/Special Edition for large datasets, we have been able to store and check genetic markers for all 22 pairs of autosomal chromosomes plus the sex chromosomes.

In addition, we have generated 2-point and multipoint allele-sharing identical by descent (IBD) estimates elsewhere and imported these into Stata. Using Stata scripts with a simple loop structure that calls on the glm command, we are able to perform genome-wide scans and save any summary statistics to file. We have been able to utilise the following features in Stata:

1. correct diagnostics on a genome-wide basis that are not normally made available to users of applied linkage packages
2. robust estimates of significance, such as Huber sandwich estimates, bootstrap routines, permutation tests, etc.
3. probability weighting to utilise the full probability distribution of the number of alleles shared IBD
4. analyses that are computationally fast and easy to implement

Finally, we can also perform basic but powerful bioinformatics tasks such as:

1. using the xpose command to summarise marker information by chromosome and sib-pair
2. resolving marker order more accurately, which is essential for correct multipoint IBD generation, by interpolating genetic distance using the latest physical and genetic marker maps
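
The marker-by-marker scan and the map interpolation described above can be illustrated with a minimal sketch. It is written in Python rather than Stata, and it is not the author's implementation: it assumes the classic Haseman-Elston form (an ordinary least-squares regression of squared sib-pair trait differences on IBD sharing) instead of the gamma GLM used in the talk, and it assumes simple linear interpolation between flanking reference markers. All function and variable names are hypothetical.

```python
from bisect import bisect_left


def ols_slope(x, y):
    """Return the ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def haseman_elston_scan(sq_diff, ibd_by_marker):
    """Loop over markers, regressing squared trait differences on IBD sharing.

    sq_diff: one squared trait difference per sib-pair.
    ibd_by_marker: dict mapping a marker name to the per-pair proportion of
        alleles shared IBD (values between 0 and 1).
    Returns a dict of slopes, one per marker. Under the classic
    Haseman-Elston model, a clearly negative slope suggests linkage.
    """
    return {marker: ols_slope(ibd, sq_diff)
            for marker, ibd in ibd_by_marker.items()}


def interpolate_cm(position_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) from a physical one (bp).

    map_bp and map_cm are parallel lists for reference markers on a single
    chromosome, sorted by physical position.
    """
    if not map_bp[0] <= position_bp <= map_bp[-1]:
        raise ValueError("position lies outside the reference map")
    i = bisect_left(map_bp, position_bp)
    if map_bp[i] == position_bp:
        return map_cm[i]  # exact hit on a reference marker
    frac = (position_bp - map_bp[i - 1]) / (map_bp[i] - map_bp[i - 1])
    return map_cm[i - 1] + frac * (map_cm[i] - map_cm[i - 1])
```

In a real scan, each slope would be accompanied by a standard error, a robust or permutation-based p-value, and the model diagnostics the abstract emphasises. The sketch keeps only the loop structure and the point estimate.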

Suggested Citation

Toby Andrew, 2004. "Genome-wide linkage scans and basic bioinformatics implemented using Stata/SE," United Kingdom Stata Users' Group Meetings 2004 18, Stata Users Group.

Handle: RePEc:boc:usug04:18

