This file is part of IDEAS, which uses RePEc data



Genome-wide linkage scans and basic bioinformatics implemented using Stata/SE

Author Info
Toby Andrew (Twin & Genetic Epidemiology Research Unit, Department of Medicine, St Thomas' Hospital)
Abstract

Searches for genes using linkage analyses with genetic markers placed across the entire human genome are hypothesis-free experiments that represent an extreme form of multiple testing. As such, the low p-values required to obtain nominal significance make accurate diagnostics essential to assess model fit and to eliminate naively incorrect results. In hypothesis-driven single tests, researchers usually take good care to assess model fit and the validity of model assumptions, but such concerns are usually ignored when it comes to linkage analysis. This is particularly problematic because stringent thresholds (p < 0.0001) can result in extreme sensitivity to outlying observations and, for some models (e.g. standard variance component analysis), greater sensitivity to violation of model assumptions. Here we attempt to address these problems for genomic data based on 1300 healthy sib-pairs (dizygotic twins) using modified Haseman-Elston regression-based linkage analysis for quantitative traits, in which sib-pair phenotypic covariance is correlated with genetic marker covariance. The statistical theory underpinning the implementation of tests for linkage using generalized linear models (GLM; the glm command in Stata) is documented in detail elsewhere. In brief, the advantage of analysing sib-pairs using GLM is that the approach shares all of the strengths of OLS and variance components, but none of their weaknesses: (1) unlike OLS, the residual errors are correctly specified with a gamma distribution and known heteroscedasticity is accounted for; (2) unlike standard variance components, by freely estimating the coefficient of variation, GLM is robust to phenotypic deviations from multivariate normality. Just as important are the practical advantages. With the release of Stata 8 Special Edition (Stata/SE) for large datasets, we have been able to store and check genetic markers for all 22 pairs of autosomal chromosomes plus the sex chromosomes.
In addition, we have generated 2-point and multipoint allele-sharing identical by descent (IBD) elsewhere and imported this into Stata. Using Stata scripts with a simple loop structure that calls on the glm command, we are able to perform genome-wide scans and save any summary statistics to file. We have been able to utilise the following features in Stata:
1. correct diagnostics on a genome-wide basis that are not normally made available to users of applied linkage packages
2. robust estimates of significance, such as Huber sandwich estimates, bootstrap routines, permutation tests, etc.
3. probability weighting to utilise the full probability distribution of the number of alleles shared IBD
4. computationally fast and easy to implement

Finally, we also can perform basic, but powerful bioinformatics tasks such as:
1. using the xpose command to summarise marker information by chromosome and sib-pair
2. resolving marker order more accurately, which is essential for correct multipoint IBD generation, by interpolating genetic distance using the latest physical and genetic marker maps
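The abstract does not show how genetic distance is interpolated, and the author's implementation is in Stata. As a rough illustration only, the idea of placing a marker on the genetic map from its physical position can be sketched with linear interpolation between flanking reference markers. The sketch is in Python rather than Stata. The function name `interpolate_cm`, the reference map `REF_BP`/`REF_CM`, and the marker names and positions are all hypothetical values chosen for the example, not data from the talk.

```python
from bisect import bisect_right


def interpolate_cm(physical_bp, ref_bp, ref_cm):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    ref_bp and ref_cm are parallel lists for reference markers whose position
    is known on both maps. ref_bp is assumed to be strictly increasing.
    """
    if len(ref_bp) != len(ref_cm) or len(ref_bp) < 2:
        raise ValueError("need at least two reference markers of matching length")
    if not ref_bp[0] <= physical_bp <= ref_bp[-1]:
        raise ValueError("position lies outside the reference map; not extrapolating")
    # Index of the right-hand flanking marker, clamped so that a query equal
    # to the last reference position still has a left-hand neighbour.
    hi = min(bisect_right(ref_bp, physical_bp), len(ref_bp) - 1)
    lo = hi - 1
    frac = (physical_bp - ref_bp[lo]) / (ref_bp[hi] - ref_bp[lo])
    return ref_cm[lo] + frac * (ref_cm[hi] - ref_cm[lo])


# Hypothetical reference map for one chromosome (made-up numbers).
REF_BP = [1_000_000, 3_000_000, 6_000_000]
REF_CM = [0.0, 2.0, 8.0]

# Hypothetical markers with a known physical position only.
markers_bp = {"mkA": 4_500_000, "mkB": 2_000_000}

# Interpolated genetic positions, then marker order along the genetic map.
positions = {name: interpolate_cm(bp, REF_BP, REF_CM) for name, bp in markers_bp.items()}
ordered = sorted(positions, key=positions.get)
```

Linear interpolation assumes a constant recombination rate between neighbouring reference markers, which is a simplification; the sketch refuses to extrapolate beyond the ends of the reference map rather than guess.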

Download Info

File URL: http://fmwww.bc.edu/repec/usug2004/Stata2004_ta.ppt.zip
File Format: application/zip
File Function: Lecture slides in PowerPoint format
Download Restriction: no

Publisher Info
Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2004 with number 18.

Date of creation: 30 Jun 2004
Handle: RePEc:boc:usug04:18

Contact details of provider:
Postal: Administration Building, 140 Commonwealth Avenue, Chestnut Hill MA 02467
Phone: 617-552-3670
Fax: 617-552-2308
Web page: http://www.stata.com/meeting/10uk

For technical questions regarding this item, or to correct its listing, contact: (Christopher F Baum).


This page was last updated on 2009-12-15.

