We describe how to use regression-based methods in Stata to analyze multiple-source data arising from complex sample survey designs. We use the term multiple-source data to encompass all cases where data are obtained simultaneously from multiple informants or raters (e.g., self-reports, family members, health care providers, administrators) or via different or parallel instruments, indicators, or methods (e.g., symptom rating scales, standardized diagnostic interviews, or clinical diagnoses). We review regression models for analyzing multiple-source risk factors or multiple-source outcomes and show that they can be considered special cases of generalized linear models, albeit with correlated outcomes. We show how these methods can be extended to handle the common survey features of stratification, clustering, and sampling weights, as well as missing reports, and how they can be fit within Stata. The methods are illustrated using data from the Stirling County Study, a longitudinal community study of psychopathology and mortality.