Many large data sets are created using clustered, rather than random, sampling schemes. Clustered data arise when multiple observations exist on the same respondent, as in panel data, and when respondents share a common factor, such as a neighborhood or family. With clustered data, methods that assume random sampling when measuring the precision of an estimator may give incorrect results. Many researchers, however, continue to treat respondents from the same sampling cluster as independent observations and thus implicitly ignore the potential intracluster correlation. In this paper, I use a robust method for drawing inferences, together with data from the Panel Study of Income Dynamics, to examine the implications of clustered samples for inference. Consistent with the previous survey sampling literature, comparisons between the estimated asymptotic variances derived assuming random sampling and those derived assuming clustered sampling reveal important differences, even when there are only a few observations per cluster. The estimates derived under random sampling are generally biased downward.
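The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's estimator or data: it assumes ordinary least squares, a standard cluster-robust "sandwich" variance with no small-sample correction, and simulated data in which both the regressor and the error share a cluster-level component. The function names `ols_variances` and `simulate`, and all parameter values, are hypothetical.

```python
import numpy as np


def ols_variances(X, y, cluster_ids):
    """Return (beta, naive_var, cluster_var) for an OLS fit.

    naive_var assumes independent, homoskedastic errors (random sampling).
    cluster_var allows arbitrary correlation within each cluster.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional variance: sigma^2 (X'X)^{-1}
    sigma2 = resid @ resid / (n - k)
    naive_var = sigma2 * XtX_inv

    # Cluster-robust "meat": sum over clusters of (X_g' u_g)(X_g' u_g)'
    meat = np.zeros((k, k))
    for g in np.unique(cluster_ids):
        mask = cluster_ids == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    cluster_var = XtX_inv @ meat @ XtX_inv
    return beta, naive_var, cluster_var


def simulate(n_clusters=500, per_cluster=3, seed=0):
    """Simulated data (an assumption, not PSID data) with few observations per cluster."""
    rng = np.random.default_rng(seed)
    ids = np.repeat(np.arange(n_clusters), per_cluster)
    # Regressor and error each have a shared cluster-level component
    x = rng.normal(size=n_clusters)[ids] + 0.5 * rng.normal(size=ids.size)
    u = rng.normal(size=n_clusters)[ids] + 0.5 * rng.normal(size=ids.size)
    y = 1.0 + 2.0 * x + u
    X = np.column_stack([np.ones(ids.size), x])
    return X, y, ids


if __name__ == "__main__":
    X, y, ids = simulate()
    beta, naive_var, cluster_var = ols_variances(X, y, ids)
    print("slope estimate:", beta[1])
    print("naive SE:      ", np.sqrt(naive_var[1, 1]))
    print("clustered SE:  ", np.sqrt(cluster_var[1, 1]))
```

In this simulated setting, with only three observations per cluster, the naive standard error for the slope would be expected to fall below the clustered one, which is the direction of bias the abstract reports.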
Publisher Info
Paper provided by University of Virginia, Department of Economics in its series Virginia Economics Online Papers with number 348.