Creating self-validating datasets
Abstract: One of Stata’s great strengths is its data management abilities. When either building or sharing datasets, some of the most time-consuming activities are validating the data and writing documentation for the data. Much of this futility could be avoided if datasets were self-contained, i.e., if they could validate themselves. I will show how to achieve this goal within Stata. I will demonstrate a package of commands for attaching validation rules to the variables themselves, via characteristics, along with commands for running error checks and marking suspicious observations in the dataset. The validation system is flexible enough that simple checks continue to work even if variable names change or if the data are reshaped, and it is rich enough that validation may depend on other variables in the dataset. Since the validation is at the variable level, the self-validation also works if variables are recombined with data from other datasets. With these tools, Stata’s datasets can become truly self-contained.
Bibliographic Info: Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2007 with number 18.
Date of creation: 14 Sep 2007
Date of revision:
This paper has been announced in the following NEP Reports:
- NEP-ALL-2007-09-24 (All new papers)