Krishnan Bhaskaran (MRC Clinical Trials Unit, London)
Hannah Green (MRC Clinical Trials Unit, London)
Abstract
We introduce the assertk command, beginning with a motivation and a comparison with the built-in assert command. We then show examples demonstrating the options that can be used to produce customized output and to perform more complex checks. assertk is a simple utility that makes data consistency checking and reporting on data quality easy.

The built-in Stata command assert checks each observation for a specified condition and halts do-files and ado-files when the specified condition is not satisfied. For example:

    . assert age_entry < .
    2 contradictions in 149 observations
    assertion is false; end of do-file
    r(9);

Thus assert is a useful tool for checking important assumptions about the data you are about to process; your do-file will not continue if the data fail these checks.

The principle of the assert command also lends itself to consistency checking, i.e., performing a suite of checks on a dataset to identify potential errors. This is an important part of data cleaning. In this application, however, the halting of do-files is a hindrance, and assert gives no detailed output showing which observations failed the check.

In assertk, a condition is specified, and each observation is checked against this condition. If any observations fail the check, they are listed (with the output customizable by various options) and the do-file continues. For example:

    . assertk age_ent < ., mess(Age at entry is missing) vars(id age_ent)
    Age at entry is missing (1 obs)
       id   age_ent
    38048         .
    40352         .

Thus a suite of checks can be programmed easily, with one line per check, and a meaningful log of data errors can be produced for use by data managers and statisticians.
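As a rough illustration of what such a suite might look like, the do-file fragment below strings together three checks, one per line. It is only a sketch: it assumes assertk is installed, it uses only the mess() and vars() options shown in the example above, and the variables sex, date_ent and date_exit, together with their codings, are hypothetical names introduced for illustration.

```stata
* Sketch of a consistency-checking do-file, assuming assertk is installed.
* Only the mess() and vars() options from the abstract's example are used.
* sex, date_ent and date_exit are hypothetical variables, not from the source.

* Check 1: age at entry must be non-missing
assertk age_ent < ., mess(Age at entry is missing) vars(id age_ent)

* Check 2: sex must take one of the two expected codes (assumed to be 1 and 2)
assertk inlist(sex, 1, 2), mess(Sex is not coded 1 or 2) vars(id sex)

* Check 3: exit date must not precede entry date
assertk date_exit >= date_ent, mess(Exit date precedes entry date) vars(id date_ent date_exit)
```

Because each failed check is reported rather than halting execution, all three checks would run in a single pass and their messages would be collected in one log.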