This file is part of IDEAS, which uses RePEc data



Consistency checking with assertk

Author Info
Krishnan Bhaskaran (MRC Clinical Trials Unit, London)
Hannah Green (MRC Clinical Trials Unit, London)
Abstract

We introduce the assertk command, beginning with a motivation and a comparison with the built-in assert command. We then show examples demonstrating the options that can be used to produce customized output and to perform more complex checks. assertk is a simple utility that makes data consistency checking and reporting on data quality easy.

The built-in Stata command assert checks each observation for a specified condition and halts do-files and ado-files when the condition is not satisfied. For example:

    . assert age_entry < .
    2 contradictions in 149 observations
    assertion is false
    end of do-file
    r(9);

Thus assert is a useful tool for checking important assumptions about the data you are about to process: your do-file will simply not continue if the data fail these checks.

The principle of the assert command also lends itself to consistency checking, i.e., performing a suite of checks on a dataset to identify potential errors. This is an important part of data cleaning. In this application, however, halting the do-file is a hindrance, and assert gives no detailed output showing which observations failed the check.

In assertk, a condition is specified and each observation is checked against it. If any observations fail the check, the irregularities are output (in a form that can be customized through various options) and the do-file continues. For example:

    . assertk age_ent < ., mess(Age at entry is missing) vars(id age_ent)
    Age at entry is missing (1 obs)
    id      age_ent
    38048   .
    40352   .

Thus a suite of checks can be programmed easily, with one line per check, producing a meaningful log of data errors for use by data managers and statisticians.
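The non-halting behaviour described above can be roughly approximated with built-in Stata commands alone. The sketch below is only an illustration under stated assumptions, not the assertk implementation: the toy dataset and the variable names id and age_ent are hypothetical, and assertk's own options such as mess() and vars() are not reproduced. It relies on capture to stop assert from halting the do-file, then reports the failing observations with count and list.

```stata
* Hypothetical sketch: emulate a non-halting consistency check using
* only built-in commands. The toy data below are invented for illustration.
clear
input long id double age_ent
38048 .
40352 .
41001 52.3
end

* capture prevents -assert- from halting the do-file;
* _rc == 9 means the assertion was false for at least one observation.
capture assert age_ent < .
if _rc == 9 {
    * Count and list the observations that fail the check
    quietly count if !(age_ent < .)
    display as text "Age at entry is missing (" r(N) " obs)"
    list id age_ent if !(age_ent < .), noobs
}
else if _rc {
    * Any other return code (e.g. 111, variable not found) is a genuine error
    exit _rc
}
display as text "do-file continues"
```

Wrapping each check this way means one failed condition is reported in the log without stopping the remaining checks, which is the trade-off the abstract highlights relative to plain assert.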

Download Info
To our knowledge, this item is not available for download.

Publisher Info
Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2006 with number 08.

Date of creation: 18 Sep 2006
Handle: RePEc:boc:usug06:08

Contact details of provider:
Postal: Administration Building, 140 Commonwealth Avenue, Chestnut Hill MA 02467
Phone: 617-552-3670
Fax: 617-552-2308
Web page: http://www.stata.com/meeting/12uk

For technical questions regarding this item, or to correct its listing, contact Christopher F Baum.




This page was last updated on 2009-10-24.


This information is provided to you by IDEAS at the Department of Economics, College of Liberal Arts and Sciences, University of Connecticut using RePEc data on a server sponsored by the Society for Economic Dynamics.