This file is part of IDEAS, which uses RePEc data



Stata's mishandling of missing data: A problem and two solutions

Author Info
Kenneth I. MacDonald (Nuffield College, University of Oxford)
Abstract

The design decisions made by Stata in handling missing data in relational and logical expressions have, for the user, complex, pernicious, and poorly understood consequences. This presentation intends to substantiate that claim and to present two possible resolutions to the problem. As is well documented and reasonably well known, Stata considers p & q (and p | q) to be true when both p and q are indeterminate. This interpretation is counterintuitive and at odds with the formal-logic definition of these operators. To assert two unknowns is not to assert truth. Nevertheless, introductions to Stata characteristically present this as merely a “feature” and suggest that the obligation imposed on users (us) to explicitly test for missing data is straightforwardly implementable. Simple cases are indeed simple but, it will be argued, do not readily scale up to complex, real-life instances. For example, the one-line Stata command to implement the intention, "generate v = p|q", becomes "generate v = p|q if !mi(p,q)|(p&!mi(p))|(q&!mi(q))". And so forth. Such coding is a problem, not a feature, so solutions should be sought. One solution (really a work-around) introduces my command, validly, which allows expressions such as "validly generate v = p|q" and correctly, without fuss, interprets the logical or relational operators (here returning true if p is true but q indeterminate and indeterminate if p is false but q indeterminate). More generally, validly serves as a “wrapper” for any standard conditional command. So, for example, "validly reg a b c if p|q" is handled correctly. But validly (its code deploys nested calls to cond()) is computationally expensive. The better resolution would be for Stata, in its next release, to redesign its core code so that logical and relational operators would (as algebraic operators currently do) handle missing data appropriately. (Objections to this strategy are examined and deemed to lack force.)
I would like to enlist the informed and active judgment of the participants of the 14th Users Group meeting to help bring this about.
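
As a rough illustration of the behaviour the abstract describes, the following Python sketch contrasts a model of Stata's rule (any nonzero value, missing included, counts as true) with three-valued logic in which missing means indeterminate. It also models the guarded one-line command quoted above and checks that it agrees with the three-valued "or". This is not the code of validly, which per the abstract is written in Stata using nested cond() calls; the function names and the use of None to stand in for Stata's missing value are assumptions made for illustration only.

```python
from itertools import product

MISSING = None  # assumed stand-in for Stata's missing value "."


def stata_truth(x):
    """Model of Stata's rule: nonzero is true, and missing counts as nonzero."""
    return x is MISSING or x != 0


def stata_or(p, q):
    """Model of Stata's p | q: always 0 or 1, never missing."""
    return int(stata_truth(p) or stata_truth(q))


def stata_and(p, q):
    """Model of Stata's p & q: always 0 or 1, never missing."""
    return int(stata_truth(p) and stata_truth(q))


def three_valued_or(p, q):
    """True if either operand is known true; otherwise missing if any is missing."""
    if (p is not MISSING and p != 0) or (q is not MISSING and q != 0):
        return 1
    if p is MISSING or q is MISSING:
        return MISSING
    return 0


def three_valued_and(p, q):
    """False if either operand is known false; otherwise missing if any is missing."""
    if (p is not MISSING and p == 0) or (q is not MISSING and q == 0):
        return 0
    if p is MISSING or q is MISSING:
        return MISSING
    return 1


def guarded_or(p, q):
    """Model of: generate v = p|q if !mi(p,q)|(p&!mi(p))|(q&!mi(q))

    mi(p,q) is modelled as true when any argument is missing; rows that
    fail the if-condition are left missing, as generate would leave them.
    """
    def mi(*xs):
        return any(x is MISSING for x in xs)

    keep = (
        (not mi(p, q))
        or (stata_truth(p) and not mi(p))
        or (stata_truth(q) and not mi(q))
    )
    return stata_or(p, q) if keep else MISSING


def truth_table():
    """Rows of (p, q, Stata-style or, three-valued or, guarded or)."""
    return [
        (p, q, stata_or(p, q), three_valued_or(p, q), guarded_or(p, q))
        for p, q in product((0, 1, MISSING), repeat=2)
    ]


if __name__ == "__main__":
    for row in truth_table():
        print(row)
```

In this model the Stata-style operators and the three-valued operators differ only on rows where at least one operand is missing, which are exactly the rows the guarded command has to special-case.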

Download Info
File URL: http://repec.org/usug2008/KIMacD.presentation.ppt
File Format: application/x-ms-powerpoint
File Function: presentation slides
Download Restriction: no

Publisher Info
Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2008 with number 01.

Date of creation: 11 Sep 2008
Handle: RePEc:boc:usug08:01

Contact details of provider:
Postal: Administration Building, 140 Commonwealth Avenue, Chestnut Hill MA 02467
Phone: 617-552-3670
Fax: 617-552-2308
Web page: http://www.stata.com/meeting/uk08

For technical questions regarding this item, or to correct its listing, contact: Christopher F Baum.


This page was last updated on 2009-11-25.

