This file is part of IDEAS, which uses RePEc data


Modeling for response variables that are proportions

Author Info
Maarten L. Buis (Department of Social Research Methodology, Vrije Universiteit Amsterdam)

Abstract

When the response variable is a proportion, people often use regress. This approach can be problematic: the model can produce predicted proportions below zero or above one, and the errors are likely to be heteroskedastic and nonnormally distributed. This talk will discuss three more appropriate methods for proportions as response variables: betafit, dirifit, and glm. betafit is a maximum likelihood estimator using a beta likelihood, dirifit is a maximum likelihood estimator using a Dirichlet likelihood, and glm can be used to create a quasi-maximum likelihood estimator using a binomial likelihood.

On an applied level, dirifit differs from the other two in that it can handle multiple response variables, whereas betafit and glm can handle only one. For instance, betafit and glm can model the proportion of a city budget spent on the category security (police and fire department), whereas dirifit can simultaneously model the proportions spent on the categories security, social policy, infrastructure, and other. betafit and glm also differ from each other: glm can handle proportions of exactly zero or one, whereas betafit can handle only proportions strictly between zero and one.

Special attention will be given to how to fit these models in Stata and how to interpret the results. The presentation will end with a warning not to use any of these techniques for ecological inference, i.e., using aggregated data to draw inferences about individual units. To use a classic example: in the United States in the 1930s, states with a high proportion of immigrants also had a high literacy rate (in the English language), even though immigrants were on average less literate than nonimmigrants. Regressing the state-level literacy rate on the state-level proportion of immigrants would thus give a completely wrong picture of the relationship between individual immigrant status and literacy.
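
The contrast between a linear regression and a quasi-maximum likelihood fit with a binomial likelihood can be illustrated with a minimal sketch. It is written in plain Python rather than Stata, and it is not the talk's own code: the data are invented for illustration, and the function names (`fit_ols`, `fit_fractional_logit`) are hypothetical. The second function maximizes a Bernoulli log-likelihood with a logit link by Newton-Raphson, which is assumed here to be the kind of estimator the abstract describes for glm.

```python
import math

# Invented illustrative data: x is a covariate, y is a proportion.
# Note that y contains exact zeros and ones.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.0, 0.02, 0.10, 0.35, 0.70, 0.90, 0.98, 1.0]


def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x (one covariate)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def inv_logit(z):
    """Map the linear predictor to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson on the Bernoulli log-likelihood with a logit link.

    y may be any value in [0, 1], so this is a quasi-likelihood fit,
    not a model for binary outcomes.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = inv_logit(a + b * xi)
            w = mu * (1.0 - mu)      # weight in the information matrix
            r = yi - mu              # contribution to the score
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det   # solve the 2x2 Newton system
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


ols_a, ols_b = fit_ols(x, y)
fl_a, fl_b = fit_fractional_logit(x, y)

# The linear fit leaves the unit interval at both ends of the data;
# the logit-link fit cannot, by construction.
for xi in x:
    linear = ols_a + ols_b * xi
    logit = inv_logit(fl_a + fl_b * xi)
    print(f"x={xi}  linear={linear:.3f}  logit-link={logit:.3f}")
```

On these invented data the linear predictions fall below zero at the smallest x and above one at the largest, while the logit-link predictions stay inside the unit interval. The sketch reports point estimates only; standard errors for a quasi-likelihood fit would need a robust (sandwich) variance estimator, which is omitted here.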

Download Info

File URL: http://repec.org/usug2006/Buis_proportions.pdf
File Format: application/pdf
File Function: presentation slides
Download Restriction: no

Publisher Info
Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2006 with number 15.

Date of creation: 18 Sep 2006
Handle: RePEc:boc:usug06:15

Contact details of provider:
Postal: Administration Building, 140 Commonwealth Avenue, Chestnut Hill MA 02467
Phone: 617-552-3670
Fax: 617-552-2308
Web page: http://www.stata.com/meeting/12uk

For technical questions regarding this item, or to correct its listing, contact: Christopher F Baum.

This page was last updated on 2009-11-20.

