Modeling for response variables that are proportions


  • Maarten L. Buis

    (Department of Social Research Methodology, Vrije Universiteit Amsterdam)


When dealing with response variables that are proportions, people often use regress. This approach is problematic because the linear model can predict proportions below zero or above one, and its errors are likely to be heteroskedastic and nonnormally distributed. This talk will discuss three more appropriate methods for proportions as response variables: betafit, dirifit, and glm. betafit is a maximum likelihood estimator using a beta likelihood, dirifit is a maximum likelihood estimator using a Dirichlet likelihood, and glm can be used to create a quasi–maximum likelihood estimator using a binomial likelihood. On an applied level, dirifit differs from the other two in that it can handle multiple response variables, whereas betafit and glm can handle only one. For instance, betafit and glm can model the proportion of a city budget spent on the category security (police and fire department), whereas dirifit can simultaneously model the proportions spent on the categories security, social policy, infrastructure, and other. A second difference, this one between betafit and glm, is that glm can handle proportions of exactly zero or one, whereas betafit can handle only proportions strictly between zero and one. Special attention will be given to how to fit these models in Stata and how to interpret the results. The presentation will end with a warning not to use any of these techniques for ecological inference, i.e., using aggregated data to draw conclusions about individual units. To use a classic example: in the United States in the 1930s, states with a high proportion of immigrants also had a high literacy rate (in the English language), even though immigrants were on average less literate than nonimmigrants. Regressing the state-level literacy rate on the state-level proportion of immigrants would thus give a completely wrong picture of the relationship between individual immigrant status and literacy.
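The contrast between a linear fit and a binomial quasi-likelihood fit can be illustrated with a small sketch. This is not the talk's code and not Stata's glm implementation: the data are hypothetical, the logit link is an assumption (the abstract names only the binomial likelihood), and the function names `ols_fit` and `fractional_logit_fit` are made up for this example. The response deliberately includes proportions of exactly zero and one, which the quasi-likelihood approach accepts.

```python
import math

# Hypothetical data: one covariate and an observed proportion per unit.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.02, 0.10, 0.50, 0.90, 1.0]


def ols_fit(x, y):
    """Least-squares intercept and slope for a single covariate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def inv_logit(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fractional_logit_fit(x, y, iterations=25):
    """Maximise sum(y*log(p) + (1-y)*log(1-p)) with p = inv_logit(a + b*x).

    Uses plain Newton-Raphson steps on the two parameters (a, b).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [inv_logit(a + b * xi) for xi in x]
        # Score vector (gradient of the quasi-log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a 2x2 symmetric matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton update: solve the 2x2 system by hand.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


intercept_ols, slope_ols = ols_fit(x, y)
intercept_logit, slope_logit = fractional_logit_fit(x, y)

# Predict on a grid that extends slightly beyond the observed covariate range.
grid = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ols_predictions = [intercept_ols + slope_ols * g for g in grid]
logit_predictions = [inv_logit(intercept_logit + slope_logit * g) for g in grid]

for g, lin, frac in zip(grid, ols_predictions, logit_predictions):
    print(f"x={g:4.1f}  linear={lin:7.3f}  fractional logit={frac:6.3f}")
```

On these made-up numbers the linear predictions at the ends of the grid fall outside the unit interval, while the logit-link predictions cannot. In practice the standard errors of a quasi-likelihood fit like this one should be computed with a robust (sandwich) estimator, which the sketch omits.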

Suggested Citation

  • Maarten L. Buis, 2006. "Modeling for response variables that are proportions," United Kingdom Stata Users' Group Meetings 2006 15, Stata Users Group.
  • Handle: RePEc:boc:usug06:15

    Download full text from publisher

    File Function: presentation slides
    Download Restriction: no

    References listed on IDEAS

    1. Papke, Leslie E. & Wooldridge, Jeffrey M., 1996. "Econometric Methods for Fractional Response Variables with an Application to 401(K) Plan Participation Rates," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 11(6), pages 619-632, Nov.-Dec.



    Cited by:

    1. Sanchez Santos Jose Manuel & Dopico Jesús & Castellanos Pablo, 2012. "Playing Success and Local Market Size in Spanish Football League: Can Small Cities Dream of Winning Teams?," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 8(2), pages 1-23, June.
