Estimating Log Models: To Transform or Not to Transform?
Data on health care expenditures, length of stay, utilization of health services, consumption of unhealthy commodities, etc. are typically characterized by: (a) nonnegative outcomes; (b) nontrivial fractions of zero outcomes in the population (and sample); and (c) positively skewed distributions of the nonzero realizations. Similar data structures are encountered in labor economics as well. This paper provides simulation-based evidence on the finite-sample behavior of two sets of estimators designed to look at the effect of a set of covariates x on the expected outcome, E(y|x), under a range of data problems encountered in everyday practice: generalized linear models (GLM), a subset of which can simply be viewed as differentially weighted nonlinear least-squares estimators, and those derived from least-squares estimators for ln(y). We consider the first- and second-order behavior of these candidate estimators under alternative assumptions on the data generating processes. Our results indicate that the choice of estimator for models of ln(E(y|x)) can have major implications for empirical results if the estimator is not designed to deal with the specific data generating mechanism. Garden-variety statistical problems (skewness, kurtosis, and heteroscedasticity) can lead to an appreciable bias for some estimators or appreciable losses in precision for others.
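The contrast between the two estimator families can be illustrated with a minimal sketch. It is not the paper's simulation design: the lognormal data generating process, the coefficient values, the error variance 0.2 + 0.8x on the log scale, and all function names are assumptions chosen for illustration. Under this assumed process, ln(E(y|x)) = (b0 + 0.1) + (b1 + 0.4)x, so least squares on ln(y) recovers b1 while a log-link GLM (here a Poisson-type pseudo-likelihood fit by Newton steps) targets the slope b1 + 0.4.

```python
import numpy as np

# Hypothetical log-scale coefficients (illustrative, not from the paper).
B0, B1 = 1.0, 1.0


def simulate(n, seed=0):
    """Draw (x, y) with ln(y) = B0 + B1*x + e, Var(e|x) = 0.2 + 0.8*x."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    sigma2 = 0.2 + 0.8 * x                    # heteroscedastic in x
    e = rng.normal(0.0, np.sqrt(sigma2))      # normal error on the log scale
    y = np.exp(B0 + B1 * x + e)
    return x, y


def design(x):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones_like(x), x])


def ols_log(x, y):
    """Least squares of ln(y) on x; estimates E[ln(y)|x], not ln(E(y|x))."""
    beta, *_ = np.linalg.lstsq(design(x), np.log(y), rcond=None)
    return beta


def glm_log_link(x, y, n_iter=50, tol=1e-10):
    """Poisson-type pseudo-likelihood with log link, fit by Newton steps."""
    X = design(x)
    beta = ols_log(x, y)                      # starting values
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                # gradient of the pseudo-likelihood
        info = X.T @ (X * mu[:, None])        # X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def compare(n=50_000, seed=0):
    """Return both slope estimates and the slope of ln(E(y|x)) they target."""
    x, y = simulate(n, seed)
    return {
        "ols_slope": ols_log(x, y)[1],
        "glm_slope": glm_log_link(x, y)[1],
        "target_slope": B1 + 0.4,             # B1 + 0.5 * 0.8 from the variance term
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the least-squares slope is not wrong for E[ln(y)|x]; it simply answers a different question from the one posed about ln(E(y|x)), which is why heteroscedasticity on the log scale matters for retransformation.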
|Date of creation:||Nov 1999|
|Publication status:||published as Manning, Willard G. and John Mullahy. "Estimating Log Models: To Transform Or Not To Transform?," Journal of Health Economics, 2001, v20(4,Jul), 461-494.|
|Contact details of provider:||Postal: National Bureau of Economic Research, 1050 Massachusetts Avenue, Cambridge, MA 02138, U.S.A. Web page: http://www.nber.org|
|Handle:||RePEc:nbr:nberte:0246|