Semiparametric regression in Stata

Listed author(s):
  • Vincenzo Verardi

    (Free University of Brussels)

Semiparametric regression introduces very general nonlinear functional forms into regression analysis. This class of models is generally used to fit a parametric model in which the functional form of a subset of the explanatory variables is not known and/or in which the distribution of the error term cannot be assumed beforehand to be of a specific type. To fix ideas, consider the partial linear model y = zb + f(x) + e, in which the shape of the potentially nonlinear function f of the predictor x is of particular interest. Two common approaches to modeling f(x) are splines and fractional polynomials. This talk reviews other, more general approaches and the Stata commands available to fit such models. The main topic of the talk will be partial linear regression models, with some brief discussion of so-called single-index and generalized additive models. Although many semiparametric regression methods have been proposed and developed in the literature, these are probably the most popular. The general idea of partial linear regression models is that a dependent variable is regressed on i) a set of explanatory variables entering the model linearly and ii) a set of variables entering the model nonlinearly, without assuming any specific functional form. Several estimators have been proposed in the literature and are available in Stata. For example, the semipar command implements the so-called double-residual estimator introduced by Robinson (1988), which is √N-consistent and efficient. Similarly, the plreg command fits an alternative difference-based estimator proposed by Yatchew (1998) whose statistical properties are similar to those of Robinson's estimator. The two estimators will be briefly compared to identify some drawbacks and pitfalls of each method.
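
The double-residual idea can be made concrete. Taking expectations conditional on x gives E[y|x] = E[z|x]b + f(x); subtracting yields y - E[y|x] = (z - E[z|x])b + e, so b can be estimated by OLS once both conditional means are replaced by nonparametric estimates. The difference-based alternative instead sorts the data by x and differences adjacent observations, so that f(x) approximately drops out. The Python sketch below illustrates both ideas on simulated data; it is not the code behind semipar or plreg. The Gaussian-kernel Nadaraya–Watson smoother, the fixed bandwidth, the absence of trimming, the use of simple first differences (higher-order differencing schemes reduce the efficiency loss), and the helper names nw_smooth, robinson and yatchew are all assumptions made for illustration.

```python
import numpy as np


def nw_smooth(x, v, bandwidth):
    """Nadaraya-Watson estimate of E[v | x] at each sample point (Gaussian kernel).

    v may be a vector or a matrix with one column per variable.
    """
    u = (x[:, None] - x[None, :]) / bandwidth   # n x n scaled distances
    w = np.exp(-0.5 * u ** 2)                   # Gaussian kernel weights
    w /= w.sum(axis=1, keepdims=True)           # each row sums to one
    return w @ v


def robinson(y, z, x, bandwidth):
    """Double-residual estimator of b in y = z b + f(x) + e (no trimming)."""
    ey = y - nw_smooth(x, y, bandwidth)         # y - E[y | x]
    ez = z - nw_smooth(x, z, bandwidth)         # z - E[z | x]
    b, *_ = np.linalg.lstsq(ez, ey, rcond=None) # OLS without a constant
    return b


def yatchew(y, z, x):
    """First-order differencing estimator: sort by x, difference, then OLS."""
    order = np.argsort(x)
    dy = np.diff(y[order])                      # f(x) nearly cancels between neighbours
    dz = np.diff(z[order], axis=0)
    b, *_ = np.linalg.lstsq(dz, dy, rcond=None)
    return b


# Simulated example (hypothetical data): true b = (1, -2), f(x) = sin(2*pi*x)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, n)
# z is correlated with x, so simply ignoring f(x) would bias OLS
z = np.column_stack([x + rng.normal(size=n), rng.normal(size=n)])
y = z @ np.array([1.0, -2.0]) + np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

print("Robinson:", robinson(y, z, x, bandwidth=0.05))
print("Yatchew: ", yatchew(y, z, x))
```

With a smooth f(x) both estimates should land close to (1, -2); the simple first-difference estimator typically has a somewhat larger variance than the double-residual one.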
A natural concern of researchers is how these estimators can be modified to deal with heteroskedasticity, serial correlation, and endogeneity in cross-sectional data, or how they can be adapted to panel data to control for unobserved heterogeneity. A substantial part of the talk will therefore be devoted to explaining i) how the plreg and semipar commands can be used to tackle these very common violations of the Gauss–Markov assumptions in cross-sectional data and ii) how the user-written xtsemipar command makes semiparametric regression easy to fit with panel data. Finally, because it is sometimes possible to move toward a purely parametric model, the talk will describe a test proposed by Härdle and Mammen (1993) that checks whether the nonparametric fit can be satisfactorily approximated by a parametric polynomial of order p.
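
The logic of the Härdle and Mammen (1993) test can also be sketched: fit E[y|x] nonparametrically, fit a polynomial of order p by OLS, smooth the polynomial fit with the same kernel so that both curves share the same smoothing bias, and measure the squared distance between them; critical values come from a wild bootstrap that generates data under the polynomial null. The Python code below is a simplified illustration, not the implementation discussed in the talk: the integral in the statistic is replaced by an average over the sample points, the kernel is Gaussian with a fixed bandwidth, the bootstrap weights follow Mammen's two-point distribution, and the function names are hypothetical. In the partial linear model one could apply it to the partial residual y - zb̂; that step is likewise an assumption of this sketch.

```python
import numpy as np


def nw_weights(x, bandwidth):
    """Row-normalised Gaussian kernel matrix W: (W @ v)[i] estimates E[v | x_i]."""
    u = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u ** 2)
    return w / w.sum(axis=1, keepdims=True)


def poly_fit(x, y, p):
    """OLS fit of a polynomial of order p in x; returns the fitted values."""
    X = np.vander(x, p + 1)                      # columns x^p, ..., x, 1
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef


def hm_statistic(x, y, p, W, bandwidth):
    """Scaled squared distance between the kernel fit and the smoothed polynomial fit."""
    m_np = W @ y                                 # nonparametric fit
    m_par = W @ poly_fit(x, y, p)                # parametric fit, smoothed the same way
    return len(y) * np.sqrt(bandwidth) * np.mean((m_np - m_par) ** 2)


def hm_test(x, y, p, bandwidth, n_boot=499, seed=0):
    """Return (statistic, wild-bootstrap p-value) for H0: E[y | x] is a polynomial of order p."""
    rng = np.random.default_rng(seed)
    W = nw_weights(x, bandwidth)
    t_obs = hm_statistic(x, y, p, W, bandwidth)
    fitted = poly_fit(x, y, p)
    resid = y - fitted
    # Mammen's two-point distribution: mean 0, variance 1
    a, b = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
    prob_a = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
    t_boot = np.empty(n_boot)
    for r in range(n_boot):
        v = np.where(rng.uniform(size=len(y)) < prob_a, a, b)
        y_star = fitted + resid * v              # bootstrap sample generated under H0
        t_boot[r] = hm_statistic(x, y_star, p, W, bandwidth)
    return t_obs, np.mean(t_boot >= t_obs)


# Simulated example (hypothetical data): one linear and one clearly nonlinear mean
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y_lin = 1 + 2 * x + 0.3 * rng.normal(size=200)
y_sin = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=200)

print("linear mean, p = 1:", hm_test(x, y_lin, p=1, bandwidth=0.1, n_boot=199))
print("sine mean,   p = 1:", hm_test(x, y_sin, p=1, bandwidth=0.1, n_boot=199))
```

A small p-value suggests that a polynomial of order p does not adequately approximate the nonparametric fit; smoothing the parametric fit before comparing is what keeps the kernel's bias from driving the rejection.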

Paper provided by Stata Users Group in its series United Kingdom Stata Users' Group Meetings 2013 with number 14.

Date of creation: 16 Sep 2013
Handle: RePEc:boc:usug13:14
References listed on IDEAS

  1. Vincenzo Verardi & Nicolas Debarsy, 2012. "Robinson's square root of N consistent semiparametric regression estimator in Stata," Stata Journal, StataCorp LP, vol. 12(4), pages 726-735, December.
  2. Christopher Bruffaerts & Vincenzo Verardi & Catherine Vermandele, 2014. "A generalized boxplot for skewed and heavy-tailed distributions," Statistics & Probability Letters, Elsevier, vol. 95(C), pages 110-117.
  3. Vincenzo Verardi, 2013. "Semiparametric regression in Stata," United Kingdom Stata Users' Group Meetings 2013 14, Stata Users Group.
  4. Adonis Yatchew, 1998. "Nonparametric Regression Techniques in Economics," Journal of Economic Literature, American Economic Association, vol. 36(2), pages 669-721, June.
  5. Adonis Yatchew, 2003. "Semiparametric Regression for the Applied Econometrician," Cambridge Books, Cambridge University Press, number 9780521012263, October.
  6. M. Hubert & E. Vandervieren, 2008. "An adjusted boxplot for skewed distributions," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5186-5201, August.