Semiparametric regression in Stata


  • Vincenzo Verardi

    (Free University of Brussels)


The boxplot is probably the most commonly used tool for representing the distribution of the data and identifying atypical observations in a univariate dataset. The problem with the standard boxplot is that as soon as asymmetry or tail heaviness appears, the percentage of values identified as atypical becomes excessive. To cope with this issue, Hubert and Vandervieren (2008) proposed an adjusted boxplot for skewed data. Their idea is to move the whiskers of the boxplot according to the degree of asymmetry of the data. The rule for setting the whiskers of the adjusted boxplot was found by running a large number of simulations over a wide range of (moderately) skewed distributions. The aim was to find a rule guaranteeing that 0.7% of the observations would lie outside the interval delimited by the whiskers. Although their rule works satisfactorily for most commonly used distributions, it suffers from some limitations: (i) the adjusted boxplot is not appropriate for severely skewed distributions or for distributions with heavy tails; (ii) it is tied specifically to a theoretical rejection rate of 0.7%; (iii) it is extremely sensitive to the estimated value of the asymmetry parameter; and (iv) it has a substantial computational complexity, O(n log n). To tackle these drawbacks, we propose a much simpler method for finding the whiskers of the boxplot for possibly skewed and heavy-tailed data. We apply a simple rank-preserving transformation to the original data so that the transformed data can be fitted by a so-called Tukey g-and-h distribution. Using the quantiles of this distribution, we can easily recover the whiskers of the boxplot for the original data. The computational complexity of the proposed method is O(n), the same as that of the standard boxplot.
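The two quantities the abstract relies on can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the g and h values are taken as given rather than estimated, the rank-preserving transformation is not reproduced, and splitting the rejection rate equally between the two tails is an assumption made here for illustration. The sketch uses the usual quantile form of the Tukey g-and-h distribution, Q(p) = A + B ((exp(g z_p) - 1) / g) exp(h z_p^2 / 2), where z_p is the standard normal quantile, and it checks that the standard 1.5 x IQR rule rejects roughly 0.7% of observations under normality.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def tukey_gh_quantile(p, g=0.0, h=0.0, location=0.0, scale=1.0):
    """Quantile of a Tukey g-and-h distribution at probability p.

    g controls asymmetry, h controls tail heaviness; g = h = 0 gives
    the normal distribution with the given location and scale.
    """
    z = _STD_NORMAL.inv_cdf(p)
    # The limit of (exp(g*z) - 1) / g as g -> 0 is z itself.
    skewed = z if g == 0.0 else math.expm1(g * z) / g
    return location + scale * skewed * math.exp(h * z * z / 2.0)


def gh_whiskers(alpha=0.007, g=0.0, h=0.0):
    """Whiskers leaving a total fraction alpha outside the interval.

    Assumption (illustrative): alpha is split equally between tails.
    """
    lower = tukey_gh_quantile(alpha / 2.0, g, h)
    upper = tukey_gh_quantile(1.0 - alpha / 2.0, g, h)
    return lower, upper


def standard_rule_rejection_rate_normal():
    """Share of a normal population outside the 1.5 * IQR whiskers."""
    q1 = _STD_NORMAL.inv_cdf(0.25)
    q3 = _STD_NORMAL.inv_cdf(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return _STD_NORMAL.cdf(lower) + (1.0 - _STD_NORMAL.cdf(upper))


if __name__ == "__main__":
    print("standard rule, normal data:", standard_rule_rejection_rate_normal())
    print("symmetric whiskers (g=0, h=0):", gh_whiskers())
    print("right-skewed whiskers (g=0.5):", gh_whiskers(g=0.5))
    print("heavy-tailed whiskers (h=0.2):", gh_whiskers(h=0.2))
```

With g > 0 the upper whisker moves further from the median than the lower one, and with h > 0 both whiskers move outward, which is the behaviour the abstract asks of a rule for skewed and heavy-tailed data. Because the whiskers are read off a quantile function, any rejection rate can be requested, not only 0.7%.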

Suggested Citation

  • Vincenzo Verardi, 2014. "Semiparametric regression in Stata," United Kingdom Stata Users' Group Meetings 2014 09, Stata Users Group.
  • Handle: RePEc:boc:usug14:09

    References listed on IDEAS

    1. Nicholas J. Cox, 2009. "Speaking Stata: Creating and varying box plots," Stata Journal, StataCorp LP, vol. 9(3), pages 478-496, September.
    2. Bruffaerts, Christopher & Verardi, Vincenzo & Vermandele, Catherine, 2014. "A generalized boxplot for skewed and heavy-tailed distributions," Statistics & Probability Letters, Elsevier, vol. 95(C), pages 110-117.
    3. Hubert, M. & Vandervieren, E., 2008. "An adjusted boxplot for skewed distributions," Computational Statistics & Data Analysis, Elsevier, vol. 52(12), pages 5186-5201, August.
    4. Vincenzo Verardi, 2013. "Semiparametric regression in Stata," United Kingdom Stata Users' Group Meetings 2013 14, Stata Users Group.
    5. Yatchew, Adonis, 2003. "Semiparametric Regression for the Applied Econometrician," Cambridge Books, Cambridge University Press, number 9780521812832, November.
    6. Nicholas J. Cox, 2013. "Speaking Stata: Creating and varying box plots: Correction," Stata Journal, StataCorp LP, vol. 13(2), pages 398-400, June.
    7. Vincenzo Verardi & Nicolas Debarsy, 2012. "Robinson's square root of N consistent semiparametric regression estimator in Stata," Stata Journal, StataCorp LP, vol. 12(4), pages 726-735, December.
    8. Adonis Yatchew, 1998. "Nonparametric Regression Techniques in Economics," Journal of Economic Literature, American Economic Association, vol. 36(2), pages 669-721, June.

    Cited by:

    1. Vincenzo Verardi, 2013. "Semiparametric regression in Stata," United Kingdom Stata Users' Group Meetings 2013 14, Stata Users Group.
    2. Rojas Valdes, Ruben I. & Lin Lawell, C.-Y. Cynthia & Taylor, J. Edward, 2017. "The Dynamic Migration Game: A Structural Econometric Model and Application to Rural Mexico," 2017 Annual Meeting, July 30-August 1, Chicago, Illinois 259184, Agricultural and Applied Economics Association.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Syed Abul Hasan, 2016. "Engel curves and equivalence scales for Bangladesh," Journal of the Asia Pacific Economy, Taylor & Francis Journals, vol. 21(2), pages 301-315, April.
    2. Ichimura, Hidehiko & Todd, Petra E., 2007. "Implementing Nonparametric and Semiparametric Estimators," Handbook of Econometrics, in: J.J. Heckman & E.E. Leamer (ed.), Handbook of Econometrics, edition 1, volume 6, chapter 74, Elsevier.
    3. Nuria Gallego & Carlos Llano, 2014. "The Border Effect and the Nonlinear Relationship between Trade and Distance," Review of International Economics, Wiley Blackwell, vol. 22(5), pages 1016-1048, November.
    4. Michael Lokshin, 2006. "Difference-based semiparametric estimation of partial linear regression models," Stata Journal, StataCorp LP, vol. 6(3), pages 377-383, September.
    5. Pholo Bala, Alain, 2009. "Urban concentration and economic growth: checking for specific regional effects," LIDAM Discussion Papers CORE 2009038, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    6. Cheolsung Park, 2014. "Why do children transfer to their parents? Evidence from South Korea," Review of Economics of the Household, Springer, vol. 12(3), pages 461-485, September.
    7. Lessmann, Christian, 2014. "Spatial inequality and development — Is there an inverted-U relationship?," Journal of Development Economics, Elsevier, vol. 106(C), pages 35-51.
    8. Daniel McFadden, 2014. "The new science of pleasure: consumer choice behavior and the measurement of well-being," Chapters, in: Stephane Hess & Andrew Daly (ed.), Handbook of Choice Modelling, chapter 2, pages 7-48, Edward Elgar Publishing.
    9. Ayhan Kose, M. & Prasad, Eswar S. & Taylor, Ashley D., 2011. "Thresholds in the process of international financial integration," Journal of International Money and Finance, Elsevier, vol. 30(1), pages 147-179, February.
    10. Tai-Hsin Huang & Kuan-Chen Chen & Chien-Hsiu Lin & Ming-Tai Chung, 2014. "Consistent estimation of technical and allocative efficiencies for a semiparametric stochastic cost frontier with shadow input prices," Journal of Productivity Analysis, Springer, vol. 41(2), pages 307-320, April.
    11. Daniel L. McFadden, 2013. "The New Science of Pleasure," NBER Working Papers 18687, National Bureau of Economic Research, Inc.
    12. Syed Abul Hasan, 2013. "The impact of a large rice price increase on welfare and poverty in Bangladesh," ASARC Working Papers 2013-11, The Australian National University, Australia South Asia Research Centre.
    13. Kevin Denny & Orla Doyle, 2010. "Returns to basic skills in central and eastern Europe," The Economics of Transition, The European Bank for Reconstruction and Development, vol. 18(1), pages 183-208, January.
    14. Mia Hubert & Peter Rousseeuw & Pieter Segaert, 2015. "Multivariate functional outlier detection," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 24(2), pages 177-202, July.
    15. Corak, Miles & Lauzon, Darren, 2009. "Differences in the distribution of high school achievement: The role of class-size and time-in-term," Economics of Education Review, Elsevier, vol. 28(2), pages 189-198, April.
    16. Almeida, Alexandre N. & Bravo-Ureta, Boris E., 2019. "Agricultural productivity, shadow wages and off-farm labor decisions in Nicaragua," Economic Systems, Elsevier, vol. 43(1), pages 99-110.
    17. Koop, Gary & Poirier, Dale J., 2004. "Bayesian variants of some classical semiparametric regression techniques," Journal of Econometrics, Elsevier, vol. 123(2), pages 259-282, December.
    18. Nicodemo, Catia & Satorra, Albert, 2020. "Exploratory Data Analysis on Large Data Sets: The Example of Salary Variation in Spanish Social Security Data," IZA Discussion Papers 13459, Institute of Labor Economics (IZA).
    19. SCHAFGANS, Marcia M.A. & ZINDE-WALSH, Victoria, 2007. "Robust Average Derivative Estimation," Cahiers de recherche 12-2007, Centre interuniversitaire de recherche en économie quantitative, CIREQ.
    20. Feng, Guohua & McLaren, Keith R. & Yang, Ou & Zhang, Xiaohui & Zhao, Xueyan, 2021. "The impact of environmental policy stringency on industrial productivity growth: A semi-parametric study of OECD countries," Energy Economics, Elsevier, vol. 100(C).

