Printed from https://ideas.repec.org/p/arx/papers/2404.09528.html

Overfitting Reduction in Convex Regression

Authors

  • Zhiqiang Liao
  • Sheng Dai
  • Eunji Lim
  • Timo Kuosmanen

Abstract

Convex regression is a method for estimating an unknown function $f_0$ from a data set of $n$ noisy observations when $f_0$ is known to be convex. This method has played an important role in operations research, economics, machine learning, and many other areas. It has been empirically observed that the convex regression estimator produces inconsistent estimates of $f_0$ and extremely large subgradients near the boundary of the domain of $f_0$ as $n$ increases. In this paper, we provide theoretical evidence of this overfitting behaviour. We also prove that the penalised convex regression estimator, one of the variants of the convex regression estimator, exhibits overfitting behaviour. To eliminate this behaviour, we propose two new estimators by placing a bound on the subgradients of the estimated function. We further show that our proposed estimators do not exhibit the overfitting behaviour by proving that (a) they converge to $f_0$ and (b) their subgradients converge to the gradient of $f_0$, both uniformly over the domain of $f_0$ with probability one as $n \rightarrow \infty$. We apply the proposed methods to compute the cost frontier function for Finnish electricity distribution firms and confirm their superior performance in predictive power over some existing methods.
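The abstract describes the subgradient bound only in words. Below is a minimal sketch under stated assumptions. It assumes the standard least-squares convex regression formulation: choose fitted values $\theta_i$ and subgradients $\xi_i$ to minimise $\sum_i (y_i - \theta_i)^2$ subject to $\theta_j \ge \theta_i + \xi_i^\top (x_j - x_i)$. It also assumes one natural form of the bound, $|\xi_i| \le L$, and restricts everything to a single regressor. With sorted, distinct $x_i$, those constraints reduce to requiring that the slopes between consecutive fitted points are nondecreasing and lie in $[-L, L]$. The function names (`bounded_convex_fit`, `project_slopes`, `pava`, `fitted_values`) and the bound parameter are hypothetical, not the authors' estimators or code. The paper's two estimators may impose the bound differently. This sketch also swaps in plain projected gradient descent instead of a quadratic-programming solver.

```python
from typing import List, Tuple


def pava(values: List[float]) -> List[float]:
    """Unweighted nondecreasing least-squares fit (pool-adjacent-violators)."""
    blocks: List[List[float]] = []  # each block stores [sum, count]
    for v in values:
        blocks.append([v, 1.0])
        # Merge adjacent blocks while their means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted: List[float] = []
    for total, count in blocks:
        fitted.extend([total / count] * int(count))
    return fitted


def project_slopes(slopes: List[float], bound: float) -> List[float]:
    """Euclidean projection onto {-bound <= s_1 <= ... <= s_m <= bound}.

    With one common box for every coordinate, clipping the isotonic fit
    gives the projection.
    """
    return [min(max(v, -bound), bound) for v in pava(slopes)]


def fitted_values(c: float, slopes: List[float], x: List[float]) -> List[float]:
    """Values theta_i of the piecewise-linear fit starting at c over x[0]."""
    theta = [c]
    for j, s in enumerate(slopes):
        theta.append(theta[-1] + s * (x[j + 1] - x[j]))
    return theta


def bounded_convex_fit(x: List[float], y: List[float], bound: float,
                       iters: int = 5000) -> Tuple[List[float], List[float]]:
    """Hypothetical 1-D least-squares convex fit with slopes in [-bound, bound].

    Assumes x is sorted with distinct entries; uses projected gradient descent.
    """
    n = len(x)
    d = [x[j + 1] - x[j] for j in range(n - 1)]
    # Step 1/||A||_F^2 <= 1/||A||_2^2, where A maps (c, slopes) to theta.
    frob2 = n + sum(d[j] ** 2 * (n - 1 - j) for j in range(n - 1))
    step = 1.0 / frob2
    c, slopes = sum(y) / n, [0.0] * (n - 1)  # feasible start: constant fit at the mean
    for _ in range(iters):
        residuals = [t - yi for t, yi in zip(fitted_values(c, slopes, x), y)]
        tail = [0.0] * (n + 1)  # tail[i] = sum of residuals at points i..n-1
        for i in range(n - 1, -1, -1):
            tail[i] = tail[i + 1] + residuals[i]
        # Gradient of 0.5 * sum(residuals**2); slope j moves points j+1..n-1.
        grad_s = [d[j] * tail[j + 1] for j in range(n - 1)]
        c -= step * tail[0]
        slopes = project_slopes([s - step * g for s, g in zip(slopes, grad_s)], bound)
    return fitted_values(c, slopes, x), slopes


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [9.5, 3.8, 1.2, 0.1, 1.1, 4.3, 8.7]  # illustrative toy data, not from the paper
    theta, slopes = bounded_convex_fit(x, y, bound=3.0)
    print([round(s, 3) for s in slopes])
```

Each iterate is projected back onto the feasible set, so every returned fit is convex with slopes in $[-L, L]$. This is the property that keeps the boundary slopes from blowing up. A fixed step of one over the squared Frobenius norm guarantees that the squared error never increases, but convergence can be slow. A QP solver would be the practical choice for real or multivariate data.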

Suggested Citation

  • Zhiqiang Liao & Sheng Dai & Eunji Lim & Timo Kuosmanen, 2024. "Overfitting Reduction in Convex Regression," Papers 2404.09528, arXiv.org.
  • Handle: RePEc:arx:papers:2404.09528

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2404.09528
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Rahul Mazumder & Arkopal Choudhury & Garud Iyengar & Bodhisattva Sen, 2019. "A Computational Framework for Multivariate Convex Regression and Its Variants," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(525), pages 318-331, January.
    2. Daisuke Yagi & Yining Chen & Andrew L. Johnson & Timo Kuosmanen, 2020. "Shape-Constrained Kernel-Weighted Least Squares: Estimating Production Functions for Chilean Manufacturing Industries," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 38(1), pages 43-54, January.
    3. Chia-Yen Lee & Andrew L. Johnson & Erick Moreno-Centeno & Timo Kuosmanen, 2013. "A more efficient algorithm for Convex Nonparametric Least Squares," European Journal of Operational Research, Elsevier, vol. 227(2), pages 391-400.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Dai, Sheng & Kuosmanen, Timo & Zhou, Xun, 2023. "Generalized quantile and expectile properties for shape constrained nonparametric estimation," European Journal of Operational Research, Elsevier, vol. 310(2), pages 914-927.
    2. Lee, Chia-Yen & Wang, Ke, 2019. "Nash marginal abatement cost estimation of air pollutant emissions using the stochastic semi-nonparametric frontier," European Journal of Operational Research, Elsevier, vol. 273(1), pages 390-400.
    3. José Luis Preciado Arreola & Daisuke Yagi & Andrew L. Johnson, 2020. "Insights from machine learning for evaluating production function estimators on manufacturing survey data," Journal of Productivity Analysis, Springer, vol. 53(2), pages 181-225, April.
    4. Jose M. Cordero & Cristina Polo & Daniel Santín, 2020. "Assessment of new methods for incorporating contextual variables into efficiency measures: a Monte Carlo simulation," Operational Research, Springer, vol. 20(4), pages 2245-2265, December.
    5. Dai, Sheng, 2023. "Variable selection in convex quantile regression: L1-norm or L0-norm regularization?," European Journal of Operational Research, Elsevier, vol. 305(1), pages 338-355.
    6. Tsionas, Mike, 2022. "Efficiency estimation using probabilistic regression trees with an application to Chilean manufacturing industries," International Journal of Production Economics, Elsevier, vol. 249(C).
    7. Ruitu Xu & Yifei Min & Tianhao Wang & Zhaoran Wang & Michael I. Jordan & Zhuoran Yang, 2023. "Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning," Papers 2303.04833, arXiv.org.
    8. Eunji Lim, 2021. "Consistency of Penalized Convex Regression," International Journal of Statistics and Probability, Canadian Center of Science and Education, vol. 10(1), pages 1-69, January.
    9. Tsionas, Mike G. & Izzeldin, Marwan, 2018. "Smooth approximations to monotone concave functions in production analysis: An alternative to nonparametric concave least squares," European Journal of Operational Research, Elsevier, vol. 271(3), pages 797-807.
    10. Cristina Polo & Julián Ramajo & Alejandro Ricci‐Risquete, 2021. "A stochastic semi‐non‐parametric analysis of regional efficiency in the European Union," Regional Science Policy & Practice, Wiley Blackwell, vol. 13(1), pages 7-24, February.
    11. Timo Kuosmanen & Sheng Dai, 2023. "Modeling economies of scope in joint production: Convex regression of input distance function," Papers 2311.11637, arXiv.org.
    12. K. Hervé Dakpo & Yann Desjeux & Laure Latruffe, 2023. "Cost of abating excess nitrogen on wheat plots in France: An assessment with multi‐technology modelling," Journal of Agricultural Economics, Wiley Blackwell, vol. 74(3), pages 800-815, September.
    13. Feng, Oliver Y. & Chen, Yining & Han, Qiyang & Carroll, Raymond J & Samworth, Richard J., 2022. "Nonparametric, tuning-free estimation of S-shaped functions," LSE Research Online Documents on Economics 111889, London School of Economics and Political Science, LSE Library.
    14. Stefan Seifert, 2015. "Measuring Productivity When Technologies Are Heterogeneous: A Semi-Parametric Approach for Electricity Generation," Discussion Papers of DIW Berlin 1526, DIW Berlin, German Institute for Economic Research.
    15. Pang Du & Christopher F. Parmeter & Jeffrey S. Racine, 2012. "Nonparametric Kernel Regression with Multiple Predictors and Multiple Shape Constraints," Department of Economics Working Papers 2012-08, McMaster University.
    16. Chia-Yen Lee, 2017. "Directional marginal productivity: a foundation of meta-data envelopment analysis," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 68(5), pages 544-555, May.
    17. Layer, Kevin & Johnson, Andrew L. & Sickles, Robin C. & Ferrier, Gary D., 2020. "Direction selection in stochastic directional distance functions," European Journal of Operational Research, Elsevier, vol. 280(1), pages 351-364.
    18. Eunji Lim & Kihwan Kim, 2020. "Estimating Smooth and Convex Functions," International Journal of Statistics and Probability, Canadian Center of Science and Education, vol. 9(5), pages 1-40, September.
    19. Oliver Y. Feng & Yining Chen & Qiyang Han & Raymond J. Carroll & Richard J. Samworth, 2022. "Nonparametric, tuning‐free estimation of S‐shaped functions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1324-1352, September.
    20. Aubin-Frankowski, Pierre-Cyril & Szabo, Zoltan, 2022. "Handling hard affine SDP shape constraints in RKHSs," LSE Research Online Documents on Economics 115724, London School of Economics and Political Science, LSE Library.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:arx:papers:2404.09528. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do so. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the arXiv administrators. General contact details of provider: http://arxiv.org/.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.