Printed from https://ideas.repec.org/a/eee/csdana/v202y2025ics0167947324001270.html

Spline regression with automatic knot selection

Author

Listed:
  • Goepp, Vivien
  • Bouaziz, Olivier
  • Nuel, Grégory

Abstract

Spline regression has proven to be a useful tool for nonparametric regression. The flexibility of this function family comes from base points, called knots, that mark shifts in the behavior of the function. The question of how many knots to use and where to place them is usually sidestepped by penalizing the spline's overall smoothness (e.g. P-splines). However, there are areas of application where finding the best knot placement is itself of interest. A new method is introduced for automatically selecting knots in spline regression. The approach consists of setting many initial knots and fitting the spline regression through a penalized likelihood procedure called adaptive ridge, which discards the least relevant knots. The method, called A-splines (for adaptive splines), compares favorably with other knot selection methods: it runs roughly 10 to 400 times faster than comparable methods and has nearly equal predictive performance. A-splines are applied to both simulated and real datasets.
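
The idea of starting from many candidate knots and letting an adaptive ridge penalty discard the irrelevant ones can be illustrated with a minimal sketch. This is not the authors' implementation: for brevity it assumes a degree-1 truncated power basis rather than a B-spline basis, a Gaussian (least-squares) likelihood, and the reweighting rule w_j = 1 / (c_j^2 + eps^2). The function names, the penalty strength `lam`, the constant `eps`, and the 0.99 retention threshold are all illustrative assumptions.

```python
import numpy as np


def linear_spline_design(x, knots):
    """Truncated power basis of degree 1: columns [1, x, (x - t_1)_+, ...]."""
    hinge = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, hinge])


def adaptive_ridge_knots(x, y, knots, lam, eps=1e-4, n_iter=200, keep=0.99):
    """Return the candidate knots retained by an adaptive ridge fit.

    Hypothetical helper: minimizes ||y - fit||^2 + lam * sum_j w_j * c_j^2,
    where c_j is the slope change at knot j, and re-estimates the weights
    w_j after every weighted ridge solve.
    """
    design = linear_spline_design(x, knots)
    gram = design.T @ design
    rhs = design.T @ y
    weights = np.ones(len(knots))  # first pass is a plain ridge fit
    for _ in range(n_iter):
        # Intercept and global slope are left unpenalized.
        penalty = np.diag(np.concatenate([[0.0, 0.0], lam * weights]))
        coef = np.linalg.solve(gram + penalty, rhs)
        hinge_coef = coef[2:]
        # Small coefficients get large weights and are pushed toward zero.
        weights = 1.0 / (hinge_coef**2 + eps**2)
    # weights * c^2 is near 1 for active knots and near 0 for discarded ones.
    active = weights * hinge_coef**2 > keep
    return knots[active]


# Toy data (assumed for illustration): a single kink at x = 0.5 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.abs(x - 0.5) + rng.normal(scale=0.02, size=x.size)

candidates = np.linspace(0.05, 0.95, 19)  # many initial knots
selected = adaptive_ridge_knots(x, y, candidates, lam=0.1)
print("retained knots:", selected)
```

In this sketch a larger `lam` discards more knots. In practice the penalty would be chosen over a grid with a model selection criterion, and the spline would be refit without penalty on the retained knots.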

Suggested Citation

  • Goepp, Vivien & Bouaziz, Olivier & Nuel, Grégory, 2025. "Spline regression with automatic knot selection," Computational Statistics & Data Analysis, Elsevier, vol. 202(C).
  • Handle: RePEc:eee:csdana:v:202:y:2025:i:c:s0167947324001270
    DOI: 10.1016/j.csda.2024.108043

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324001270
    Download Restriction: Full text for ScienceDirect subscribers only.


    References listed on IDEAS

    1. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    2. Marx, Brian D. & Eilers, Paul H. C., 1998. "Direct generalized additive modeling with penalized likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 28(2), pages 193-209, August.
    3. Leitenstorfer, Florian & Tutz, Gerhard, 2007. "Knot selection by boosting techniques," Computational Statistics & Data Analysis, Elsevier, vol. 51(9), pages 4605-4621, May.
    4. M. P. Wand, 2000. "A Comparison of Regression Spline Smoothing Procedures," Computational Statistics, Springer, vol. 15(4), pages 443-462, December.
    5. Florian Frommlet & Grégory Nuel, 2016. "An Adaptive Ridge Procedure for L0 Regularization," PLOS ONE, Public Library of Science, vol. 11(2), pages 1-23, February.
    6. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521780506, January-April.
    7. Eilers, Paul H.C. & Currie, Iain D. & Durban, Maria, 2006. "Fast and compact smoothing on large multidimensional grids," Computational Statistics & Data Analysis, Elsevier, vol. 50(1), pages 61-76, January.
    8. Ralph C A Rippe & Jacqueline J Meulman & Paul H C Eilers, 2012. "Visualization of Genomic Changes by Segmented Smoothing Using an L0 Penalty," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-14, June.
    9. D. G. T. Denison & B. K. Mallick & A. F. M. Smith, 1998. "Automatic Bayesian curve fitting," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 60(2), pages 333-350.
    10. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    11. Wang Q. & Linton O. & Hardle W., 2004. "Semiparametric Regression Analysis With Missing Response at Random," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 334-345, January.
    12. C. C. Holmes & B. K. Mallick, 2001. "Bayesian regression with multivariate linear splines," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(1), pages 3-17.
    13. Ruppert, David & Wand, M. P. & Carroll, R. J., 2003. "Semiparametric Regression," Cambridge Books, Cambridge University Press, number 9780521785167, January-April.
    14. Wallstrom, Garrick & Liebner, Jeffrey & Kass, Robert E., 2008. "An Implementation of Bayesian Adaptive Regression Splines (BARS) in C with S and R Wrappers," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 26(i01).
    15. Inyoung Kim & Noah D. Cohen & Raymond J. Carroll, 2003. "Semiparametric Regression Splines in Matched Case-Control Studies," Biometrics, The International Biometric Society, vol. 59(4), pages 1158-1169, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Morteza Amini & Mahdi Roozbeh & Nur Anisah Mohamed, 2024. "Separation of the Linear and Nonlinear Covariates in the Sparse Semi-Parametric Regression Model in the Presence of Outliers," Mathematics, MDPI, vol. 12(2), pages 1-17, January.
    2. Maximilian Osterhaus, 2024. "A Sparse Grid Approach for the Nonparametric Estimation of High-Dimensional Random Coefficient Models," Papers 2408.07185, arXiv.org.
    3. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    4. Akdeniz Duran, Esra & Härdle, Wolfgang Karl & Osipenko, Maria, 2012. "Difference based ridge and Liu type estimators in semiparametric regression models," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 164-175.
    5. Shirun Shen & Huiya Zhou & Kejun He & Lan Zhou, 2024. "Principal Component Analysis of Two-dimensional Functional Data with Serial Correlation," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 29(3), pages 601-620, September.
    6. Afonso, António & Alves, José & Beck, Krzysztof & Jackson, Karen, 2024. "Financial, institutional, and macroeconomic determinants of cross-country portfolio equity flows: The case of developed countries," Economic Modelling, Elsevier, vol. 141(C).
    7. David O'Donnell & Alastair Rushworth & Adrian W. Bowman & E. Marian Scott & Mark Hallard, 2014. "Flexible regression models over river networks," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 63(1), pages 47-63, January.
    8. Takuma Yoshida, 2019. "Two stage smoothing in additive models with missing covariates," Statistical Papers, Springer, vol. 60(6), pages 1803-1826, December.
    9. Nolan, Tui H. & Richardson, Sylvia & Ruffieux, Hélène, 2025. "Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves," Computational Statistics & Data Analysis, Elsevier, vol. 203(C).
    10. Hübler, Olaf, 2017. "Health and Body Mass Index: No Simple Relationship," IZA Discussion Papers 10620, Institute of Labor Economics (IZA).
    11. Zanin, Luca, 2023. "A flexible estimation of sectoral portfolio exposure to climate transition risks in the European stock market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 39(C).
    12. Gao, Lisa & Shi, Peng, 2022. "Leveraging high-resolution weather information to predict hail damage claims: A spatial point process for replicated point patterns," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 161-179.
    13. Rachid Muleia & Shelsea Luís Damião & Áuria Ribeiro Banze & Cynthia Semá Baltazar & Isaac Akpor Adjei, 2025. "Factors associated with early sexual debut among adolescents and youth in Mozambique: A geo-additive survival analysis of the Mozambique 2021 AIDS indicator survey," PLOS ONE, Public Library of Science, vol. 20(6), pages 1-17, June.
    14. Qi Qian & Danh V. Nguyen & Esra Kürüm & Connie M. Rhee & Sudipto Banerjee & Yihao Li & Damla Şentürk, 2024. "Multivariate Varying Coefficient Spatiotemporal Model," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 16(3), pages 761-786, December.
    15. Yu Liu & Chin-Shang Li, 2023. "A linear spline Cox cure model with its applications," Computational Statistics, Springer, vol. 38(2), pages 935-954, June.
    16. Leitenstorfer, Florian & Tutz, Gerhard, 2007. "Knot selection by boosting techniques," Computational Statistics & Data Analysis, Elsevier, vol. 51(9), pages 4605-4621, May.
    17. Elizabeth Goult & Laura Andrea Barrero Guevara & Michael Briga & Matthieu Domenech de Cellès, 2024. "Estimating the optimal age for infant measles vaccination," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Caldeira, João F. & Santos, André A.P. & Torrent, Hudson S., 2023. "Semiparametric portfolios: Improving portfolio performance by exploiting non-linearities in firm characteristics," Economic Modelling, Elsevier, vol. 122(C).
    19. Benjamin Owusu & Bettina Bökemeier & Alfred Greiner, 2023. "Assessing nonlinearities and heterogeneity in debt sustainability analysis: a panel spline approach," Empirical Economics, Springer, vol. 64(3), pages 1315-1346, March.
    20. Francesca Bruno & Fedele Greco & Massimo Ventrucci, 2016. "Non-parametric regression on compositional covariates using Bayesian P-splines," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 25(1), pages 75-88, March.

