
Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods

Authors

  • Borra, Simone
  • Di Ciaccio, Agostino

Abstract

The estimators most widely used to evaluate the prediction error of a non-linear regression model are examined. An extensive simulation study compares the performance of these estimators across different non-parametric methods and for varying signal-to-noise ratios and sample sizes. The estimators considered are based on resampling methods such as Leave-one-out, parametric and non-parametric Bootstrap, repeated Cross-Validation and Hold-out. The non-parametric methods used are Regression Trees, Projection Pursuit Regression and Neural Networks. The repeated-corrected 10-fold Cross-Validation estimator and the Parametric Bootstrap estimator performed best in the simulations.
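The sketch below is a minimal, non-authoritative illustration of the two estimators the abstract reports as best. It is not the authors' code. Several assumptions apply:

- The "corrected" part of repeated-corrected K-fold Cross-Validation is taken to be Burman's (1989) bias correction: CV + apparent error − the weighted average error of each fold model on the full sample.
- The parametric bootstrap is written in Efron's (2004) covariance-penalty form, Err ≈ apparent error + (2/n) Σ cov(μ̂ᵢ, yᵢ).
- The noise variance `sigma2` is treated as known, as it would be in a simulation.
- `knn_fit_predict` is a hypothetical k-nearest-neighbour stand-in for the regression trees, projection pursuit regression and neural networks used in the paper.

All function names are illustrative.

```python
import numpy as np


def knn_fit_predict(x_train, y_train, x_eval, k=5):
    """Hypothetical stand-in smoother: k-nearest-neighbour mean for 1-D inputs."""
    dist = np.abs(x_eval[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def repeated_corrected_kfold(x, y, fit_predict, k_folds=10, repeats=10, seed=0):
    """Repeated K-fold CV with a Burman-style bias correction (assumed form)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Apparent (training) error of the model fitted on all n points.
    apparent = np.mean((y - fit_predict(x, y, x)) ** 2)
    estimates = []
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k_folds)
        cv, correction = 0.0, 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            pred = fit_predict(x[train], y[train], x)  # fold model, evaluated on all points
            weight = len(test) / n
            # Plain CV term: squared error on the held-out fold.
            cv += weight * np.mean((y[test] - pred[test]) ** 2)
            # Correction term: the same fold model's error on the full sample.
            correction += weight * np.mean((y - pred) ** 2)
        estimates.append(cv + apparent - correction)
    # Averaging over repeats reduces the variance caused by the random fold split.
    return float(np.mean(estimates))


def parametric_bootstrap_cp(x, y, fit_predict, sigma2, n_boot=200, seed=0):
    """Covariance-penalty estimate of prediction error via a Gaussian parametric bootstrap."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mu_hat = fit_predict(x, y, x)
    apparent = np.mean((y - mu_hat) ** 2)
    # Simulate responses from the fitted model plus N(0, sigma2) noise, then refit.
    y_star = mu_hat + rng.normal(0.0, np.sqrt(sigma2), size=(n_boot, n))
    mu_star = np.array([fit_predict(x, yb, x) for yb in y_star])
    # Bootstrap estimate of cov(mu_hat_i, y_i) for each observation i.
    cov = np.sum(mu_star * (y_star - y_star.mean(axis=0)), axis=0) / (n_boot - 1)
    # Err ~ apparent error + (2/n) * sum_i cov_i.
    return float(apparent + 2.0 * cov.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 100)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 100)
    print("repeated corrected 10-fold CV:", repeated_corrected_kfold(x, y, knn_fit_predict))
    print("parametric bootstrap (covariance penalty):",
          parametric_bootstrap_cp(x, y, knn_fit_predict, sigma2=0.09))
```

Both estimators start from the apparent (training) error and push it upward. The corrected CV does this with held-out folds. The covariance penalty instead measures how strongly each fitted value follows its own response under simulated noise. Any regressor with the same `fit_predict(x_train, y_train, x_eval)` shape could be swapped in.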

Suggested Citation

  • Borra, Simone & Di Ciaccio, Agostino, 2010. "Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 2976-2989, December.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:12:p:2976-2989

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00106-4
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Chunming Zhang, 2008. "Prediction Error Estimation Under Bregman Divergence for Non‐Parametric Regression and Classification," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 35(3), pages 496-523, September.
    2. Bradley Efron, 2004. "The Estimation of Prediction Error: Covariance Penalties and Cross-Validation," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 619-632, January.
    3. Daudin, Jean-Jacques & Mary-Huard, Tristan, 2008. "Estimation of the conditional risk in classification: The swapping method," Computational Statistics & Data Analysis, Elsevier, vol. 52(6), pages 3220-3232, February.
    4. Yoshua Bengio & Yves Grandvalet, 2003. "No unbiased Estimator of the Variance of K-Fold Cross-Validation," CIRANO Working Papers 2003s-22, CIRANO.
    5. Kim, Ji-Hyun, 2009. "Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap," Computational Statistics & Data Analysis, Elsevier, vol. 53(11), pages 3735-3745, September.
    6. Wisnowski, James W. & Simpson, James R. & Montgomery, Douglas C. & Runger, George C., 2003. "Resampling methods for variable selection in robust regression," Computational Statistics & Data Analysis, Elsevier, vol. 43(3), pages 341-355, July.
    7. Shen X. & Ye J., 2002. "Adaptive Model Selection," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 210-221, March.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Khaled Yousef Almansi & Abdul Rashid Mohamed Shariff & Bahareh Kalantar & Ahmad Fikri Abdullah & Sharifah Norkhadijah Syed Ismail & Naonori Ueda, 2022. "Performance Evaluation of Hospital Site Suitability Using Multilayer Perceptron (MLP) and Analytical Hierarchy Process (AHP) Models in Malacca, Malaysia," Sustainability, MDPI, vol. 14(7), pages 1-36, March.
    2. Bergmeir, Christoph & Hyndman, Rob J. & Koo, Bonsoo, 2018. "A note on the validity of cross-validation for evaluating autoregressive time series prediction," Computational Statistics & Data Analysis, Elsevier, vol. 120(C), pages 70-83.
    3. Mario Guevara & Rodrigo Vargas, 2019. "Downscaling satellite soil moisture using geomorphometry and machine learning," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-20, September.
    4. Nader Salari & Shamarina Shohaimi & Farid Najafi & Meenakshii Nallappan & Isthrinayagy Karishnarajah, 2014. "A Novel Hybrid Classification Model of Genetic Algorithms, Modified k-Nearest Neighbor and Developed Backpropagation Neural Network," PLOS ONE, Public Library of Science, vol. 9(11), pages 1-50, November.
    5. George Chalamandaris & Nikos E. Vlachogiannakis, 2018. "Are financial ratios relevant for trading credit risk? Evidence from the CDS market," Annals of Operations Research, Springer, vol. 266(1), pages 395-440, July.
    6. Melissa Adelman & Francisco Haimovich & Andres Ham & Emmanuel Vazquez, 2018. "Predicting school dropout with administrative data: new evidence from Guatemala and Honduras," Education Economics, Taylor & Francis Journals, vol. 26(4), pages 356-372, July.
    7. Bergmeir, Christoph & Costantini, Mauro & Benítez, José M., 2014. "On the usefulness of cross-validation for directional forecast evaluation," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 132-143.
    8. Abbasabadi, Narjes & Ashayeri, Mehdi & Azari, Rahman & Stephens, Brent & Heidarinejad, Mohammad, 2019. "An integrated data-driven framework for urban energy use modeling (UEUM)," Applied Energy, Elsevier, vol. 253(C), pages 1-1.
    9. Keunhyun Park & Sadegh Sabouri & Torrey Lyons & Guang Tian & Reid Ewing, 2020. "Intrazonal or interzonal? Improving intrazonal travel forecast in a four-step travel demand model," Transportation, Springer, vol. 47(5), pages 2087-2108, October.
    10. Ha, Tran Vinh & Asada, Takumi & Arimura, Mikiharu, 2019. "Determination of the influence factors on household vehicle ownership patterns in Phnom Penh using statistical and machine learning methods," Journal of Transport Geography, Elsevier, vol. 78(C), pages 70-86.
    11. Conde, David & Fernández, Miguel & Salvador, Bonifacio & Rueda, Cristina, 2015. "dawai: An R Package for Discriminant Analysis with Additional Information," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 66(i10).
    12. Usta, Ilhan & Kantar, Yeliz Mert, 2011. "On the performance of the flexible maximum entropy distributions within partially adaptive estimation," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2172-2182, June.
    13. Christoph Bergmeir & Rob J Hyndman & Bonsoo Koo, 2015. "A Note on the Validity of Cross-Validation for Evaluating Time Series Prediction," Monash Econometrics and Business Statistics Working Papers 10/15, Monash University, Department of Econometrics and Business Statistics.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Bo & Shen, Xiaotong & Mumford, Sunni L., 2012. "Generalized degrees of freedom and adaptive model selection in linear mixed-effects models," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 574-586.
    2. Philip Reiss & Lei Huang & Joseph Cavanaugh & Amy Roy, 2012. "Resampling-based information criteria for best-subset regression," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 64(6), pages 1161-1186, December.
    3. Hirose, Kei & Tateishi, Shohei & Konishi, Sadanori, 2013. "Tuning parameter selection in sparse regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 59(C), pages 28-40.
    4. Yi, Feng & Zou, Hui, 2013. "SURE-tuned tapering estimation of large covariance matrices," Computational Statistics & Data Analysis, Elsevier, vol. 58(C), pages 339-351.
    5. Theo Dijkstra, 2014. "Ridge regression and its degrees of freedom," Quality & Quantity: International Journal of Methodology, Springer, vol. 48(6), pages 3185-3193, November.
    6. In-Koo Cho & Kenneth Kasa, 2015. "Learning and Model Validation," Review of Economic Studies, Oxford University Press, vol. 82(1), pages 45-82.
    7. Sieds, 2012. "Complete Volume LXVI n.1 2012," RIEDS - Rivista Italiana di Economia, Demografia e Statistica - The Italian Journal of Economic, Demographic and Statistical Studies, SIEDS Societa' Italiana di Economia Demografia e Statistica, vol. 66(1), pages 1-296.
    8. Mark G E White & Neil E Bezodis & Jonathon Neville & Huw Summers & Paul Rees, 2022. "Determining jumping performance from a single body-worn accelerometer using machine learning," PLOS ONE, Public Library of Science, vol. 17(2), pages 1-25, February.
    9. Airola, Antti & Pahikkala, Tapio & Waegeman, Willem & De Baets, Bernard & Salakoski, Tapio, 2011. "An experimental comparison of cross-validation techniques for estimating the area under the ROC curve," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1828-1844, April.
    10. Stefano Marchetti & Maciej Beręsewicz & Nicola Salvati & Marcin Szymkowiak & Łukasz Wawrowski, 2018. "The use of a three‐level M‐quantile model to map poverty at local administrative unit 1 in Poland," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 181(4), pages 1077-1104, October.
    11. Hettihewa, Samanthala & Saha, Shrabani & Zhang, Hanxiong, 2018. "Does an aging population influence stock markets? Evidence from New Zealand," Economic Modelling, Elsevier, vol. 75(C), pages 142-158.
    12. Mendez, Guillermo & Lohr, Sharon, 2011. "Estimating residual variance in random forest regression," Computational Statistics & Data Analysis, Elsevier, vol. 55(11), pages 2937-2950, November.
    13. Matthias Schmid & Thomas Hielscher & Thomas Augustin & Olaf Gefeller, 2011. "A Robust Alternative to the Schemper–Henderson Estimator of Prediction Error," Biometrics, The International Biometric Society, vol. 67(2), pages 524-535, June.
    14. Yanagihara, Hirokazu & Satoh, Kenichi, 2010. "An unbiased Cp criterion for multivariate ridge regression," Journal of Multivariate Analysis, Elsevier, vol. 101(5), pages 1226-1238, May.
    15. Yongli Zhang & Xiaotong Shen, 2015. "Adaptive Modeling Procedure Selection by Data Perturbation," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 33(4), pages 541-551, October.
    16. G. Saharidis & I. Androulakis & M. Ierapetritou, 2011. "Model building using bi-level optimization," Journal of Global Optimization, Springer, vol. 49(1), pages 49-67, January.
    17. Luts, Jan & Ormerod, John T., 2014. "Mean field variational Bayesian inference for support vector machine classification," Computational Statistics & Data Analysis, Elsevier, vol. 73(C), pages 163-176.
    18. David Rios Insua & Roi Naveiro & Victor Gallego, 2020. "Perspectives on Adversarial Classification," Mathematics, MDPI, vol. 8(11), pages 1-21, November.
    19. Zhang, Xinyu & Yu, Jihai, 2018. "Spatial weights matrix selection and model averaging for spatial autoregressive models," Journal of Econometrics, Elsevier, vol. 203(1), pages 1-18.
    20. Chunming Zhang, 2008. "Prediction Error Estimation Under Bregman Divergence for Non‐Parametric Regression and Classification," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 35(3), pages 496-523, September.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.