Printed from https://ideas.repec.org/a/taf/gnstxx/v30y2018i1p197-215.html

Multiple predicting K-fold cross-validation for model selection

Author

Listed:
  • Yoonsuh Jung

Abstract

K-fold cross-validation (CV) is widely adopted as a model selection criterion. In K-fold CV, $(K-1)$ folds are used for model construction and the hold-out fold is allocated to model validation. This implies that model construction is emphasised more than model validation. However, some studies have revealed that placing more emphasis on validation may improve model selection. Specifically, leave-m-out CV with n samples may achieve variable-selection consistency when m/n approaches 1. In this study, a new CV method is proposed within the framework of K-fold CV. The proposed method uses $(K-1)$ folds of the data for model validation, while the remaining fold is used for model construction. This provides $(K-1)$ predicted values for each observation. These values are averaged to produce a final predicted value. Model selection based on the averaged predicted values can then reduce variation in the assessment, owing to the averaging. The variable-selection consistency of the suggested method is established. Its advantages over K-fold CV with finite samples are examined under linear, non-linear, and high-dimensional models.
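
The procedure described in the abstract can be illustrated with a minimal sketch. This is not the author's code: the least-squares fit, the squared-error loss, the simulated data, and the names `mpcv_predictions` and `mpcv_score` are all assumptions made for illustration. Each fold is used once to fit the model, the fitted model predicts the other $(K-1)$ folds, and the $(K-1)$ predictions obtained for each observation are averaged before the error is computed.

```python
import numpy as np

def mpcv_predictions(X, y, K=5, seed=0):
    """Average, for each observation, the K-1 predictions from models fitted on the other folds.

    Illustrative sketch only: a least-squares fit stands in for the candidate model.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    pred_sum = np.zeros(n)
    pred_cnt = np.zeros(n)
    for fit_idx in folds:
        # Model construction uses ONE fold only (the reverse of ordinary K-fold CV).
        beta, *_ = np.linalg.lstsq(X[fit_idx], y[fit_idx], rcond=None)
        # Model validation uses the remaining K-1 folds.
        val_mask = np.ones(n, dtype=bool)
        val_mask[fit_idx] = False
        pred_sum[val_mask] += X[val_mask] @ beta
        pred_cnt[val_mask] += 1
    # Every observation has been predicted K-1 times; average those predictions.
    return pred_sum / pred_cnt, pred_cnt

def mpcv_score(X, y, K=5, seed=0):
    """Mean squared error of the averaged predictions (assumed loss)."""
    avg_pred, _ = mpcv_predictions(X, y, K=K, seed=seed)
    return float(np.mean((y - avg_pred) ** 2))

# Hypothetical usage: compare nested candidate models on simulated data.
rng = np.random.default_rng(42)
n = 200
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = 1.0 + 2.0 * X_full[:, 1] + rng.normal(size=n)  # only the first predictor matters
candidates = {"intercept only": [0], "x1": [0, 1], "x1, x2": [0, 1, 2], "x1, x2, x3": [0, 1, 2, 3]}
scores = {name: mpcv_score(X_full[:, cols], y, K=5) for name, cols in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Because each model is fitted on only n/K observations, the fitting fold must be large enough for the candidate model to be estimable; with many predictors a penalised fit would be needed in place of plain least squares.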

Suggested Citation

  • Yoonsuh Jung, 2018. "Multiple predicting K-fold cross-validation for model selection," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 30(1), pages 197-215, January.
  • Handle: RePEc:taf:gnstxx:v:30:y:2018:i:1:p:197-215
    DOI: 10.1080/10485252.2017.1404598

    References listed on IDEAS

    1. Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
    2. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    3. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    4. Simon, Noah & Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2011. "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 39(i05).
    5. Cavanaugh, Joseph E., 1997. "Unifying the derivations for the Akaike and corrected Akaike information criteria," Statistics & Probability Letters, Elsevier, vol. 33(2), pages 201-208, April.
    6. Fuchun Huang, 2003. "Prediction Error Property of the Lasso Estimator and its Generalization," Australian & New Zealand Journal of Statistics, Australian Statistical Publishing Association Inc., vol. 45(2), pages 217-228, June.
    7. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    8. Hansheng Wang & Bo Li & Chenlei Leng, 2009. "Shrinkage tuning parameter selection with a diverging number of parameters," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 671-683, June.
    9. Patrick Carmack & Jeffrey Spence & William Schucany, 2012. "Generalised correlated cross-validation," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 24(2), pages 269-282.
    10. Zhang, Yongli & Yang, Yuhong, 2015. "Cross-validation for selecting a model selection procedure," Journal of Econometrics, Elsevier, vol. 187(1), pages 95-112.
    11. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    12. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Citations


    Cited by:

    1. Jorge Antunes & Rangan Gupta & Zinnia Mukherjee & Peter Wanke, 2022. "Information entropy, continuous improvement, and US energy performance: a novel stochastic-entropic analysis for ideal solutions (SEA-IS)," Annals of Operations Research, Springer, vol. 313(1), pages 289-318, June.
    2. Nicola Baldo & Matteo Miani & Fabio Rondinella & Clara Celauro, 2021. "A Machine Learning Approach to Determine Airport Asphalt Concrete Layer Moduli Using Heavy Weight Deflectometer Data," Sustainability, MDPI, vol. 13(16), pages 1-17, August.
    3. Jiachuang Wang & Haoji Ma & Xianhang Yan, 2023. "Rockburst Intensity Classification Prediction Based on Multi-Model Ensemble Learning Algorithms," Mathematics, MDPI, vol. 11(4), pages 1-29, February.
    4. Alex Jose & Angus S. Macdonald & George Tzougas & George Streftaris, 2022. "A Combined Neural Network Approach for the Prediction of Admission Rates Related to Respiratory Diseases," Risks, MDPI, vol. 10(11), pages 1-35, November.
    5. Thao Nguyen-Da & Yi-Min Li & Chi-Lu Peng & Ming-Yuan Cho & Phuong Nguyen-Thanh, 2023. "Tourism Demand Prediction after COVID-19 with Deep Learning Hybrid CNN–LSTM—Case Study of Vietnam and Provinces," Sustainability, MDPI, vol. 15(9), pages 1-22, April.
    6. Weizhang Liang & Suizhi Luo & Guoyan Zhao & Hao Wu, 2020. "Predicting Hard Rock Pillar Stability Using GBDT, XGBoost, and LightGBM Algorithms," Mathematics, MDPI, vol. 8(5), pages 1-17, May.
    7. Xingyu Li & Long Li & Longgao Chen & Ting Zhang & Jianying Xiao & Longqian Chen, 2022. "Random Forest Estimation and Trend Analysis of PM 2.5 Concentration over the Huaihai Economic Zone, China (2000–2020)," Sustainability, MDPI, vol. 14(14), pages 1-22, July.
    8. Weizhang Liang & Asli Sari & Guoyan Zhao & Stephen D. McKinnon & Hao Wu, 2020. "Short-term rockburst risk prediction using ensemble learning methods," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 104(2), pages 1923-1946, November.
    9. Rui-Si Ma & Si-Ioi Ng & Tan Lee & Yi-Jian Yang & Raymond Kim-Wai Sum, 2022. "Validation of a Speech Database for Assessing College Students’ Physical Competence under the Concept of Physical Literacy," IJERPH, MDPI, vol. 19(12), pages 1-11, June.
    10. Hülya Yürekli & Öyküm Esra Yiğit & Okan Bulut & Min Lu & Ersoy Öz, 2022. "Exploring Factors That Affected Student Well-Being during the COVID-19 Pandemic: A Comparison of Data-Mining Approaches," IJERPH, MDPI, vol. 19(18), pages 1-16, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhang, Ting & Wang, Lei, 2020. "Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    2. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    3. Hui Xiao & Yiguo Sun, 2019. "On Tuning Parameter Selection in Model Selection and Model Averaging: A Monte Carlo Study," JRFM, MDPI, vol. 12(3), pages 1-16, June.
    4. Daniel, Jeffrey & Horrocks, Julie & Umphrey, Gary J., 2018. "Penalized composite likelihoods for inhomogeneous Gibbs point process models," Computational Statistics & Data Analysis, Elsevier, vol. 124(C), pages 104-116.
    5. Hirose, Kei & Tateishi, Shohei & Konishi, Sadanori, 2013. "Tuning parameter selection in sparse regression modeling," Computational Statistics & Data Analysis, Elsevier, vol. 59(C), pages 28-40.
    6. Yanxin Wang & Qibin Fan & Li Zhu, 2018. "Variable selection and estimation using a continuous approximation to the $L_0$ penalty," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 70(1), pages 191-214, February.
    7. Zhihua Sun & Yi Liu & Kani Chen & Gang Li, 2022. "Broken adaptive ridge regression for right-censored survival data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(1), pages 69-91, February.
    8. Wang, Tao & Zhu, Lixing, 2011. "Consistent tuning parameter selection in high dimensional sparse linear regression," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1141-1151, August.
    9. Jie Ding & Vahid Tarokh & Yuhong Yang, 2018. "Model Selection Techniques -- An Overview," Papers 1810.09583, arXiv.org.
    10. Zhixuan Fu & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized variable selection in competing risks regression," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(3), pages 353-376, July.
    11. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    12. Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Post-Print halshs-00917797, HAL.
    13. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    14. Peter Martey Addo & Dominique Guegan & Bertrand Hassani, 2018. "Credit Risk Analysis Using Machine and Deep Learning Models," Risks, MDPI, vol. 6(2), pages 1-20, April.
    15. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    16. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    17. Zeyu Bian & Erica E. M. Moodie & Susan M. Shortreed & Sahir Bhatnagar, 2023. "Variable selection in regression‐based estimation of dynamic treatment regimes," Biometrics, The International Biometric Society, vol. 79(2), pages 988-999, June.
    18. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    19. Takumi Saegusa & Tianzhou Ma & Gang Li & Ying Qing Chen & Mei-Ling Ting Lee, 2020. "Variable Selection in Threshold Regression Model with Applications to HIV Drug Adherence Data," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 12(3), pages 376-398, December.
    20. Zanhua Yin, 2020. "Variable selection for sparse logistic regression," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 83(7), pages 821-836, October.
