Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping


  • Jeongwoo Kim


Estimating a predictive model from a dataset ideally starts from an unbiased estimator. Because the unbiased estimator is generally unknown, however, the bias-variance tradeoff arises. Rather than searching for an unbiased estimator, a convenient way to handle this tradeoff is to use clustering: within a cluster that is smaller than the whole sample, a simple form of the estimator can be expected to avoid overfitting. In this paper, we propose a new method for finding the optimal cluster for prediction. Based on the previous literature, this cluster is considered to lie somewhere between the whole dataset and the typical cluster determined by partitioning the data. To obtain a reliable cluster size, we use the bootstrap method. Through experiments with simulated and real-world data, we show that the prediction error can be reduced by applying this new method. We believe that the proposed method will be useful in the many applications that rely on a clustering algorithm and need stable prediction performance.
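
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the paper's procedure. It assumes a univariate series, uses the mean of the most recent k observations as the "simple estimator", and scores each candidate size k by a bootstrap estimate of squared bias plus variance measured against a hold-out block. The function name `choose_window_size`, the hold-out scheme, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def choose_window_size(y, candidates, n_holdout=20, n_boot=500, seed=0):
    """Pick how many of the most recent observations to use for prediction.

    Hypothetical helper (not from the paper): each candidate size is scored
    by the bootstrapped squared bias plus variance of the window mean,
    with the mean of a hold-out block standing in for the target.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    train, hold = y[:-n_holdout], y[-n_holdout:]
    scores = {}
    for k in candidates:
        window = train[-k:]                       # the k most recent training points
        n = len(window)
        idx = rng.integers(0, n, size=(n_boot, n))
        preds = window[idx].mean(axis=1)          # one prediction per bootstrap resample
        bias_sq = (preds.mean() - hold.mean()) ** 2
        variance = preds.var()
        scores[k] = bias_sq + variance
    best = min(scores, key=scores.get)            # size with the smallest score
    return best, scores


# Simulated series with a level shift: 150 points around 0, then 70 around 5.
rng = np.random.default_rng(42)
y = np.concatenate([rng.normal(0.0, 0.5, 150), rng.normal(5.0, 0.5, 70)])

best, scores = choose_window_size(y, candidates=[10, 25, 50, 100, 200])
print("selected size:", best)
for k, s in scores.items():
    print(k, round(s, 4))
```

In this toy setting the largest candidate plays the role of the whole dataset and the smallest that of a tight cluster. Windows that reach back past the level shift pay a large bias penalty, while very small windows pay in variance, so the selected size falls between the two extremes.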

Suggested Citation

  • Jeongwoo Kim, 2019. "Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping," PLOS ONE, Public Library of Science, vol. 14(11), pages 1-31, November.
  • Handle: RePEc:plo:pone00:0223529
    DOI: 10.1371/journal.pone.0223529
