IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v89y2015icp51-71.html

Improving cross-validated bandwidth selection using subsampling-extrapolation techniques

Author

Listed:
  • Wang, Qing
  • Lindsay, Bruce G.

Abstract

Cross-validation is widely used to select tuning parameters in nonparametric statistical problems. In this paper we focus on a new method for improving the reliability of cross-validation. We implement this method for the kernel density estimator, where the bandwidth parameter must be selected so as to minimize the L2 risk. The method is a two-stage subsampling-extrapolation bandwidth selection procedure: it first evaluates the risk at a fictional sample size m (with m ≤ n, the actual sample size) and then extrapolates the optimal bandwidth from m to n. This two-stage method can dramatically reduce the variability of the conventional unbiased cross-validation bandwidth selector. The simple first-order extrapolation estimator is equivalent to the rescaled “bagging-CV” bandwidth selector in Hall and Robinson (2009) if one sets the bootstrap size equal to the fictional sample size. However, our simplified expression for the risk estimator enables us to compute the aggregated risk without any bootstrapping. Furthermore, we develop a second-order extrapolation technique as an extension designed to improve the approximation of the true optimal bandwidth. To choose the fictional size m for a given sample of size n, we propose a nested cross-validation methodology. In a simulation study, the proposed methods show promising performance across a wide selection of distributions. We also investigate the asymptotic properties of the proposed bandwidth selectors.
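The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and this record does not reproduce the paper's formulas. The sketch assumes a Gaussian kernel, a standard U-statistic form of the L2 risk estimate at sample size m, and the usual n^(-1/5) rate of the optimal bandwidth for the first-order extrapolation. The function names `risk_at_size` and `select_bandwidth` and the bandwidth grid are hypothetical.

```python
import numpy as np

def risk_at_size(x, h, m):
    """Estimated L2 risk (omitting the constant integral of f^2) of a
    Gaussian-kernel density estimate with bandwidth h at fictional size m.
    Assumed form: R(K)/(m h) + (1 - 1/m) * U[(K*K)_h] - 2 * U[K_h],
    where U[.] averages over all ordered pairs i != j of the n data points."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diffs = x[:, None] - x[None, :]
    u = diffs[~np.eye(n, dtype=bool)] / h                # off-diagonal pairs only
    k = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)       # Gaussian kernel K
    kk = np.exp(-0.25 * u**2) / np.sqrt(4.0 * np.pi)     # convolution K*K, i.e. N(0, 2)
    roughness = 1.0 / (2.0 * np.sqrt(np.pi))             # R(K), integral of K^2
    return roughness / (m * h) + (1.0 - 1.0 / m) * kk.mean() / h - 2.0 * k.mean() / h

def select_bandwidth(x, m, grid):
    """Stage 1: minimize the estimated risk at size m over a bandwidth grid.
    Stage 2: first-order extrapolation from m to n, assuming h_opt ~ n^(-1/5)."""
    n = len(x)
    risks = [risk_at_size(x, h, m) for h in grid]
    h_m = grid[int(np.argmin(risks))]
    return h_m * (m / n) ** 0.2

rng = np.random.default_rng(0)
sample = rng.normal(size=200)
grid = np.linspace(0.05, 1.5, 60)
h_hat = select_bandwidth(sample, m=50, grid=grid)
```

With m = n the risk estimate above reduces to the conventional unbiased cross-validation criterion, and the extrapolation factor equals one. Because the risk is a closed-form average over pairs, no bootstrap resampling is involved. The second-order extrapolation and the nested cross-validation choice of m mentioned in the abstract are not sketched here.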

Suggested Citation

  • Wang, Qing & Lindsay, Bruce G., 2015. "Improving cross-validated bandwidth selection using subsampling-extrapolation techniques," Computational Statistics & Data Analysis, Elsevier, vol. 89(C), pages 51-71.
  • Handle: RePEc:eee:csdana:v:89:y:2015:i:c:p:51-71
    DOI: 10.1016/j.csda.2015.03.005

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947315000730
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2015.03.005?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Berwin A. TURLACH, "undated". "Bandwidth selection in kernel density estimation: a review," Statistic und Oekonometrie 9307, Humboldt Universitaet Berlin.
    2. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    3. Savchuk, Olga Y. & Hart, Jeffrey D. & Sheather, Simon J., 2010. "Indirect Cross-Validation for Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 415-423.
    4. Peter Hall & Andrew P. Robinson, 2009. "Reducing variability of crossvalidation for smoothing-parameter choice," Biometrika, Biometrika Trust, vol. 96(1), pages 175-186.
    5. Surajit Ray & Bruce G. Lindsay, 2008. "Model selection in high dimensions: a quadratic‐risk‐based approach," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 95-118, February.
    6. Jones, M. C. & Sheather, S. J., 1991. "Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives," Statistics & Probability Letters, Elsevier, vol. 11(6), pages 511-514, June.
    7. Jones, M. C., 1991. "The roles of ISE and MISE in density estimation," Statistics & Probability Letters, Elsevier, vol. 12(1), pages 51-56, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Karim M Abadir & Michel Lubrano, 2024. "Explicit solutions for the asymptotically optimal bandwidth in cross-validation," Biometrika, Biometrika Trust, vol. 111(3), pages 809-823.
    2. Berwin A. TURLACH, "undated". "Bandwidth selection in kernel density estimation: a review," Statistic und Oekonometrie 9307, Humboldt Universitaet Berlin.
    3. Nils-Bastian Heidenreich & Anja Schindler & Stefan Sperlich, 2013. "Bandwidth selection for kernel density estimation: a review of fully automatic selectors," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 403-433, October.
    4. Miguel Reyes & Mario Francisco-Fernández & Ricardo Cao, 2017. "Bandwidth selection in kernel density estimation for interval-grouped data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(3), pages 527-545, September.
    5. Duc Devroye & J. Beirlant & R. Cao & R. Fraiman & P. Hall & M. Jones & Gábor Lugosi & E. Mammen & J. Marron & C. Sánchez-Sellero & J. Uña & F. Udina & L. Devroye, 1997. "Universal smoothing factor selection in density estimation: theory and practice," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 6(2), pages 223-320, December.
    6. Scrucca, Luca, 2016. "Identifying connected components in Gaussian finite mixture models for clustering," Computational Statistics & Data Analysis, Elsevier, vol. 93(C), pages 5-17.
    7. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    8. Wang, Qing & Chen, Shiwen, 2015. "A general class of linearly extrapolated variance estimators," Statistics & Probability Letters, Elsevier, vol. 98(C), pages 29-38.
    9. Yang, Yihe & Zhou, Jie & Pan, Jianxin, 2021. "Estimation and optimal structure selection of high-dimensional Toeplitz covariance matrix," Journal of Multivariate Analysis, Elsevier, vol. 184(C).
    10. Wei, Jie & Chen, Hui, 2020. "Determining the number of factors in approximate factor models by twice K-fold cross validation," Economics Letters, Elsevier, vol. 191(C).
    11. Subbiah, Mohan & Fabozzi, Frank J., 2016. "Hedge fund allocation: Evaluating parametric and nonparametric forecasts using alternative portfolio construction techniques," International Review of Financial Analysis, Elsevier, vol. 45(C), pages 189-201.
    12. Ke-Lin Du & Rengong Zhang & Bingchun Jiang & Jie Zeng & Jiabin Lu, 2025. "Foundations and Innovations in Data Fusion and Ensemble Learning for Effective Consensus," Mathematics, MDPI, vol. 13(4), pages 1-49, February.
    13. Xiaoyu Liu & Yan Song & Hong-Fa Cheng & Kun Zhang, 2025. "A bootstrap-based bandwidth selection rule for kernel quantile estimators," Computational Statistics, Springer, vol. 40(7), pages 4037-4058, September.
    14. Aßmann, Christian & Boysen-Hogrefe, Jens, 2011. "A Bayesian approach to model-based clustering for binary panel probit models," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 261-279, January.
    15. Hidehiko Ichimura & Oliver Linton, 2001. "Asymptotic expansions for some semiparametric program evaluation estimators," CeMMAP working papers 04/01, Institute for Fiscal Studies.
    16. Adriano Z. Zambom & Ronaldo Dias, 2013. "A Review of Kernel Density Estimation with Applications to Econometrics," International Econometric Review (IER), Econometric Research Association, vol. 5(1), pages 20-42, April.
    17. Duong, Tarn & Hazelton, Martin L., 2005. "Convergence rates for unconstrained bandwidth matrix selectors in multivariate kernel density estimation," Journal of Multivariate Analysis, Elsevier, vol. 93(2), pages 417-433, April.
    18. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    19. Du, Lilun & Lan, Wei & Luo, Ronghua & Zhong, Pingshou, 2018. "Factor-adjusted multiple testing of correlations," Computational Statistics & Data Analysis, Elsevier, vol. 128(C), pages 34-47.
    20. Olga Y. Savchuk & Jeffrey D. Hart, 2017. "Fully robust one-sided cross-validation for regression functions," Computational Statistics, Springer, vol. 32(3), pages 1003-1025, September.

    More about this item

    JEL classification:

    • L2 - Industrial Organization - - Firm Objectives, Organization, and Behavior
