
Improving cross-validated bandwidth selection using subsampling-extrapolation techniques

Author

Listed:
  • Wang, Qing
  • Lindsay, Bruce G.

Abstract

Cross-validation is widely used to select tuning parameters in nonparametric statistical problems. In this paper we focus on a new method for improving the reliability of cross-validation. We implement this method in the context of the kernel density estimator, where one needs to select the bandwidth parameter so as to minimize L2 risk. The method is a two-stage subsampling-extrapolation bandwidth selection procedure: it first evaluates the risk at a fictional sample size m (m ≤ n, where n is the sample size) and then extrapolates the optimal bandwidth from m to n. This two-stage method can dramatically reduce the variability of the conventional unbiased cross-validation bandwidth selector. The simple first-order extrapolation estimator is equivalent to the rescaled “bagging-CV” bandwidth selector in Hall and Robinson (2009) if one sets the bootstrap size equal to the fictional sample size. However, our simplified expression for the risk estimator enables us to compute the aggregated risk without any bootstrapping. Furthermore, we develop a second-order extrapolation technique as an extension designed to improve the approximation of the true optimal bandwidth. To select the fictional size m given a sample of size n, we propose a nested cross-validation methodology. In a simulation study, the proposed methods show promising performance across a wide selection of distributions. We also investigate the asymptotic properties of the proposed bandwidth selectors.
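
The two-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a Gaussian kernel, a grid search over candidate bandwidths, and the standard n^(-1/5) rate for the optimal bandwidth of a second-order kernel, so that the first-order extrapolation is h_n = h_m (m/n)^(1/5). The risk estimate is the unbiased cross-validation criterion averaged over all subsamples of size m, written in closed form with U-statistics so that no resampling is needed. All function names are hypothetical.

```python
import numpy as np


def pairwise_kernel_means(x, h):
    """Average phi_{sqrt(2) h} and phi_h over all ordered pairs i != j."""
    n = x.size
    diffs = (x[:, None] - x[None, :])[~np.eye(n, dtype=bool)]

    def phi(u, s):
        # Gaussian density with standard deviation s, evaluated at u.
        return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

    return phi(diffs, np.sqrt(2.0) * h).mean(), phi(diffs, h).mean()


def subsample_risk(x, h, m):
    """Estimated L2 risk (up to a constant) at fictional sample size m.

    Assumption: this equals the unbiased cross-validation criterion
    averaged over all size-m subsamples, for a Gaussian kernel.
    """
    u_conv, u_kern = pairwise_kernel_means(x, h)
    roughness = 1.0 / (2.0 * np.sqrt(np.pi))  # integral of K^2 for Gaussian K
    return roughness / (m * h) + (1.0 - 1.0 / m) * u_conv - 2.0 * u_kern


def extrapolated_bandwidth(x, m, grid):
    """Stage 1: minimise the risk at size m. Stage 2: rescale from m to n."""
    risks = [subsample_risk(x, h, m) for h in grid]
    h_m = grid[int(np.argmin(risks))]
    # Assumed first-order extrapolation based on the n^(-1/5) rate.
    return h_m * (m / x.size) ** 0.2
```

With m equal to n the risk reduces to the conventional unbiased cross-validation criterion and the rescaling factor is 1. The second-order extrapolation and the nested cross-validation choice of m described in the abstract are not covered by this sketch.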

Suggested Citation

  • Wang, Qing & Lindsay, Bruce G., 2015. "Improving cross-validated bandwidth selection using subsampling-extrapolation techniques," Computational Statistics & Data Analysis, Elsevier, vol. 89(C), pages 51-71.
  • Handle: RePEc:eee:csdana:v:89:y:2015:i:c:p:51-71
    DOI: 10.1016/j.csda.2015.03.005

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947315000730
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2015.03.005?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Savchuk, Olga Y. & Hart, Jeffrey D. & Sheather, Simon J., 2010. "Indirect Cross-Validation for Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 105(489), pages 415-423.
    2. Surajit Ray & Bruce G. Lindsay, 2008. "Model selection in high dimensions: a quadratic‐risk‐based approach," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(1), pages 95-118, February.
    3. Jones, M. C. & Sheather, S. J., 1991. "Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives," Statistics & Probability Letters, Elsevier, vol. 11(6), pages 511-514, June.
    4. Berwin A. TURLACH, "undated". "Bandwidth selection in kernel density estimation: a review," Statistic und Oekonometrie 9307, Humboldt Universitaet Berlin.
    5. Nicolai Meinshausen & Peter Bühlmann, 2010. "Stability selection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 72(4), pages 417-473, September.
    6. Peter Hall & Andrew P. Robinson, 2009. "Reducing variability of crossvalidation for smoothing-parameter choice," Biometrika, Biometrika Trust, vol. 96(1), pages 175-186.
    7. Jones, M. C., 1991. "The roles of ISE and MISE in density estimation," Statistics & Probability Letters, Elsevier, vol. 12(1), pages 51-56, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Berwin A. TURLACH, "undated". "Bandwidth selection in kernel density estimation: a review," Statistic und Oekonometrie 9307, Humboldt Universitaet Berlin.
    2. Nils-Bastian Heidenreich & Anja Schindler & Stefan Sperlich, 2013. "Bandwidth selection for kernel density estimation: a review of fully automatic selectors," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 97(4), pages 403-433, October.
    3. Miguel Reyes & Mario Francisco-Fernández & Ricardo Cao, 2017. "Bandwidth selection in kernel density estimation for interval-grouped data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(3), pages 527-545, September.
    4. Luc Devroye & J. Beirlant & R. Cao & R. Fraiman & P. Hall & M. Jones & Gábor Lugosi & E. Mammen & J. Marron & C. Sánchez-Sellero & J. Uña & F. Udina & L. Devroye, 1997. "Universal smoothing factor selection in density estimation: theory and practice," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 6(2), pages 223-320, December.
    5. Wang, Qing & Chen, Shiwen, 2015. "A general class of linearly extrapolated variance estimators," Statistics & Probability Letters, Elsevier, vol. 98(C), pages 29-38.
    6. Wei, Jie & Chen, Hui, 2020. "Determining the number of factors in approximate factor models by twice K-fold cross validation," Economics Letters, Elsevier, vol. 191(C).
    7. Adriano Z. Zambom & Ronaldo Dias, 2013. "A Review of Kernel Density Estimation with Applications to Econometrics," International Econometric Review (IER), Econometric Research Association, vol. 5(1), pages 20-42, April.
    8. Capanu, Marinela & Giurcanu, Mihai & Begg, Colin B. & Gönen, Mithat, 2023. "Subsampling based variable selection for generalized linear models," Computational Statistics & Data Analysis, Elsevier, vol. 184(C).
    9. Gautier Marti & Frank Nielsen & Philippe Donnat & Sébastien Andler, 2016. "On clustering financial time series: a need for distances between dependent random variables," Papers 1603.07822, arXiv.org.
    10. Skripnikov, A. & Michailidis, G., 2019. "Joint estimation of multiple network Granger causal models," Econometrics and Statistics, Elsevier, vol. 10(C), pages 120-133.
    11. Cordoni, Francesco & Dorémus, Nicolas & Moneta, Alessio, 2024. "Identification of vector autoregressive models with nonlinear contemporaneous structure," Journal of Economic Dynamics and Control, Elsevier, vol. 162(C).
    12. José E. Chacón & Carlos Tenreiro, 2012. "Exact and Asymptotically Optimal Bandwidths for Kernel Estimation of Density Functionals," Methodology and Computing in Applied Probability, Springer, vol. 14(3), pages 523-548, September.
    13. Chu, Chi-Yang & Henderson, Daniel J. & Parmeter, Christopher F., 2017. "On discrete Epanechnikov kernel functions," Computational Statistics & Data Analysis, Elsevier, vol. 116(C), pages 79-105.
    14. Meng Li & Sijia Xiang & Weixin Yao, 2016. "Robust estimation of the number of components for mixtures of linear regression models," Computational Statistics, Springer, vol. 31(4), pages 1539-1555, December.
    15. Mokkadem, Abdelkader & Pelletier, Mariane, 2020. "Online estimation of integrated squared density derivatives," Statistics & Probability Letters, Elsevier, vol. 166(C).
    16. J. Liao & Yujun Wu & Yong Lin, 2010. "Improving Sheather and Jones’ bandwidth selector for difficult densities in kernel density estimation," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 22(1), pages 105-114.
    17. Tan, Xin Lu, 2019. "Optimal estimation of slope vector in high-dimensional linear transformation models," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 179-204.
    18. Hall, Peter & Wolff, Rodney C. L., 1995. "Estimators of integrals of powers of density derivatives," Statistics & Probability Letters, Elsevier, vol. 24(2), pages 105-110, August.
    19. M. Hiabu & E. Mammen & M. D. Martínez-Miranda & J. P. Nielsen, 2016. "In-sample forecasting with local linear survival densities," Biometrika, Biometrika Trust, vol. 103(4), pages 843-859.
    20. Rudolf Grübel, 1994. "Estimation of density functionals," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 46(1), pages 67-75, March.

    More about this item

    Keywords

    Bandwidth selection; Cross-validation; Extrapolation; L2 distance; Nonparametric kernel density estimator; Subsampling;

    JEL classification:

    • L2 - Industrial Organization - - Firm Objectives, Organization, and Behavior
