
Distributed penalizing function criterion for local polynomial estimation in nonparametric regression with massive data

Authors

Listed:
  • Tianqi Sun (Shandong University)
  • Weiyu Li (Shandong University)
  • Lu Lin (Shandong University)

Abstract

Bandwidth selection is one of the most important issues in local polynomial estimation. However, research on data-driven bandwidth selection combined with the divide-and-conquer (DC) strategy remains rare, which hinders the application of local polynomial estimation to massive data sets. In this paper, extending the traditional penalizing function criterion, we propose a distributed penalizing function (DPF) criterion for selecting the optimal bandwidth. The proposed DPF is computationally efficient for massive data sets and is shown to be “globally optimal” in the sense that minimizing the DPF is asymptotically equivalent to minimizing the true empirical loss of the averaged function estimator, i.e., the DC estimator. In addition, a novel algorithm is proposed for bandwidth selection under an imbalanced DC strategy. The performance of the DPF is demonstrated in simulation studies and a real data analysis.
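
To make the setting concrete, the following is a minimal sketch of a DC local linear estimator whose bandwidth is chosen by a penalized residual criterion. It is not the DPF of the paper, whose construction is not given in this abstract. It assumes a Gaussian kernel, a GCV-type penalty, equal-size blocks, and simulated data, and the function names (ll_weights, gcv_score, select_bandwidth, dc_estimate) are hypothetical. Averaging block-level scores, as done here, is only a naive stand-in: it targets the loss of each block estimator, not necessarily the loss of the averaged estimator that the DPF is stated to target.

```python
import numpy as np

def ll_weights(x, x0, h):
    """Local linear smoother weights L (len(x0) x len(x)), Gaussian kernel (assumed)."""
    d = x[None, :] - x0[:, None]               # d[i, j] = x_j - x0_i
    w = np.exp(-0.5 * (d / h) ** 2)            # kernel weights
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d**2).sum(1)
    # Closed-form local linear weights: w_ij (s2_i - d_ij s1_i) / (s0_i s2_i - s1_i^2)
    return w * (s2[:, None] - d * s1[:, None]) / (s0 * s2 - s1**2)[:, None]

def gcv_score(x, y, h):
    """GCV-type penalized criterion: mean squared residual times (1 - tr(L)/n)^(-2)."""
    L = ll_weights(x, x, h)
    resid = y - L @ y
    return np.mean(resid**2) / (1.0 - np.trace(L) / len(x)) ** 2

def select_bandwidth(blocks, grid):
    """Naive choice: the h in grid minimizing the block-averaged criterion."""
    scores = [np.mean([gcv_score(x, y, h) for x, y in blocks]) for h in grid]
    return grid[int(np.argmin(scores))]

def dc_estimate(blocks, x0, h):
    """DC estimator: average of the block-wise local linear fits at x0."""
    return np.mean([ll_weights(x, x0, h) @ y for x, y in blocks], axis=0)

# Simulated example (illustrative data, not from the paper).
rng = np.random.default_rng(0)
N, m = 2000, 10                                # total sample size, number of blocks
x = rng.uniform(0, 1, N)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(N)
blocks = list(zip(np.array_split(x, m), np.array_split(y, m)))

grid = np.linspace(0.02, 0.3, 15)              # candidate bandwidths
h_hat = select_bandwidth(blocks, grid)
x0 = np.linspace(0.1, 0.9, 9)                  # interior evaluation points
m_hat = dc_estimate(blocks, x0, h_hat)
```

Each block only ever forms its own smoother matrix, so memory scales with the block size rather than with N, which is the practical appeal of the DC strategy for massive data.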

Suggested Citation

  • Tianqi Sun & Weiyu Li & Lu Lin, 2025. "Distributed penalizing function criterion for local polynomial estimation in nonparametric regression with massive data," Statistical Papers, Springer, vol. 66(3), pages 1-26, April.
  • Handle: RePEc:spr:stpapr:v:66:y:2025:i:3:d:10.1007_s00362-025-01678-x
    DOI: 10.1007/s00362-025-01678-x

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00362-025-01678-x
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



