Fast kernel conditional density estimation: A dual-tree Monte Carlo approach

Authors

  • Holmes, Michael P.
  • Gray, Alexander G.
  • Isbell Jr., Charles Lee

Abstract

We describe a fast, data-driven bandwidth selection procedure for kernel conditional density estimation (KCDE). Specifically, we give a Monte Carlo dual-tree algorithm for efficient, error-controlled approximation of a cross-validated likelihood objective. While exact evaluation of this objective has an unscalable O(n^2) computational cost, our method is practical and shows speedup factors as high as 286,000 when applied to real multivariate datasets containing up to one million points. In absolute terms, computation times are reduced from months to minutes. This enables applications at much greater scale than previously possible. The core idea in our method is to first derive a standard deterministic dual-tree approximation, whose loose deterministic bounds we then replace with tight, probabilistic Monte Carlo bounds. The resulting Monte Carlo dual-tree algorithm exhibits strong error control and high speedup across a broad range of datasets several orders of magnitude greater in size than those reported in previous work. The cost of this high acceleration is the loss of the formal error guarantee of the deterministic dual-tree framework; however, our experiments show that error is still amply controlled by our Monte Carlo algorithm, and the many-order-of-magnitude speedups are worth this sacrifice in the large-data case, where cross-validated bandwidth selection for KCDE would otherwise be impractical.
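The two ingredients named in the abstract can be illustrated with a short Python sketch. It assumes scalar x and y, Gaussian product kernels, and the usual leave-one-out form of the likelihood cross-validation objective; these are assumptions, since the paper's exact objective and kernels are not given on this page. The hypothetical function `loo_cv_loglik` evaluates the objective exactly with an O(n^2) double loop, and the hypothetical `mc_sum` shows the basic idea of replacing an exact kernel sum with a sampled estimate plus a standard-error bar. This is not the authors' dual-tree algorithm, which as a dual-tree method works on pairs of tree nodes rather than on individual sums.

```python
import math
import random


def gauss(u, h):
    """1-D Gaussian kernel with bandwidth h, evaluated at offset u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def loo_cv_loglik(xs, ys, hx, hy):
    """Exact leave-one-out log-likelihood of a scalar KCDE (assumed form).

    For each i, f_{-i}(y_i | x_i) is the ratio of two kernel sums over j != i:
      num = sum_j K_hx(x_i - x_j) * K_hy(y_i - y_j)
      den = sum_j K_hx(x_i - x_j)
    The double loop costs O(n^2) kernel evaluations.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue
            kx = gauss(xs[i] - xs[j], hx)
            num += kx * gauss(ys[i] - ys[j], hy)
            den += kx
        total += math.log(num / den)
    return total


def mc_sum(term, n, m, rng):
    """Estimate sum_{k<n} term(k) from m >= 2 indices drawn uniformly with replacement.

    Returns (estimate, standard_error); under a normal approximation the
    standard error gives a probabilistic error bar on the full sum.
    """
    draws = [term(rng.randrange(n)) for _ in range(m)]
    mean = sum(draws) / m
    var = sum((d - mean) ** 2 for d in draws) / (m - 1)
    return n * mean, n * math.sqrt(var / m)


# Synthetic data (illustrative only): y depends linearly on x plus noise.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

# Brute-force grid search over bandwidth pairs; each call is O(n^2).
grid = [0.1, 0.2, 0.4, 0.8]
best = max(((hx, hy) for hx in grid for hy in grid),
           key=lambda h: loo_cv_loglik(xs, ys, *h))

# Sampled estimate of one denominator sum (point i = 0), with its standard error.
i = 0
others = [j for j in range(len(xs)) if j != i]
est, se = mc_sum(lambda k: gauss(xs[i] - xs[others[k]], 0.4), len(others), 30, rng)
```

A grid search like this repeats the O(n^2) evaluation once per bandwidth pair, which is why the exact objective becomes impractical at the million-point scale the abstract mentions.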

Suggested Citation

  • Holmes, Michael P. & Gray, Alexander G. & Isbell Jr., Charles Lee, 2010. "Fast kernel conditional density estimation: A dual-tree Monte Carlo approach," Computational Statistics & Data Analysis, Elsevier, vol. 54(7), pages 1707-1718, July.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:7:p:1707-1718

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(10)00026-5
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Jianqing Fan & Tsz Ho Yim, 2004. "A crossvalidation method for estimating conditional densities," Biometrika, Biometrika Trust, vol. 91(4), pages 819-834, December.
    2. Christoph Seeger, 2006. "Forecasting for WorldWide Supply Chain Processes with SAP's APO," Foresight: The International Journal of Applied Forecasting, International Institute of Forecasters, issue 5, pages 51-54, Fall.
    3. Jan G. De Gooijer & Dawit Zerom, 2003. "On Conditional Density Estimation," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 57(2), pages 159-176, May.
    4. Fan, Jianqing & Yao, Qiwei & Tong, Howell, 1996. "Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems," LSE Research Online Documents on Economics 6704, London School of Economics and Political Science, LSE Library.
    5. Bashtannyk, David M. & Hyndman, Rob J., 2001. "Bandwidth selection for kernel conditional density estimation," Computational Statistics & Data Analysis, Elsevier, vol. 36(3), pages 279-298, May.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Hu, Shuowen & Poskitt, D.S. & Zhang, Xibin, 2012. "Bayesian adaptive bandwidth kernel density estimation of irregular multivariate distributions," Computational Statistics & Data Analysis, Elsevier, vol. 56(3), pages 732-740.
    2. Ann-Kathrin Bott & Michael Kohler, 2016. "Adaptive Estimation of a Conditional Density," International Statistical Review, International Statistical Institute, vol. 84(2), pages 291-316, August.
    3. Aas Kjersti & Nagler Thomas & Jullum Martin & Løland Anders, 2021. "Explaining predictive models using Shapley values and non-parametric vine copulas," Dependence Modeling, De Gruyter, vol. 9(1), pages 62-81, January.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ann-Kathrin Bott & Michael Kohler, 2016. "Adaptive Estimation of a Conditional Density," International Statistical Review, International Statistical Institute, vol. 84(2), pages 291-316, August.
    2. Kateřina Konečná & Ivanka Horová, 2019. "Maximum likelihood method for bandwidth selection in kernel conditional density estimate," Computational Statistics, Springer, vol. 34(4), pages 1871-1887, December.
    3. Wen, Kuangyu & Wu, Ximing, 2017. "Smoothed kernel conditional density estimation," Economics Letters, Elsevier, vol. 152(C), pages 112-116.
    4. Liang, Han-Ying & Liu, Ai-Ai, 2013. "Kernel estimation of conditional density with truncated, censored and dependent data," Journal of Multivariate Analysis, Elsevier, vol. 120(C), pages 40-58.
    5. Faugeras, Olivier P., 2009. "A quantile-copula approach to conditional density estimation," Journal of Multivariate Analysis, Elsevier, vol. 100(9), pages 2083-2099, October.
    6. Michael Kohler & Adam Krzyżak, 2020. "Estimating quantiles in imperfect simulation models using conditional density estimation," Annals of the Institute of Statistical Mathematics, Springer; The Institute of Statistical Mathematics, vol. 72(1), pages 123-155, February.
    7. Ann-Kathrin Bott & Michael Kohler, 2017. "Nonparametric estimation of a conditional density," Annals of the Institute of Statistical Mathematics, Springer; The Institute of Statistical Mathematics, vol. 69(1), pages 189-214, February.
    8. Liang, Han-Ying & Peng, Liang, 2010. "Asymptotic normality and Berry-Esseen results for conditional density estimator with censored and dependent data," Journal of Multivariate Analysis, Elsevier, vol. 101(5), pages 1043-1054, May.
    9. Kim Huynh & David Jacho-Chavez, 2007. "Conditional density estimation: an application to the Ecuadorian manufacturing sector," Economics Bulletin, AccessEcon, vol. 3(62), pages 1-6.
    10. Wang, Xiao-Feng & Ye, Deping, 2015. "Conditional density estimation in measurement error problems," Journal of Multivariate Analysis, Elsevier, vol. 133(C), pages 38-50.
    11. Otneim, Håkon & Tjøstheim, Dag, 2016. "Non-parametric estimation of conditional densities: A new method," Discussion Papers 2016/22, Norwegian School of Economics, Department of Business and Management Science.
    12. Xiong, Xianzhu & Ou, Meijuan & Chen, Ailian, 2021. "Reweighted Nadaraya–Watson estimation of conditional density function in the right-censored model," Statistics & Probability Letters, Elsevier, vol. 168(C).
    13. Manzan, Sebastiano & Zerom, Dawit, 2008. "A bootstrap-based non-parametric forecast density," International Journal of Forecasting, Elsevier, vol. 24(3), pages 535-550.
    14. Adolfo Maza & María Hierro & José Villaverde, 2010. "Measuring intra-distribution income dynamics: an application to the European regions," The Annals of Regional Science, Springer; Western Regional Science Association, vol. 45(2), pages 313-329, October.
    15. Roberto Basile, 2010. "Intra-distribution dynamics of regional per-capita income in Europe: evidence from alternative conditional density estimators," Statistica, Department of Statistics, University of Bologna, vol. 70(1), pages 3-22.
    16. Roberto Basile, 2009. "Productivity Polarization across Regions in Europe," International Regional Science Review, vol. 32(1), pages 92-115, January.
    17. Manfred Fischer & Peter Stumpner, 2008. "Income distribution dynamics and cross-region convergence in Europe," Journal of Geographical Systems, Springer, vol. 10(2), pages 109-139, June.
    18. Arora, Siddharth & Taylor, James W., 2016. "Forecasting electricity smart meter data using conditional kernel density estimation," Omega, Elsevier, vol. 59(PA), pages 47-59.
    19. Janssen, Paul & Swanepoel, Jan & Veraverbeke, Noël, 2017. "Smooth copula-based estimation of the conditional density function with a single covariate," Journal of Multivariate Analysis, Elsevier, vol. 159(C), pages 39-48.
    20. Ichimura, Tsuyoshi & Fukuda, Daisuke, 2010. "A fast algorithm for computing least-squares cross-validations for nonparametric conditional kernel density functions," Computational Statistics & Data Analysis, Elsevier, vol. 54(12), pages 3404-3410, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:54:y:2010:i:7:p:1707-1718. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.