
Fast surrogates of U-statistics

Author

Listed:
  • Lin, N.
  • Xi, R.

Abstract

U-statistics have long been known as a class of nonparametric estimators with good theoretical properties such as unbiasedness and asymptotic normality. However, their use in modern statistical analysis is limited by their high computational complexity, especially as massive data sets become increasingly common. In this paper, using the "divide-and-conquer" technique, we develop two surrogates of U-statistics, aggregated U-statistics and average aggregated U-statistics, both of which are shown to be asymptotically equivalent to U-statistics and computationally much more efficient. When the raw data set is divided into K subsets, the two proposed estimators reduce the computational complexity from O(N^m) to O(K(N/K)^m), which yields a significant time reduction as long as K=o(N) and m>=2. The merit of the two proposed statistics is demonstrated by both simulation studies and real data examples.
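
The divide-and-conquer idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: that m is the degree of a symmetric kernel, that N is the sample size, and that an aggregated U-statistic is obtained by computing an ordinary U-statistic on each of the K subsets and averaging the results with weights proportional to subset size. The function names (`u_statistic`, `aggregated_u_statistic`, `variance_kernel`) are hypothetical, and the average aggregated variant is not sketched because the abstract does not define it.

```python
from itertools import combinations
from statistics import mean


def u_statistic(data, kernel, m):
    """Ordinary U-statistic: average of a symmetric kernel over all size-m subsets."""
    return mean(kernel(*subset) for subset in combinations(data, m))


def aggregated_u_statistic(data, kernel, m, k):
    """Assumed construction: split the data into k contiguous blocks, compute a
    U-statistic on each block, and combine them weighted by block size."""
    n = len(data)
    # Block boundaries chosen so that block sizes differ by at most one.
    bounds = [i * n // k for i in range(k + 1)]
    blocks = [data[bounds[i]:bounds[i + 1]] for i in range(k)]
    if any(len(block) < m for block in blocks):
        raise ValueError("each block needs at least m observations")
    return sum(len(b) * u_statistic(b, kernel, m) for b in blocks) / n


def variance_kernel(x, y):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance."""
    return (x - y) ** 2 / 2
```

Under these assumptions, the cost comparison follows from K(N/K)^m = N^m / K^(m-1). For example, with m=2, N=10,000 and K=100, the full statistic evaluates the kernel on roughly 5 x 10^7 pairs, whereas the blockwise version evaluates 100 x 4,950 = 495,000 pairs. With k=1 the sketch reduces to the ordinary U-statistic.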

Suggested Citation

  • Lin, N. & Xi, R., 2010. "Fast surrogates of U-statistics," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 16-24, January.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:1:p:16-24

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00280-1
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Haataja, Riina & Larocque, Denis & Nevalainen, Jaakko & Oja, Hannu, 2009. "A weighted multivariate signed-rank test for cluster-correlated data," Journal of Multivariate Analysis, Elsevier, vol. 100(6), pages 1107-1119, July.
    2. Shen, Gang, 2008. "Asymptotics of Oja Median Estimate," Statistics & Probability Letters, Elsevier, vol. 78(14), pages 2137-2141, October.
    3. Marc Hallin & Thomas S. Ferguson & Christian Genest, 2000. "Kendall's tau for serial dependence," ULB Institutional Repository 2013/2093, ULB -- Universite Libre de Bruxelles.
    4. Oja, Hannu, 1983. "Descriptive statistics for multivariate distributions," Statistics & Probability Letters, Elsevier, vol. 1(6), pages 327-332, October.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shen, Gang, 2009. "Asymptotics of a Theil-type estimate in multiple linear regression," Statistics & Probability Letters, Elsevier, vol. 79(8), pages 1053-1064, April.
    2. Eliana Christou, 2020. "Robust dimension reduction using sliced inverse median regression," Statistical Papers, Springer, vol. 61(5), pages 1799-1818, October.
    3. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    4. Gangwei Cai & Baoping Zou & Xiaoting Chi & Xincheng He & Yuang Guo & Wen Jiang & Qian Wu & Yujin Zhang & Yanna Zhou, 2023. "Neighborhood Spatio-Temporal Impacts of SDG 8.9: The Case of Urban and Rural Exhibition-Driven Tourism by Multiple Methods," Land, MDPI, vol. 12(2), pages 1-37, January.
    5. Victor Chernozhukov & Alfred Galichon & Marc Hallin & Marc Henry, 2014. "Monge-Kantorovich Depth, Quantiles, Ranks, and Signs," Papers 1412.8434, arXiv.org, revised Sep 2015.
    6. Zhou, Xinyu & Ma, Yijia & Wu, Wei, 2023. "Statistical depth for point process via the isometric log-ratio transformation," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    7. Hwang, Jinsoo & Jorn, Hongsuk & Kim, Jeankyung, 2004. "On the performance of bivariate robust location estimators under contamination," Computational Statistics & Data Analysis, Elsevier, vol. 44(4), pages 587-601, January.
    8. J. T. A. S. Ferreira & M. F. J. Steel, 2004. "On Describing Multivariate Skewness: A Directional Approach," Econometrics 0409010, University Library of Munich, Germany.
    9. Masato Okamoto, 2009. "Decomposition of gini and multivariate gini indices," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 7(2), pages 153-177, June.
    10. Averous, Jean & Meste, Michel, 1997. "Median Balls: An Extension of the Interquantile Intervals to Multivariate Distributions," Journal of Multivariate Analysis, Elsevier, vol. 63(2), pages 222-241, November.
    11. Möttönen, J. & Hettmansperger, T. P. & Oja, H. & Tienari, J., 1998. "On the Efficiency of Affine Invariant Multivariate Rank Tests," Journal of Multivariate Analysis, Elsevier, vol. 66(1), pages 118-132, July.
    12. Taskinen, Sara & Kankainen, Annaliisa & Oja, Hannu, 2003. "Sign test of independence between two random vectors," Statistics & Probability Letters, Elsevier, vol. 62(1), pages 9-21, March.
    13. Nasri, Bouchra R., 2022. "Tests of serial dependence for multivariate time series with arbitrary distributions," Journal of Multivariate Analysis, Elsevier, vol. 192(C).
    14. Fantazzini, Dean, 2011. "Analysis of multidimensional probability distributions with copula functions," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 22(2), pages 98-134.
    15. Eisenberg, Bennett, 2015. "The multivariate Gini ratio," Statistics & Probability Letters, Elsevier, vol. 96(C), pages 292-298.
    16. Aurora Torrente & Juan Romo, 2021. "Initializing k-means Clustering by Bootstrap and Data Depth," Journal of Classification, Springer;The Classification Society, vol. 38(2), pages 232-256, July.
    17. Fraiman, Ricardo & Gamboa, Fabrice & Moreno, Leonardo, 2019. "Connecting pairwise geodesic spheres by depth: DCOPS," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 81-94.
    18. Kim, Jeankyung, 2000. "Rate of convergence of depth contours: with application to a multivariate metrically trimmed mean," Statistics & Probability Letters, Elsevier, vol. 49(4), pages 393-400, October.
    19. Bauwens, Luc & Veredas, David, 2004. "The stochastic conditional duration model: a latent variable model for the analysis of financial durations," Journal of Econometrics, Elsevier, vol. 119(2), pages 381-412, April.
