Printed from https://ideas.repec.org/a/eee/csdana/v54y2010i1p16-24.html

Fast surrogates of U-statistics

Author

Listed:
  • Lin, N.
  • Xi, R.

Abstract

U-statistics have long been known as a class of nonparametric estimators with good theoretical properties such as unbiasedness and asymptotic normality. However, their applications in modern statistical analysis are limited by their high computational complexity, especially as massive data sets become more and more common. In this paper, using the "divide-and-conquer" technique, we developed two surrogates of the U-statistics, aggregated U-statistics and average aggregated U-statistics, both of which are shown to be asymptotically equivalent to U-statistics and computationally much more efficient. When the raw data set is divided into K subsets, the two proposed estimators reduce the computational complexity from O(N^m) to O(K(N/K)^m), where N is the sample size and m is the degree of the kernel, which results in a significant time reduction as long as K=o(N) and m>=2. The merit of the two proposed statistics is demonstrated by both simulation studies and real data examples.
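
The divide-and-conquer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the random partition into equal-sized blocks, and the size-weighted average of the block U-statistics are assumptions made for illustration, and the paper's exact definitions of the two surrogates may differ. The kernel used here, h(x, y) = (x - y)^2 / 2, has degree m = 2 and yields the unbiased sample variance as its U-statistic.

```python
from itertools import combinations
from math import comb
import random


def u_statistic(data, kernel, m):
    """Full U-statistic: average of the kernel over all size-m subsets."""
    total = 0.0
    count = 0
    for subset in combinations(data, m):
        total += kernel(*subset)
        count += 1
    return total / count


def aggregated_u(data, kernel, m, k, seed=0):
    """Illustrative surrogate (assumed form): randomly split the data into
    k blocks, compute a U-statistic on each block, and combine the block
    values with weights proportional to block size."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    blocks = [shuffled[i::k] for i in range(k)]  # each block needs >= m points
    n = len(shuffled)
    return sum(len(b) * u_statistic(b, kernel, m) for b in blocks) / n


def variance_kernel(x, y):
    """Degree-2 kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(600)]  # N = 600

full = u_statistic(sample, variance_kernel, 2)
fast = aggregated_u(sample, variance_kernel, 2, k=10)  # K = 10

# Kernel evaluations: C(N, m) for the full statistic, K * C(N/K, m) for the surrogate.
full_evals = comb(600, 2)
fast_evals = 10 * comb(60, 2)
print(full, fast, full_evals, fast_evals)
```

With N = 600, K = 10 and m = 2, the surrogate needs roughly one tenth of the kernel evaluations, in line with the ratio K(N/K)^m / N^m = 1/K^(m-1) implied by the complexity bounds in the abstract.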

Suggested Citation

  • Lin, N. & Xi, R., 2010. "Fast surrogates of U-statistics," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 16-24, January.
  • Handle: RePEc:eee:csdana:v:54:y:2010:i:1:p:16-24

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167-9473(09)00280-1
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Shen, Gang, 2008. "Asymptotics of Oja Median Estimate," Statistics & Probability Letters, Elsevier, vol. 78(14), pages 2137-2141, October.
    2. Haataja, Riina & Larocque, Denis & Nevalainen, Jaakko & Oja, Hannu, 2009. "A weighted multivariate signed-rank test for cluster-correlated data," Journal of Multivariate Analysis, Elsevier, vol. 100(6), pages 1107-1119, July.
    3. Oja, Hannu, 1983. "Descriptive statistics for multivariate distributions," Statistics & Probability Letters, Elsevier, vol. 1(6), pages 327-332, October.

    Citations



    Cited by:

    1. Dimitris N Politis, 2024. "Scalable subsampling: computation, aggregation and inference," Biometrika, Biometrika Trust, vol. 111(1), pages 347-354.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Eliana Christou, 2020. "Robust dimension reduction using sliced inverse median regression," Statistical Papers, Springer, vol. 61(5), pages 1799-1818, October.
    2. Shen, Gang, 2009. "Asymptotics of a Theil-type estimate in multiple linear regression," Statistics & Probability Letters, Elsevier, vol. 79(8), pages 1053-1064, April.
    3. Romanazzi, Mario, 2009. "Data depth, random simplices and multivariate dispersion," Statistics & Probability Letters, Elsevier, vol. 79(12), pages 1473-1479, June.
    4. G. Zioutas & C. Chatzinakos & T. D. Nguyen & L. Pitsoulis, 2017. "Optimization techniques for multivariate least trimmed absolute deviation estimation," Journal of Combinatorial Optimization, Springer, vol. 34(3), pages 781-797, October.
    5. Gangwei Cai & Baoping Zou & Xiaoting Chi & Xincheng He & Yuang Guo & Wen Jiang & Qian Wu & Yujin Zhang & Yanna Zhou, 2023. "Neighborhood Spatio-Temporal Impacts of SDG 8.9: The Case of Urban and Rural Exhibition-Driven Tourism by Multiple Methods," Land, MDPI, vol. 12(2), pages 1-37, January.
    6. Victor Chernozhukov & Alfred Galichon & Marc Hallin & Marc Henry, 2014. "Monge-Kantorovich Depth, Quantiles, Ranks, and Signs," Papers 1412.8434, arXiv.org, revised Sep 2015.
    7. Łuczak, Aleksandra & Just, Małgorzata, 2021. "Sustainable development of territorial units: MCDM approach with optimal tail selection," Ecological Modelling, Elsevier, vol. 457(C).
    8. Zhou, Xinyu & Ma, Yijia & Wu, Wei, 2023. "Statistical depth for point process via the isometric log-ratio transformation," Computational Statistics & Data Analysis, Elsevier, vol. 187(C).
    9. Hwang, Jinsoo & Jorn, Hongsuk & Kim, Jeankyung, 2004. "On the performance of bivariate robust location estimators under contamination," Computational Statistics & Data Analysis, Elsevier, vol. 44(4), pages 587-601, January.
    10. J. T. A. S. Ferreira & M. F. J. Steel, 2004. "On Describing Multivariate Skewness: A Directional Approach," Econometrics 0409010, University Library of Munich, Germany.
    11. Masato Okamoto, 2009. "Decomposition of gini and multivariate gini indices," The Journal of Economic Inequality, Springer;Society for the Study of Economic Inequality, vol. 7(2), pages 153-177, June.
    12. Gangwei Cai & Yan Hong & Lei Xu & Weijun Gao & Ka Wang & Xiaoting Chi, 2020. "An Evaluation of Green Ryokans through a Tourism Accommodation Survey and Customer-Satisfaction-Related CASBEE–IPA after COVID-19 Pandemic," Sustainability, MDPI, vol. 13(1), pages 1-24, December.
    13. Averous, Jean & Meste, Michel, 1997. "Median Balls: An Extension of the Interquantile Intervals to Multivariate Distributions," Journal of Multivariate Analysis, Elsevier, vol. 63(2), pages 222-241, November.
    14. Möttönen, J. & Hettmansperger, T. P. & Oja, H. & Tienari, J., 1998. "On the Efficiency of Affine Invariant Multivariate Rank Tests," Journal of Multivariate Analysis, Elsevier, vol. 66(1), pages 118-132, July.
    15. Taskinen, Sara & Kankainen, Annaliisa & Oja, Hannu, 2003. "Sign test of independence between two random vectors," Statistics & Probability Letters, Elsevier, vol. 62(1), pages 9-21, March.
    16. Fernandez-Ponce, J. M. & Suarez-Llorens, A., 2003. "A multivariate dispersion ordering based on quantiles more widely separated," Journal of Multivariate Analysis, Elsevier, vol. 85(1), pages 40-53, April.
    17. Masse, Jean-Claude & Plante, Jean-Francois, 2003. "A Monte Carlo study of the accuracy and robustness of ten bivariate location estimators," Computational Statistics & Data Analysis, Elsevier, vol. 42(1-2), pages 1-26, February.
    18. Rousson, Valentin, 2002. "On Distribution-Free Tests for the Multivariate Two-Sample Location-Scale Model," Journal of Multivariate Analysis, Elsevier, vol. 80(1), pages 43-57, January.
    19. Eisenberg, Bennett, 2015. "The multivariate Gini ratio," Statistics & Probability Letters, Elsevier, vol. 96(C), pages 292-298.
