
Adaptive Nonparametric Clustering

Authors
  • Efimov, Kirill
  • Adamyan, Larisa
  • Spokoiny, Vladimir

Abstract

This paper presents a new approach to non-parametric cluster analysis called Adaptive Weights Clustering (AWC). The idea is to identify the clustering structure by checking, at different points and at different scales, for departures from local homogeneity. The procedure describes the clustering structure in terms of weights w_ij, each of which measures the degree of local inhomogeneity between two neighboring local clusters using statistical tests of "no gap" between them. The procedure starts at a very local scale, and the locality parameter then grows by a factor at each step. The method is fully adaptive and does not require specifying the number of clusters or their structure. The clustering results are not sensitive to noise and outliers, and the procedure can recover different clusters with sharp edges or manifold structure. The method is scalable and computationally feasible. An intensive numerical study shows state-of-the-art performance of the method on various artificial examples and in applications to text data. Our theoretical study establishes the optimal sensitivity of AWC to local inhomogeneity.
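The abstract describes the procedure only in outline, so the following is a rough Python sketch of the adaptive-weights idea under stated assumptions, not the authors' algorithm. Assumed here: the data are two-dimensional; the "no gap" check compares the observed overlap θ_ij = |C_i ∩ C_j| / |C_i ∪ C_j| of the current local clusters C_i and C_j with the ratio q_ij that two discs of the previous radius would have under a locally uniform density; and a gap is declared when the statistic T_ij = |C_i ∪ C_j| · KL(θ_ij, q_ij), counted only when θ_ij ≤ q_ij, exceeds a threshold λ. The function names (`awc_sketch`, `lens_ratio`, `kl_bernoulli`, `connected_labels`), the defaults for the growth factor `a`, the number of steps and `lam`, and the choice to read clusters off as connected components of the final weight matrix are all hypothetical and for illustration only.

```python
import numpy as np


def lens_ratio(t, h):
    """|B1 ∩ B2| / |B1 ∪ B2| for two discs of radius h whose centres are t apart.

    Used as the overlap expected when there is no gap (assumes 2-D data with
    locally uniform density)."""
    t = np.minimum(t, 2.0 * h)
    inter = (2.0 * h**2 * np.arccos(t / (2.0 * h))
             - 0.5 * t * np.sqrt(np.maximum(4.0 * h**2 - t**2, 0.0)))
    return inter / (2.0 * np.pi * h**2 - inter)


def kl_bernoulli(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(Bernoulli(p) || Bernoulli(q))."""
    p = np.clip(p, eps, 1.0 - eps)
    q = np.clip(q, eps, 1.0 - eps)
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))


def awc_sketch(X, h0, a=1.25, n_steps=5, lam=3.0):
    """Simplified adaptive-weights loop (hypothetical helper, not the paper's code).

    Returns a symmetric 0/1 matrix W; W[i, j] = 1 means no gap was detected
    between the local clusters of points i and j. Assumes 1 < a < 2, so every
    candidate pair at the new radius lies within 2 * h_prev."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D <= h0).astype(float)               # start: plain h0-neighbourhoods
    h = h0
    for _ in range(n_steps):
        h_prev, h = h, a * h                  # locality grows by factor a
        overlap = W @ W                       # |C_i ∩ C_j| for a symmetric 0/1 W
        size = W.sum(axis=1)
        union = size[:, None] + size[None, :] - overlap
        theta = overlap / union               # observed overlap ratio
        q = lens_ratio(D, h_prev)             # ratio expected without a gap
        # Only a deficit of overlap (theta < q) counts as evidence of a gap.
        T = union * kl_bernoulli(theta, q) * (theta <= q)
        W = ((D <= h) & (T <= lam)).astype(float)
    return W


def connected_labels(W):
    """Label connected components of the graph defined by W (depth-first search)."""
    n = W.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(W[i]):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Two well-separated 5x5 grids with spacing 0.5.
    grid = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float) * 0.5
    X = np.vstack([grid, grid + 10.0])
    print(connected_labels(awc_sketch(X, h0=0.6, n_steps=1)))
```

Run as a script, it prints one label per point for the two grids. In this toy case the grids are farther apart than the final radius, so they could never be linked; the test T_ij is what would cut links across a narrower gap. The dense matrix product makes each step O(n³), so the sketch only suits small examples.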

Suggested Citation

  • Efimov, Kirill & Adamyan, Larisa & Spokoiny, Vladimir, 2018. "Adaptive Nonparametric Clustering," IRTG 1792 Discussion Papers 2018-018, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
  • Handle: RePEc:zbw:irtgdp:2018018

    Download full text from publisher

    File URL: https://www.econstor.eu/bitstream/10419/230729/1/irtg1792dp2018-018.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Klaus Frick & Axel Munk & Hannes Sieling, 2014. "Multiscale change point inference," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 76(3), pages 495-580, June.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Qingliang Fan & Wei Zhong, 2018. "Nonparametric Additive Instrumental Variable Estimator: A Group Shrinkage Estimation Perspective," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 36(3), pages 388-399, July.
    2. Victor Chernozhukov & Wolfgang Härdle & Chen Huang & Weining Wang, 2018. "LASSO-driven inference in time and space," CeMMAP working papers CWP36/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    3. Zhong, Wei & Liu, Xi & Ma, Shuangge, 2018. "Variable selection and direction estimation for single-index models via DC-TGDR method," IRTG 1792 Discussion Papers 2018-050, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    4. Guo, Li & Tao, Yubo & Härdle, Wolfgang Karl, 2018. "Understanding Latent Group Structure of Cryptocurrencies Market: A Dynamic Network Perspective," IRTG 1792 Discussion Papers 2018-032, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    5. Packham, Natalie & Woebbeking, Fabian, 2018. "A factor-model approach for correlation scenarios and correlation stress-testing," IRTG 1792 Discussion Papers 2018-034, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    6. Xiaojia Bao & Qingliang Fan, 2020. "The impact of temperature on gaming productivity: evidence from online games," Empirical Economics, Springer, vol. 58(2), pages 835-867, February.
    7. Packham, Natalie & Kalkbrener, Michael & Overbeck, Ludger, 2018. "Default probabilities and default correlations under stress," IRTG 1792 Discussion Papers 2018-037, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    8. Kuczmaszewska, Anna & Yan, Ji Gao, 2018. "On complete convergence in Marcinkiewicz-Zygmund type SLLN for random variables," IRTG 1792 Discussion Papers 2018-041, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    9. Chen, Haiqiang & Li, Yingxing & Lin, Ming & Zhu, Yanli, 2018. "A Regime Shift Model with Nonparametric Switching Mechanism," IRTG 1792 Discussion Papers 2018-048, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    10. Yatracos, Yannis G., 2018. "Residual'S Influence Index (Rinfin), Bad Leverage And Unmasking In High Dimensional L2-Regression," IRTG 1792 Discussion Papers 2018-060, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    11. Zbonakova, Lenka & Li, Xinjue & Härdle, Wolfgang Karl, 2018. "Penalized Adaptive Forecasting with Large Information Sets and Structural Changes," IRTG 1792 Discussion Papers 2018-039, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    12. Packham, Natalie, 2018. "Optimal contracts under competition when uncertainty from adverse selection and moral hazard are present," IRTG 1792 Discussion Papers 2018-033, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    13. Cai, Zongwu & Fang, Ying & Lin, Ming & Su, Jia, 2018. "Inferences for a Partially Varying Coefficient Model With Endogenous Regressors," IRTG 1792 Discussion Papers 2018-047, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    14. Wang, Honglin & Yu, Fan & Zhou, Yinggang, 2018. "Property Investment and Rental Rate under Housing Price Uncertainty: A Real Options Approach," IRTG 1792 Discussion Papers 2018-051, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    15. Yan, Ji Gao, 2018. "Complete Convergence and Complete Moment Convergence for Maximal Weighted Sums of Extended Negatively Dependent Random Variables," IRTG 1792 Discussion Papers 2018-040, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    16. Kalkbrener, Michael & Packham, Natalie, 2018. "Correlation Under Stress In Normal Variance Mixture Models," IRTG 1792 Discussion Papers 2018-035, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    17. Chiu, Hsin-Yu & Chiang, Mi-Hsiu & Kuo, Wei-Yu, 2018. "Predicative Ability of Similarity-based Futures Trading Strategies," IRTG 1792 Discussion Papers 2018-045, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    18. Guo, Shaojun & Li, Dong & Li, Muyi, 2018. "Strict Stationarity Testing and GLAD Estimation of Double Autoregressive Models," IRTG 1792 Discussion Papers 2018-049, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    19. Koziuk, Andzhey & Spokoiny, Vladimir, 2018. "Toolbox: Gaussian comparison on Eucledian balls," IRTG 1792 Discussion Papers 2018-028, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bill Russell & Dooruj Rambaccussing, 2019. "Breaks and the statistical process of inflation: the case of estimating the ‘modern’ long-run Phillips curve," Empirical Economics, Springer, vol. 56(5), pages 1455-1475, May.
    2. Salvatore Fasola & Vito M. R. Muggeo & Helmut Küchenhoff, 2018. "A heuristic, iterative algorithm for change-point detection in abrupt change models," Computational Statistics, Springer, vol. 33(2), pages 997-1015, June.
    3. Michael Messer, 2022. "Bivariate change point detection: Joint detection of changes in expectation and variance," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(2), pages 886-916, June.
    4. Wu Wang & Xuming He & Zhongyi Zhu, 2020. "Statistical inference for multiple change‐point models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(4), pages 1149-1170, December.
    5. Sokbae Lee & Myung Hwan Seo & Youngki Shin, 2016. "The lasso for high dimensional regression with a possible change point," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 193-210, January.
    6. Guenther Walther & Andrew Perry, 2022. "Calibrating the scan statistic: Finite sample performance versus asymptotics," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(5), pages 1608-1639, November.
    7. Kang-Ping Lu & Shao-Tung Chang, 2021. "Robust Algorithms for Change-Point Regressions Using the t -Distribution," Mathematics, MDPI, vol. 9(19), pages 1-28, September.
    8. Andreas Anastasiou & Piotr Fryzlewicz, 2022. "Detecting multiple generalized change-points by isolating single ones," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 85(2), pages 141-174, February.
    9. Yudong Chen & Tengyao Wang & Richard J. Samworth, 2022. "High‐dimensional, multiscale online changepoint detection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(1), pages 234-266, February.
    10. Stefan Albert & Michael Messer & Julia Schiemann & Jochen Roeper & Gaby Schneider, 2017. "Multi-Scale Detection of Variance Changes in Renewal Processes in the Presence of Rate Change Points," Journal of Time Series Analysis, Wiley Blackwell, vol. 38(6), pages 1028-1052, November.
    11. Chen, Yudong & Wang, Tengyao & Samworth, Richard J., 2022. "High-dimensional, multiscale online changepoint detection," LSE Research Online Documents on Economics 113665, London School of Economics and Political Science, LSE Library.
    12. Lu Shaochuan, 2023. "Scalable Bayesian Multiple Changepoint Detection via Auxiliary Uniformisation," International Statistical Review, International Statistical Institute, vol. 91(1), pages 88-113, April.
    13. Florian Pein & Hannes Sieling & Axel Munk, 2017. "Heterogeneous change point inference," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(4), pages 1207-1227, September.
    14. Camillo Cammarota, 2017. "Estimating the turning point location in shifted exponential model of time series," Journal of Applied Statistics, Taylor & Francis Journals, vol. 44(7), pages 1269-1281, May.
    15. Claudia Kirch & Christina Stoehr, 2022. "Sequential change point tests based on U‐statistics," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 49(3), pages 1184-1214, September.
    16. Kang-Ping Lu & Shao-Tung Chang, 2023. "An Advanced Segmentation Approach to Piecewise Regression Models," Mathematics, MDPI, vol. 11(24), pages 1-23, December.
    17. Cho, Haeran & Kirch, Claudia, 2022. "Bootstrap confidence intervals for multiple change points based on moving sum procedures," Computational Statistics & Data Analysis, Elsevier, vol. 175(C).
    18. Sean Jewell & Paul Fearnhead & Daniela Witten, 2022. "Testing for a change in mean after changepoint detection," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(4), pages 1082-1104, September.
    19. Chao Du & Chu-Lan Michael Kao & S. C. Kou, 2016. "Stepwise Signal Extraction via Marginal Likelihood," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(513), pages 314-330, March.
    20. Mohamed Salah Eddine Arrouch & Echarif Elharfaoui & Joseph Ngatchou-Wandji, 2023. "Change-Point Detection in the Volatility of Conditional Heteroscedastic Autoregressive Nonlinear Models," Mathematics, MDPI, vol. 11(18), pages 1-31, September.

    More about this item

    Keywords

    adaptive weights; clustering; gap coefficient; manifold clustering

    JEL classification:

    • C00 - Mathematical and Quantitative Methods - - General - - - General


    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:zbw:irtgdp:2018018. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: ZBW - Leibniz Information Centre for Economics. General contact details of provider: https://edirc.repec.org/data/wfhubde.html.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.