Printed from https://ideas.repec.org/p/uct/uconnp/2025-09.html

High-Dimensional Weighted K-Means with Serial Dependence

Author

Listed:
  • Zhonghui Zhang (Nanjing Audit University)
  • Chihwa Kao (University of Connecticut)
  • Jungbin Hwang (University of Connecticut)

Abstract

In this paper, we propose a new K-means approach for high-dimensional panel data with unknown group memberships. We highlight that the standard K-means algorithm using Euclidean distance can suffer from misclassification in finite samples due to serial correlation and heteroskedasticity in the panel data. Our proposed weighted K-means algorithm addresses this issue by weighting the Euclidean distance using the full covariance structure of idiosyncratic shocks. Assuming that both the cross-sectional and time dimensions of the panel grow large, we develop an asymptotic theory for the weighted K-means algorithm that establishes the consistency of the estimated group centroids and the oracle property for group membership estimation. For practical implementation, we propose a feasible weighted K-means method that employs a regularized estimation of the high-dimensional covariance matrix in the K-means objective function. Monte Carlo simulation results demonstrate the effectiveness of our weighted K-means algorithm in estimating grouped fixed-effects models for large panels, particularly when strong serial dependencies exist in both group-level trends and idiosyncratic components.
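
The weighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single T x T covariance matrix of the idiosyncratic shocks that is known and common to all units, whereas the paper's feasible method estimates a high-dimensional covariance matrix with regularization. The function name `weighted_kmeans` and its parameters (`omega`, `n_groups`, `init_idx`) are hypothetical names chosen for illustration. The sketch replaces the squared Euclidean distance with (y - a)' Omega^{-1} (y - a), which is the same as running ordinary Lloyd iterations on data whitened by the Cholesky factor of Omega.

```python
import numpy as np

def weighted_kmeans(Y, omega, n_groups, n_iter=100, seed=0, init_idx=None):
    """Lloyd-style K-means on an N x T panel under a covariance-weighted distance.

    Y        : (N, T) array, one row per cross-sectional unit.
    omega    : (T, T) positive-definite covariance of idiosyncratic shocks
               (assumed known and common across units in this sketch).
    init_idx : optional indices of the units used as starting centroids
               (hypothetical convenience parameter; random if omitted).
    Returns (labels, centroids), with centroids on the original scale.
    """
    rng = np.random.default_rng(seed)
    # With omega = L L', (y - a)' omega^{-1} (y - a) equals the squared
    # Euclidean distance between L^{-1} y and L^{-1} a.
    chol = np.linalg.cholesky(omega)
    Z = np.linalg.solve(chol, Y.T).T  # each row is L^{-1} y_i

    if init_idx is None:
        init_idx = rng.choice(len(Z), size=n_groups, replace=False)
    centers = Z[np.asarray(init_idx)].copy()

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid in the whitened space.
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # memberships stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for g in range(n_groups):
            members = Z[labels == g]
            if len(members) > 0:
                centers[g] = members.mean(axis=0)

    # Undo the whitening so centroids are comparable with the rows of Y.
    return labels, centers @ chol.T
```

Setting `omega` to the identity matrix reduces the sketch to standard K-means with Euclidean distance, which is the baseline the abstract argues can misclassify units when shocks are serially correlated or heteroskedastic.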

Suggested Citation

  • Zhonghui Zhang & Chihwa Kao & Jungbin Hwang, 2025. "High-Dimensional Weighted K-Means with Serial Dependence," Working papers 2025-09, University of Connecticut, Department of Economics.
  • Handle: RePEc:uct:uconnp:2025-09

    Download full text from publisher

    File URL: https://media.economics.uconn.edu/working/2025-09.pdf
    File Function: Full text
    Download Restriction: no

    More about this item

    JEL classification:

    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C23 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Models with Panel Data; Spatio-temporal Models
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques
