Printed from https://ideas.repec.org/p/uct/uconnp/2025-09.html

High-Dimensional Weighted K-Means with Serial Dependence

Authors

Listed:
  • Zhonghui Zhang

    (Nanjing Audit University)

  • Chihwa Kao

    (University of Connecticut)

  • Jungbin Hwang

    (University of Connecticut)

Abstract

In this paper, we propose a new K-means approach for high-dimensional panel data with unknown group memberships. We highlight that the standard K-means algorithm using Euclidean distance can suffer from misclassification in finite samples due to serial correlation and heteroskedasticity in the panel data. Our proposed weighted K-means algorithm addresses this issue by weighting the Euclidean distance using the full covariance structure of idiosyncratic shocks. Assuming that both the cross-sectional and time dimensions of the panel grow large, we develop an asymptotic theory for the weighted K-means algorithm that establishes the consistency of the estimated group centroids and the oracle property for group membership estimation. For practical implementation, we propose a feasible weighted K-means method that employs a regularized estimation of the high-dimensional covariance matrix in the K-means objective function. Monte Carlo simulation results demonstrate the effectiveness of our weighted K-means algorithm in estimating grouped fixed-effects models for large panels, particularly when strong serial dependencies exist in both group-level trends and idiosyncratic components.
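The abstract describes the weighted K-means idea only at a high level. The Python/NumPy sketch below illustrates one reading of it under stated assumptions: each unit i is a length-T series y_i, the model is y_i = alpha_{g_i} + u_i with a T x T covariance Omega of u_i common to all units, and unit i is assigned to the group minimizing (y_i - alpha_g)' Omega^{-1} (y_i - alpha_g). The feasible version here runs Euclidean K-means first, then estimates Omega from the residuals. The regularizer (a Bartlett taper of the residual sample covariance plus a small ridge) is a swapped-in assumption, not necessarily the authors' estimator, and all function names (weighted_kmeans, tapered_covariance, feasible_weighted_kmeans, simulate_grouped_panel) are hypothetical; the paper's own regularization and tuning are given in the full text.

```python
import numpy as np


def weighted_kmeans(Y, K, W, n_init=10, max_iter=100, seed=0):
    """Lloyd iterations on the rows of Y (N x T) with distance (y - a)' W (y - a).

    W must be a symmetric positive definite T x T matrix; W = I gives ordinary
    Euclidean K-means. Returns (labels, centroids, objective) of the best start.
    """
    rng = np.random.default_rng(seed)
    N, T = Y.shape
    best = None
    for _ in range(n_init):
        # Start from K distinct, randomly chosen units as centroids.
        centroids = Y[rng.choice(N, size=K, replace=False)].copy()
        labels = np.full(N, -1)
        for _ in range(max_iter):
            diff = Y[:, None, :] - centroids[None, :, :]          # N x K x T
            dist = np.einsum("nkt,ts,nks->nk", diff, W, diff)     # weighted distances
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # For a fixed positive definite W, the group mean minimizes the
            # within-group weighted sum of squares, so the centroid step is a mean.
            for k in range(K):
                if np.any(labels == k):
                    centroids[k] = Y[labels == k].mean(axis=0)
        resid = Y - centroids[labels]
        obj = np.einsum("nt,ts,ns->", resid, W, resid)
        if best is None or obj < best[2]:
            best = (labels.copy(), centroids.copy(), obj)
    return best


def tapered_covariance(U, bandwidth, ridge=1e-3):
    """Assumed regularizer: Bartlett-tapered T x T sample covariance of residual rows U, plus a ridge."""
    N, T = U.shape
    S = U.T @ U / N
    lag = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    taper = np.clip(1.0 - lag / (bandwidth + 1.0), 0.0, None)  # positive semidefinite Toeplitz weights
    omega = taper * S                                          # Schur product keeps it PSD
    return omega + ridge * np.trace(omega) / T * np.eye(T)     # ridge makes it invertible


def feasible_weighted_kmeans(Y, K, bandwidth, ridge=1e-3, **kw):
    """Hypothetical two-step scheme: Euclidean K-means, then reweight by the inverse covariance."""
    T = Y.shape[1]
    labels, centroids, _ = weighted_kmeans(Y, K, np.eye(T), **kw)
    omega = tapered_covariance(Y - centroids[labels], bandwidth, ridge)
    return weighted_kmeans(Y, K, np.linalg.inv(omega), **kw)


def simulate_grouped_panel(N=60, T=20, shift=4.0, rho=0.5, seed=1):
    """Illustrative data: two groups with time paths 0 and `shift`, plus stationary AR(1) errors."""
    rng = np.random.default_rng(seed)
    groups = np.repeat([0, 1], N // 2)
    alpha = np.vstack([np.zeros(T), np.full(T, shift)])
    e = rng.standard_normal((N, T))
    u = np.empty((N, T))
    u[:, 0] = e[:, 0] / np.sqrt(1.0 - rho**2)
    for t in range(1, T):
        u[:, t] = rho * u[:, t - 1] + e[:, t]
    return alpha[groups] + u, groups
```

A typical call is `Y, groups = simulate_grouped_panel()` followed by `labels, centroids, obj = feasible_weighted_kmeans(Y, K=2, bandwidth=3)`; estimated labels should be compared with `groups` only up to a relabeling of the groups. The taper bandwidth stands in for the tuning parameter that any regularized high-dimensional covariance estimator needs, and the ridge term keeps the estimate invertible when T is large relative to N.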

Suggested Citation

  • Zhonghui Zhang & Chihwa Kao & Jungbin Hwang, 2025. "High-Dimensional Weighted K-Means with Serial Dependence," Working papers 2025-09, University of Connecticut, Department of Economics.
  • Handle: RePEc:uct:uconnp:2025-09

    Download full text from publisher

    File URL: https://media.economics.uconn.edu/working/2025-09.pdf
    File Function: Full text
    Download Restriction: no

    References listed on IDEAS

    1. Phillips, Peter C.B., 2005. "Hac Estimation By Automated Regression," Econometric Theory, Cambridge University Press, vol. 21(1), pages 116-142, February.
    2. Liangjun Su & Zhentao Shi & Peter C. B. Phillips, 2016. "Identifying Latent Structures in Panel Data," Econometrica, Econometric Society, vol. 84, pages 2215-2264, November.
    3. Su, Liangjun & Wang, Wuyi & Xu, Xingbai, 2023. "Identifying latent group structures in spatial dynamic panels," Journal of Econometrics, Elsevier, vol. 235(2), pages 1955-1980.
    4. Jushan Bai & Serena Ng, 2002. "Determining the Number of Factors in Approximate Factor Models," Econometrica, Econometric Society, vol. 70(1), pages 191-221, January.
    5. Stéphane Bonhomme & Elena Manresa, 2015. "Grouped Patterns of Heterogeneity in Panel Data," Econometrica, Econometric Society, vol. 83(3), pages 1147-1184, May.
    6. John H. Dunning & Sarianna M. Lundan, 2008. "Multinational Enterprises and the Global Economy, Second Edition," Books, Edward Elgar Publishing, number 3215, March.
    7. Tomohiro Ando & Jushan Bai, 2016. "Panel Data Models with Grouped Factor Structure Under Unknown Group Membership," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 31(1), pages 163-191, January.
    8. Huang, Wenxin & Jin, Sainan & Su, Liangjun, 2020. "Identifying Latent Grouped Patterns In Cointegrated Panels," Econometric Theory, Cambridge University Press, vol. 36(3), pages 410-456, June.
    9. Hahn, Jinyong & Moon, Hyungsik Roger, 2010. "Panel Data Models With Finite Number Of Multiple Equilibria," Econometric Theory, Cambridge University Press, vol. 26(3), pages 863-881, June.
    10. Wenxin Huang & Yiru Wang & Lingyun Zhou, 2024. "Identify latent group structures in panel data: The classifylasso command," Stata Journal, StataCorp LLC, vol. 24(1), pages 46-71, March.
    11. Okui, Ryo & Wang, Wendun, 2021. "Heterogeneous structural breaks in panel data models," Journal of Econometrics, Elsevier, vol. 220(2), pages 447-473.
    12. Andrews, Donald W K, 1991. "Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation," Econometrica, Econometric Society, vol. 59(3), pages 817-858, May.
    13. Chen, Yongmin & Jiang, Haiwei & Liang, Yousha & Pan, Shiyuan, 2022. "The impact of foreign direct investment on innovation: Evidence from patent filings and citations in China," Journal of Comparative Economics, Elsevier, vol. 50(4), pages 917-945.
    14. Douglas Steinley & Michael Brusco, 2008. "Selection of Variables in Cluster Analysis: An Empirical Comparison of Eight Procedures," Psychometrika, Springer;The Psychometric Society, vol. 73(1), pages 125-144, March.
    15. Hansen, Lars Peter & Heaton, John & Yaron, Amir, 1996. "Finite-Sample Properties of Some Alternative GMM Estimators," Journal of Business & Economic Statistics, American Statistical Association, vol. 14(3), pages 262-280, July.
    16. Jushan Bai, 2009. "Panel Data Models With Interactive Fixed Effects," Econometrica, Econometric Society, vol. 77(4), pages 1229-1279, July.
    17. Sun, Yixiao & Yang, Jingjing, 2020. "Testing-optimal kernel choice in HAR inference," Journal of Econometrics, Elsevier, vol. 219(1), pages 123-136.
    18. Yixiao Sun, 2014. "Fixed‐Smoothing Asymptotics in a Two‐Step Generalized Method of Moments Framework," Econometrica, Econometric Society, vol. 82, pages 2327-2370, November.
    19. Bian, Yulin & Su, Liangjun, 2025. "A note on factor models with latent group structures," Economics Letters, Elsevier, vol. 252(C).
    20. Lumsdaine, Robin L. & Okui, Ryo & Wang, Wendun, 2023. "Estimation of panel group structure models with structural breaks in group memberships and coefficients," Journal of Econometrics, Elsevier, vol. 233(1), pages 45-65.
    21. Lin, Chang-Ching & Ng, Serena, 2012. "Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership is Unknown," Journal of Econometric Methods, De Gruyter, vol. 1(1), pages 42-55, August.
    22. Wang, Wuyi & Su, Liangjun, 2021. "Identifying latent group structures in nonlinear panels," Journal of Econometrics, Elsevier, vol. 220(2), pages 272-295.
    23. Jushan Bai & Shuzhong Shi, 2011. "Estimating High Dimensional Covariance Matrices and its Applications," Annals of Economics and Finance, Society for AEF, vol. 12(2), pages 199-215, November.
    24. Jushan Bai, 2003. "Inferential Theory for Factor Models of Large Dimensions," Econometrica, Econometric Society, vol. 71(1), pages 135-171, January.
    25. Chihwa Kao & Min Seong Kim & Zhonghui Zhang, 2021. "Mahalanobis Metric Based Clustering for Fixed Effects Model," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(2), pages 493-506, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Wang, Yiren & Phillips, Peter C.B. & Su, Liangjun, 2024. "Panel data models with time-varying latent group structures," Journal of Econometrics, Elsevier, vol. 240(1).
    2. Bian, Yulin & Su, Liangjun, 2025. "A note on factor models with latent group structures," Economics Letters, Elsevier, vol. 252(C).
    3. Mikihito Nishi, 2025. "K-Means Panel Data Clustering in the Presence of Small Groups," Papers 2508.15408, arXiv.org.
    4. Pionati, Alessandro, 2025. "Latent grouped structures in panel data: a review," MPRA Paper 123954, University Library of Munich, Germany.
    5. Yiren Wang & Liangjun Su & Yichong Zhang, 2022. "Low-rank Panel Quantile Regression: Estimation and Inference," Papers 2210.11062, arXiv.org.
    6. Yanbo Liu & Peter C. B. Phillips & Jun Yu, 2023. "A Panel Clustering Approach To Analyzing Bubble Behavior," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 64(4), pages 1347-1395, November.
    7. Yi Li & Xingxing Luo & Mengqi Liao, 2025. "Incorporating Prior Information in Latent Structures Identification for Panel Data Models," Mathematics, MDPI, vol. 13(9), pages 1-26, May.
    8. Yu, Lu & Gu, Jiaying & Volgushev, Stanislav, 2024. "Spectral clustering with variance information for group structure estimation in panel data," Journal of Econometrics, Elsevier, vol. 241(1).
    9. Huang, Wenxin & Jin, Sainan & Phillips, Peter C.B. & Su, Liangjun, 2021. "Nonstationary panel models with latent group structures and cross-section dependence," Journal of Econometrics, Elsevier, vol. 221(1), pages 198-222.
    10. Andreas Dzemski & Ryo Okui, 2024. "Confidence set for group membership," Quantitative Economics, Econometric Society, vol. 15(2), pages 245-277, May.
    11. Denis Chetverikov & Elena Manresa, 2022. "Spectral and post-spectral estimators for grouped panel data models," Papers 2212.13324, arXiv.org, revised Dec 2022.
    12. Leng, Xuan & Chen, Heng & Wang, Wendun, 2023. "Multi-dimensional latent group structures with heterogeneous distributions," Journal of Econometrics, Elsevier, vol. 233(1), pages 1-21.
    13. Thibaut Lamadon & Elena Manresa & Stephane Bonhomme, 2016. "Discretizing Unobserved Heterogeneity," 2016 Meeting Papers 1536, Society for Economic Dynamics.
    14. Paul Haimerl & Stephan Smeekes & Ines Wilms, 2025. "Estimation of Latent Group Structures in Time-Varying Panel Data Models," Papers 2503.23165, arXiv.org, revised Nov 2025.
    15. Li, Kunpeng & Cui, Guowei & Lu, Lina, 2020. "Efficient estimation of heterogeneous coefficients in panel data models with common shocks," Journal of Econometrics, Elsevier, vol. 216(2), pages 327-353.
    16. Xu Cheng & Frank Schorfheide & Peng Shao, 2025. "Clustering for Multi-Dimensional Heterogeneity with an Application to Production Function Estimation," PIER Working Paper Archive 25-014, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    17. Su, Liangjun & Wang, Wuyi & Xu, Xingbai, 2023. "Identifying latent group structures in spatial dynamic panels," Journal of Econometrics, Elsevier, vol. 235(2), pages 1955-1980.
    18. Oguzhan Akgun & Alain Pirotte & Giovanni Urga & Zhenlin Yang, 2025. "Testing Clustered Equal Predictive Ability with Unknown Clusters," Papers 2507.14621, arXiv.org, revised Jul 2025.
    19. Ando, Tomohiro & Bai, Jushan, 2021. "Large-scale generalized linear longitudinal data models with grouped patterns of unobserved heterogeneity," MPRA Paper 111431, University Library of Munich, Germany.
    20. Mehrabani, Ali, 2023. "Estimation and identification of latent group structures in panel data," Journal of Econometrics, Elsevier, vol. 235(2), pages 1464-1482.

    More about this item

    JEL classification:

    • C13 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Estimation: General
    • C23 - Mathematical and Quantitative Methods - - Single Equation Models; Single Variables - - - Models with Panel Data; Spatio-temporal Models
    • C38 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Classification Methods; Cluster Analysis; Principal Components; Factor Analysis
    • C63 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Computational Techniques

    Corrections

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Mark McConnel. General contact details of provider: https://edirc.repec.org/data/deuctus.html

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.