
K-Means Panel Data Clustering in the Presence of Small Groups

Author

Listed:
  • Mikihito Nishi

Abstract

We consider panel data models with group structure. We study the asymptotic behavior of the least-squares estimators and of information criteria for the number of groups, allowing for the presence of small groups whose relative size is asymptotically negligible. Our contributions are threefold. First, we derive sufficient conditions under which the least-squares estimators are consistent and asymptotically normal. One of these conditions implies that the smaller the groups, the longer the sample period required. Second, we show that information criteria for the number of groups proposed in earlier works can be inconsistent or perform poorly in the presence of small groups. Third, we propose modified information criteria (MIC) designed to perform well in the presence of small groups. A Monte Carlo simulation confirms their good performance in finite samples. An empirical application illustrates that K-means clustering paired with the proposed MIC allows one to discover small groups without producing too many groups. This makes it possible to characterize small groups and to differentiate them from the large groups within a parsimonious group structure.
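As a rough illustration of the setting the abstract describes, the sketch below clusters the units of a panel by K-means and then compares candidate numbers of groups with a generic BIC-style criterion. Everything in it is an assumption made for illustration: the simplified model (group-specific time profiles and no covariates), the function names `kmeans_panel` and `select_n_groups`, and the penalty term. It is not the MIC proposed in the paper, whose form is not given on this page.

```python
import numpy as np


def kmeans_panel(y, n_groups, n_starts=20, max_iter=100, seed=0):
    """Lloyd's algorithm on the rows of an (N, T) panel.

    Assumed model (illustrative only): y[i, t] = alpha[g_i, t] + noise.
    Returns (labels, centers, ssr) for the best of n_starts random starts.
    """
    rng = np.random.default_rng(seed)
    n_units = y.shape[0]
    best = None
    for _ in range(n_starts):
        # Initialise the group profiles at randomly chosen units.
        centers = y[rng.choice(n_units, size=n_groups, replace=False)].copy()
        labels = np.full(n_units, -1)
        for _ in range(max_iter):
            # Assignment step: nearest profile in squared Euclidean distance.
            dist = ((y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            # Update step: group means per period; empty groups keep their profile.
            for g in range(n_groups):
                members = labels == g
                if members.any():
                    centers[g] = y[members].mean(axis=0)
        ssr = ((y - centers[labels]) ** 2).sum()
        if best is None or ssr < best[2]:
            best = (labels, centers, ssr)
    return best


def select_n_groups(y, max_groups):
    """Pick the number of groups by a generic BIC-style criterion.

    The penalty below is a hypothetical textbook-style choice,
    NOT the modified information criterion (MIC) of the paper.
    """
    n_units, n_periods = y.shape
    n_obs = n_units * n_periods
    ssr = {g: kmeans_panel(y, g)[2] for g in range(1, max_groups + 1)}
    # Noise variance estimated from the largest candidate model.
    sigma2 = ssr[max_groups] / (n_obs - max_groups * n_periods)
    crit = {
        g: ssr[g] / n_obs
        + sigma2 * (g * n_periods + n_units) * np.log(n_obs) / n_obs
        for g in ssr
    }
    return min(crit, key=crit.get), crit
```

Calling `kmeans_panel(y, 2)` on an `(N, T)` array returns the estimated memberships, the group profiles, and the sum of squared residuals; `select_n_groups(y, max_groups)` returns the selected number of groups together with the criterion values. A fixed penalty of this kind is the sort of criterion the abstract says can behave poorly when some groups are small, which is the motivation for the modified criteria.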

Suggested Citation

  • Mikihito Nishi, 2025. "K-Means Panel Data Clustering in the Presence of Small Groups," Papers 2508.15408, arXiv.org.
  • Handle: RePEc:arx:papers:2508.15408

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2508.15408
    File Function: Latest version
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Pionati, Alessandro, 2025. "Latent grouped structures in panel data: a review," MPRA Paper 123954, University Library of Munich, Germany.
    2. Paul Haimerl & Stephan Smeekes & Ines Wilms, 2025. "Estimation of Latent Group Structures in Time-Varying Panel Data Models," Papers 2503.23165, arXiv.org.
    3. Andreas Dzemski & Ryo Okui, 2024. "Confidence set for group membership," Quantitative Economics, Econometric Society, vol. 15(2), pages 245-277, May.
    4. Yanbo Liu & Peter C. B. Phillips & Jun Yu, 2023. "A Panel Clustering Approach To Analyzing Bubble Behavior," International Economic Review, Department of Economics, University of Pennsylvania and Osaka University Institute of Social and Economic Research Association, vol. 64(4), pages 1347-1395, November.
    5. Mehrabani, Ali, 2023. "Estimation and identification of latent group structures in panel data," Journal of Econometrics, Elsevier, vol. 235(2), pages 1464-1482.
    6. Yu, Lu & Gu, Jiaying & Volgushev, Stanislav, 2024. "Spectral clustering with variance information for group structure estimation in panel data," Journal of Econometrics, Elsevier, vol. 241(1).
    7. Wang, Yiren & Phillips, Peter C.B. & Su, Liangjun, 2024. "Panel data models with time-varying latent group structures," Journal of Econometrics, Elsevier, vol. 240(1).
    8. Denis Chetverikov & Elena Manresa, 2022. "Spectral and post-spectral estimators for grouped panel data models," Papers 2212.13324, arXiv.org, revised Dec 2022.
    9. Leng, Xuan & Chen, Heng & Wang, Wendun, 2023. "Multi-dimensional latent group structures with heterogeneous distributions," Journal of Econometrics, Elsevier, vol. 233(1), pages 1-21.
    10. Langevin, R.;, 2024. "Consistent Estimation of Finite Mixtures: An Application to Latent Group Panel Structures," Health, Econometrics and Data Group (HEDG) Working Papers 24/16, HEDG, c/o Department of Economics, University of York.
    11. Lumsdaine, Robin L. & Okui, Ryo & Wang, Wendun, 2023. "Estimation of panel group structure models with structural breaks in group memberships and coefficients," Journal of Econometrics, Elsevier, vol. 233(1), pages 45-65.
    12. Yiren Wang & Liangjun Su & Yichong Zhang, 2022. "Low-rank Panel Quantile Regression: Estimation and Inference," Papers 2210.11062, arXiv.org.
    13. Miao, Ke & Su, Liangjun & Wang, Wendun, 2020. "Panel threshold regressions with latent group structures," Journal of Econometrics, Elsevier, vol. 214(2), pages 451-481.
    14. Ando, Tomohiro & Bai, Jushan, 2021. "Large-scale generalized linear longitudinal data models with grouped patterns of unobserved heterogeneity," MPRA Paper 111431, University Library of Munich, Germany.
    15. Xu Cheng & Frank Schorfheide & Peng Shao, 2025. "Clustering for Multi-Dimensional Heterogeneity with an Application to Production Function Estimation," PIER Working Paper Archive 25-014, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania.
    16. Su, Liangjun & Wang, Wuyi & Xu, Xingbai, 2023. "Identifying latent group structures in spatial dynamic panels," Journal of Econometrics, Elsevier, vol. 235(2), pages 1955-1980.
    17. Xiaorong Yang & Jia Chen & Degui Li & Runze Li, 2024. "Functional-Coefficient Quantile Regression for Panel Data with Latent Group Structure," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 42(3), pages 1026-1040, July.
    18. Yi Li & Xingxing Luo & Mengqi Liao, 2025. "Incorporating Prior Information in Latent Structures Identification for Panel Data Models," Mathematics, MDPI, vol. 13(9), pages 1-26, May.
    19. Saptorshee Kanto Chakraborty & Massimiliano Mazzanti, 2021. "Revisiting the literature on the dynamic Environmental Kuznets Curves using a latent structure approach," Economia Politica: Journal of Analytical and Institutional Economics, Springer;Fondazione Edison, vol. 38(3), pages 923-941, October.
    20. Okui, Ryo & Wang, Wendun, 2021. "Heterogeneous structural breaks in panel data models," Journal of Econometrics, Elsevier, vol. 220(2), pages 447-473.
