Printed from https://ideas.repec.org/a/gam/jmathe/v11y2023i20p4374-d1264489.html

Testing Equality of Several Distributions at High Dimensions: A Maximum-Mean-Discrepancy-Based Approach

Authors
  • Zhi Peng Ong

    (Department of Information Systems and Analytics, National University of Singapore, Singapore 117417, Singapore)

  • Aixiang Andy Chen

    (School of Statistics and Mathematics, Guangdong University of Finance and Economics, Guangzhou 510320, China)

  • Tianming Zhu

    (National Institute of Education, Nanyang Technological University, Singapore 637616, Singapore)

  • Jin-Ting Zhang

    (Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore)

Abstract

With the development of modern data collection techniques, researchers often encounter high-dimensional data across various research fields. An important problem is to determine whether several groups of these high-dimensional data originate from the same population. To address this, this paper presents a novel k-sample test for equal distributions for high-dimensional data, utilizing the Maximum Mean Discrepancy (MMD). The test statistic is constructed using a V-statistic-based estimator of the squared MMD derived for several samples. The asymptotic null and alternative distributions of the test statistic are derived. To approximate the null distribution accurately, three simple methods are described. To evaluate the performance of the proposed test, two simulation studies and a real data example are presented, demonstrating the effectiveness and reliability of the test in practical applications.
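
The abstract does not give the formula of the test statistic, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a Gaussian kernel, an ANOVA-style V-statistic (the sum over groups of n_i times the squared RKHS distance between the group mean embedding and the pooled mean embedding), and a permutation p-value in place of the three null approximations mentioned in the paper. The function names, the bandwidth choice, and the number of permutations are all hypothetical.

```python
import numpy as np

def gaussian_gram(z, bandwidth):
    """Gaussian-kernel Gram matrix of the pooled observations (rows of z)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (z @ z.T), 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def k_sample_mmd_v(gram, labels):
    """Assumed statistic: sum_i n_i * ||mu_i - mu_pooled||^2 in the RKHS.

    Every inner product is estimated by a plain average of kernel values,
    diagonal included, which is what makes this a V-statistic.
    """
    pooled = gram.mean()                        # <mu_pooled, mu_pooled>
    stat = 0.0
    for g in np.unique(labels):
        idx = labels == g
        within = gram[np.ix_(idx, idx)].mean()  # <mu_i, mu_i>
        cross = gram[idx, :].mean()             # <mu_i, mu_pooled>
        stat += idx.sum() * (within - 2.0 * cross + pooled)
    return stat

def permutation_test(samples, bandwidth, n_perm=200, seed=0):
    """Return the observed statistic and a permutation p-value."""
    z = np.vstack(samples)
    labels = np.repeat(np.arange(len(samples)), [len(s) for s in samples])
    gram = gaussian_gram(z, bandwidth)          # computed once, reused below
    rng = np.random.default_rng(seed)
    observed = k_sample_mmd_v(gram, labels)
    exceed = sum(
        k_sample_mmd_v(gram, rng.permutation(labels)) >= observed
        for _ in range(n_perm)
    )
    return observed, (1 + exceed) / (1 + n_perm)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    dim = 50
    # Three groups of 30 observations; the third has a shifted mean.
    groups = [rng.normal(size=(30, dim)),
              rng.normal(size=(30, dim)),
              rng.normal(loc=2.0, size=(30, dim))]
    stat, p_value = permutation_test(groups, bandwidth=np.sqrt(2 * dim))
    print(f"statistic = {stat:.4f}, permutation p-value = {p_value:.4f}")
```

Shuffling the group labels only re-indexes the Gram matrix, so the kernel is evaluated once and each permutation costs a few matrix averages. The bandwidth `sqrt(2 * dim)` is an arbitrary choice for standardized data; the paper may select it differently.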

Suggested Citation

  • Zhi Peng Ong & Aixiang Andy Chen & Tianming Zhu & Jin-Ting Zhang, 2023. "Testing Equality of Several Distributions at High Dimensions: A Maximum-Mean-Discrepancy-Based Approach," Mathematics, MDPI, vol. 11(20), pages 1-21, October.
  • Handle: RePEc:gam:jmathe:v:11:y:2023:i:20:p:4374-:d:1264489

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/11/20/4374/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/11/20/4374/
    Download Restriction: no

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    2. Shin-ichi Tsukada, 2019. "High dimensional two-sample test based on the inter-point distance," Computational Statistics, Springer, vol. 34(2), pages 599-615, June.
    3. Mondal, Pronoy K. & Biswas, Munmun & Ghosh, Anil K., 2015. "On high dimensional two-sample tests based on nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 168-178.
    4. Jun Li, 2018. "Asymptotic normality of interpoint distances for high-dimensional data with applications to the two-sample problem," Biometrika, Biometrika Trust, vol. 105(3), pages 529-546.
    5. Paul, Biplab & De, Shyamal K. & Ghosh, Anil K., 2022. "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    6. Lovato, Ilenia & Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2020. "Model-free two-sample test for network-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    7. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    8. Biswas, Munmun & Ghosh, Anil K., 2014. "A nonparametric two-sample test applicable to high dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 160-171.
    9. Modarres, Reza, 2014. "On the interpoint distances of Bernoulli vectors," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 215-222.
    10. García, Jorge Luis & Heckman, James J. & Ziff, Anna L., 2018. "Gender differences in the benefits of an influential early childhood program," European Economic Review, Elsevier, vol. 109(C), pages 9-22.
    11. Geng, Sen & Peng, Yujia & Shachat, Jason & Zhong, Huizhen, 2015. "Adolescents, cognitive ability, and minimax play," Economics Letters, Elsevier, vol. 128(C), pages 54-58.
    12. García, Jorge Luis & Heckman, James J. & Ronda, Victor, 2021. "The Lasting Effects of Early Childhood Education on Promoting the Skills and Social Mobility of Disadvantaged African Americans," IZA Discussion Papers 14575, Institute of Labor Economics (IZA).
    13. Ludwig Baringhaus & Norbert Henze, 2016. "Revisiting the two-sample runs test," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 432-448, September.
    14. Gertler, Paul J. & Heckman, James J. & Pinto, Rodrigo Ribeiro Antunes & Chang-Lopez, Susan M. & Grantham-Mcgregor, Sally & Vermeersch, Christel M. J. & Walker, Susan & Wright, Amika S., 2021. "Effect of the Jamaica Early Childhood Stimulation Intervention on Labor Market Outcomes at Age 31," Policy Research Working Paper Series 9787, The World Bank.
    15. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T2 in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    16. Petrie, Adam, 2016. "Graph-theoretic multisample tests of equality in distribution for high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 145-158.
    17. Cousido-Rocha, Marta & de Uña-Álvarez, Jacobo & Hart, Jeffrey D., 2019. "A two-sample test for the equality of univariate marginal distributions for high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 174(C).
    18. Martin Boďa & Mariana Považanová, 2020. "Productivity patterns in Europe: adaptation of the Malmquist index to measuring group performance and productivity change over time," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 47(4), pages 949-989, November.
    19. Anil K. Ghosh & Munmun Biswas, 2016. "Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 525-547, September.
    20. Reza Modarres, 2018. "Multinomial interpoint distances," Statistical Papers, Springer, vol. 59(1), pages 341-360, March.
