
A nonparametric two-sample test applicable to high dimensional data

Author

Listed:
  • Biswas, Munmun
  • Ghosh, Anil K.

Abstract

The multivariate two-sample testing problem has been well investigated in the literature, and several parametric and nonparametric methods are available for it. However, most of these two-sample tests perform poorly for high dimensional data, and many of them are not applicable when the dimension of the data exceeds the sample size. In this article, we propose a multivariate two-sample test that can be conveniently used in the high dimension low sample size setup. Asymptotic results on the power properties of our proposed test are derived when the sample size remains fixed, and the dimension of the data grows to infinity. We investigate the performance of this test on several high-dimensional simulated and real data sets, and demonstrate its superiority over several other existing two-sample tests. We also study some theoretical properties of the proposed test for situations when the dimension of the data remains fixed and the sample size tends to infinity. In such cases, it turns out to be asymptotically distribution-free and consistent under general alternatives.

Suggested Citation

  • Biswas, Munmun & Ghosh, Anil K., 2014. "A nonparametric two-sample test applicable to high dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 123(C), pages 160-171.
  • Handle: RePEc:eee:jmvana:v:123:y:2014:i:c:p:160-171
    DOI: 10.1016/j.jmva.2013.09.004

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047259X13001966
    Download Restriction: Full text for ScienceDirect subscribers only

    Citations

    Cited by:

    1. Modarres, Reza, 2022. "A high dimensional dissimilarity measure," Computational Statistics & Data Analysis, Elsevier, vol. 175(C).
    2. Luai Al-Labadi & Forough Fazeli Asl & Zahra Saberi, 2022. "A Bayesian nonparametric multi-sample test in any dimension," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 106(2), pages 217-242, June.
    3. Liu, Zhi & Xia, Xiaochao & Zhou, Wang, 2015. "A test for equality of two distributions via jackknife empirical likelihood and characteristic functions," Computational Statistics & Data Analysis, Elsevier, vol. 92(C), pages 97-114.
    4. Xu Li & Wenjuan Hu & Baoxue Zhang, 2023. "Measuring and testing homogeneity of distributions by characteristic distance," Statistical Papers, Springer, vol. 64(2), pages 529-556, April.
    5. Jun Li, 2018. "Asymptotic normality of interpoint distances for high-dimensional data with applications to the two-sample problem," Biometrika, Biometrika Trust, vol. 105(3), pages 529-546.
    6. Ludwig Baringhaus & Norbert Henze, 2016. "Revisiting the two-sample runs test," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 432-448, September.
    7. Yue, Mu & Li, Jialiang & Cheng, Ming-Yen, 2019. "Two-step sparse boosting for high-dimensional longitudinal data with varying coefficients," Computational Statistics & Data Analysis, Elsevier, vol. 131(C), pages 222-234.
    8. Modarres, Reza, 2016. "Multivariate Poisson interpoint distances," Statistics & Probability Letters, Elsevier, vol. 112(C), pages 113-123.
    9. Cousido-Rocha, Marta & de Uña-Álvarez, Jacobo & Hart, Jeffrey D., 2019. "A two-sample test for the equality of univariate marginal distributions for high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 174(C).
    10. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    11. Lingzhe Guo & Reza Modarres, 2020. "Testing the equality of matrix distributions," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(2), pages 289-307, June.
    12. Dai, Xinjie & Niu, Cuizhen & Guo, Xu, 2018. "Testing for central symmetry and inference of the unknown center," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 15-31.
    13. Long Feng & Changliang Zou & Zhaojun Wang, 2016. "Multivariate-Sign-Based High-Dimensional Tests for the Two-Sample Location Problem," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(514), pages 721-735, April.
    14. Mondal, Pronoy K. & Biswas, Munmun & Ghosh, Anil K., 2015. "On high dimensional two-sample tests based on nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 168-178.
    15. Reza Modarres & Yu Song, 2020. "Multivariate power series interpoint distances," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(4), pages 955-982, December.
    16. Reza Modarres, 2020. "Graphical Comparison of High‐Dimensional Distributions," International Statistical Review, International Statistical Institute, vol. 88(3), pages 698-714, December.
    17. Shin-ichi Tsukada, 2019. "High dimensional two-sample test based on the inter-point distance," Computational Statistics, Springer, vol. 34(2), pages 599-615, June.
    18. Huang, Yuan & Li, Changcheng & Li, Runze & Yang, Songshan, 2022. "An overview of tests on high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    19. Lovato, Ilenia & Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2020. "Model-free two-sample test for network-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    20. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T² in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    21. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    22. Reza Modarres, 2018. "Multinomial interpoint distances," Statistical Papers, Springer, vol. 59(1), pages 341-360, March.
    23. Paul, Biplab & De, Shyamal K. & Ghosh, Anil K., 2022. "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shin-ichi Tsukada, 2019. "High dimensional two-sample test based on the inter-point distance," Computational Statistics, Springer, vol. 34(2), pages 599-615, June.
    2. Mondal, Pronoy K. & Biswas, Munmun & Ghosh, Anil K., 2015. "On high dimensional two-sample tests based on nearest neighbors," Journal of Multivariate Analysis, Elsevier, vol. 141(C), pages 168-178.
    3. Paul, Biplab & De, Shyamal K. & Ghosh, Anil K., 2022. "Some clustering-based exact distribution-free k-sample tests applicable to high dimension, low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    4. Petrie, Adam, 2016. "Graph-theoretic multisample tests of equality in distribution for high dimensional data," Computational Statistics & Data Analysis, Elsevier, vol. 96(C), pages 145-158.
    5. Anil K. Ghosh & Munmun Biswas, 2016. "Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 525-547, September.
    6. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    7. Modarres, Reza, 2014. "On the interpoint distances of Bernoulli vectors," Statistics & Probability Letters, Elsevier, vol. 84(C), pages 215-222.
    8. Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2018. "Hotelling’s T² in separable Hilbert spaces," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 284-305.
    9. Davy Paindaveine & Thomas Verdebout, 2013. "Universal Asymptotics for High-Dimensional Sign Tests," Working Papers ECARES ECARES 2013-40, ULB -- Universite Libre de Bruxelles.
    10. Reza Modarres, 2020. "Graphical Comparison of High‐Dimensional Distributions," International Statistical Review, International Statistical Institute, vol. 88(3), pages 698-714, December.
    11. Lovato, Ilenia & Pini, Alessia & Stamm, Aymeric & Vantini, Simone, 2020. "Model-free two-sample test for network-valued data," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    12. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    13. Jiang Hu & Zhidong Bai & Chen Wang & Wei Wang, 2017. "On testing the equality of high dimensional mean vectors with unequal covariance matrices," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 69(2), pages 365-387, April.
    14. Nicolas Städler & Sach Mukherjee, 2017. "Two-sample testing in high dimensions," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 79(1), pages 225-246, January.
    15. Dong, Kai & Pang, Herbert & Tong, Tiejun & Genton, Marc G., 2016. "Shrinkage-based diagonal Hotelling’s tests for high-dimensional small sample size data," Journal of Multivariate Analysis, Elsevier, vol. 143(C), pages 127-142.
    16. Yin, Yanqing, 2021. "Test for high-dimensional mean vector under missing observations," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    17. Ishii, Aki & Yata, Kazuyoshi & Aoshima, Makoto, 2022. "Geometric classifiers for high-dimensional noisy data," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    18. M. Ahmad, 2014. "A U-statistic approach for a high-dimensional two-sample mean testing problem under non-normality and Behrens–Fisher setting," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(1), pages 33-61, February.
    19. Cai, T. Tony & Xia, Yin, 2014. "High-dimensional sparse MANOVA," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 174-196.
    20. Amanda Plunkett & Junyong Park, 2019. "Two-sample test for sparse high-dimensional multinomial distributions," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(3), pages 804-826, September.