IDEAS home Printed from https://ideas.repec.org/a/spr/compst/v39y2024i3d10.1007_s00180-023-01374-0.html

Two-sample mean vector projection test in high-dimensional data

Author

Listed:
  • Caizhu Huang

    (University of Padova)

  • Xia Cui

    (Guangzhou University)

  • Euloge Clovis Kenne Pagui

    (University of Oslo)

Abstract

Statistical hypothesis testing for high-dimensional data poses challenges for modern inference. The Hotelling test is commonly applied to compare mean vectors when their dimension is fixed, but it becomes unavailable when the dimension diverges or is greater than the sample sizes. For high-dimensional regimes, we propose a two-sample mean vector test statistic that adds a projection term based on the Euclidean norm of the mean vectors. The projection term improves power and ensures validity for dimensions greater than the sample sizes, without relying on any inverse matrices. The proposed projection statistic, suitably standardized, is approximately standard normal under mild conditions. Extensive simulation results, under different scenarios, show that the proposed approach enjoys a comparable Type I error and improved power. We further illustrate its application by testing the equality of two acute lymphocytic leukemia genetic data sets and by a significance test of the “Sell in May and Go Away” effect in China's A-share stock market.
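The abstract's remark that the Hotelling test becomes unavailable when the dimension exceeds the sample sizes can be illustrated with a small sketch. The code below is not the authors' projection statistic, whose exact form is not given in this record. It only shows, on simulated data, that the pooled sample covariance is rank deficient (so its inverse does not exist) when p > n1 + n2 - 2, and that a statistic built from the squared Euclidean norm of the mean difference needs no matrix inverse. The function names, the simulated data, and the permutation calibration (used here in place of the paper's normal approximation) are all assumptions made for illustration.

```python
import numpy as np


def pooled_covariance(x, y):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = x.shape[0], y.shape[0]
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def norm_statistic(x, y):
    """Squared Euclidean norm of the difference of sample mean vectors.

    Illustrative stand-in only: it uses no inverse matrix, but it is not
    the standardized projection statistic proposed in the paper.
    """
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ diff)


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for norm_statistic (assumed calibration)."""
    rng = np.random.default_rng(seed)
    observed = norm_statistic(x, y)
    combined = np.vstack([x, y])
    n1 = x.shape[0]
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(combined.shape[0])
        stat = norm_statistic(combined[idx[:n1]], combined[idx[n1:]])
        count += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)


# Simulated example with dimension p larger than both sample sizes.
rng = np.random.default_rng(1)
n1, n2, p = 20, 25, 100
x = rng.normal(size=(n1, p))
y = rng.normal(size=(n2, p)) + 0.5  # mean shift in every coordinate

# The pooled covariance has rank at most n1 + n2 - 2 < p, so the inverse
# required by the Hotelling statistic does not exist.
rank = np.linalg.matrix_rank(pooled_covariance(x, y))

stat = norm_statistic(x, y)
pval = permutation_pvalue(x, y)
```

Here `rank` is bounded by n1 + n2 - 2 = 43, well below p = 100, while `stat` and `pval` are computed without inverting anything. A permutation calibration is chosen only because it requires no distributional theory; the paper instead standardizes its statistic and refers it to a standard normal distribution.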

Suggested Citation

  • Caizhu Huang & Xia Cui & Euloge Clovis Kenne Pagui, 2024. "Two-sample mean vector projection test in high-dimensional data," Computational Statistics, Springer, vol. 39(3), pages 1061-1091, May.
  • Handle: RePEc:spr:compst:v:39:y:2024:i:3:d:10.1007_s00180-023-01374-0
    DOI: 10.1007/s00180-023-01374-0

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s00180-023-01374-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s00180-023-01374-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Huang, Yuan & Li, Changcheng & Li, Runze & Yang, Songshan, 2022. "An overview of tests on high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    2. Jianghao Li & Shizhe Hong & Zhenzhen Niu & Zhidong Bai, 2025. "Test for high-dimensional linear hypothesis of mean vectors via random integration," Statistical Papers, Springer, vol. 66(1), pages 1-34, January.
    3. Jin-Ting Zhang & Bu Zhou & Jia Guo, 2022. "Testing high-dimensional mean vector with applications," Statistical Papers, Springer, vol. 63(4), pages 1105-1137, August.
    4. Zhang, Jin-Ting & Zhou, Bu & Guo, Jia, 2022. "Linear hypothesis testing in high-dimensional heteroscedastic one-way MANOVA: A normal reference L2-norm based test," Journal of Multivariate Analysis, Elsevier, vol. 187(C).
    5. Li, Yang & Wang, Zhaojun & Zou, Changliang, 2016. "A simpler spatial-sign-based two-sample test for high-dimensional data," Journal of Multivariate Analysis, Elsevier, vol. 149(C), pages 192-198.
    6. Tianming Zhu, 2025. "The general linear hypothesis testing problem for multivariate functional data with applications," Statistical Papers, Springer, vol. 66(4), pages 1-32, June.
    7. Saha, Enakshi & Sarkar, Soham & Ghosh, Anil K., 2017. "Some high-dimensional one-sample tests based on functions of interpoint distances," Journal of Multivariate Analysis, Elsevier, vol. 161(C), pages 83-95.
    8. Qiu, Tao & Xu, Wangli & Zhu, Liping, 2021. "Two-sample test in high dimensions through random selection," Computational Statistics & Data Analysis, Elsevier, vol. 160(C).
    9. Harrar, Solomon W. & Kong, Xiaoli, 2022. "Recent developments in high-dimensional inference for multivariate data: Parametric, semiparametric and nonparametric approaches," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    10. Zhang, Jin-Ting & Zhu, Tianming, 2022. "A new normal reference test for linear hypothesis testing in high-dimensional heteroscedastic one-way MANOVA," Computational Statistics & Data Analysis, Elsevier, vol. 168(C).
    11. Zhang, Yu & Feng, Long, 2024. "Adaptive rank-based tests for high dimensional mean problems," Statistics & Probability Letters, Elsevier, vol. 214(C).
    12. Yin, Yanqing, 2021. "Test for high-dimensional mean vector under missing observations," Journal of Multivariate Analysis, Elsevier, vol. 186(C).
    13. Feng, Long & Zhang, Xiaoxu & Liu, Binghui, 2020. "A high-dimensional spatial rank test for two-sample location problems," Computational Statistics & Data Analysis, Elsevier, vol. 144(C).
    14. Li, Jun, 2023. "Finite sample t-tests for high-dimensional means," Journal of Multivariate Analysis, Elsevier, vol. 196(C).
    15. Wang, Wei & Lin, Nan & Tang, Xiang, 2019. "Robust two-sample test of high-dimensional mean vectors under dependence," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 312-329.
    16. Yuanyuan Jiang & Xingzhong Xu, 2022. "A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor," Mathematics, MDPI, vol. 10(10), pages 1-23, May.
    17. Ouyang, Yanyan & Liu, Jiamin & Tong, Tiejun & Xu, Wangli, 2022. "A rank-based high-dimensional test for equality of mean vectors," Computational Statistics & Data Analysis, Elsevier, vol. 173(C).
    18. Reza Modarres, 2024. "Hotelling $T^2$ test in high dimensions with application to Wilks outlier method," Statistical Papers, Springer, vol. 65(8), pages 5203-5218, October.
    19. Jinyuan Chang & Wen Zhou & Wen-Xin Zhou & Lan Wang, 2017. "Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering," Biometrics, The International Biometric Society, vol. 73(1), pages 31-41, March.
    20. Tzviel Frostig & Yoav Benjamini, 2022. "Testing the equality of multivariate means when $p>n$ by combining the Hotelling and Simes tests," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(2), pages 390-415, June.
