
Testing sufficiency for transfer learning

Authors

  • Lin, Ziqian
  • Gao, Yuan
  • Wang, Feifei
  • Wang, Hansheng

Abstract

Modern statistical analysis often involves high-dimensional models fitted to limited samples, which makes it difficult to estimate such models from the target data alone. How to borrow information from a large source dataset to estimate the target model more accurately is therefore an interesting problem, and it leads to the idea of transfer learning. Various estimation methods of this kind have been developed recently. In this work, we study transfer learning from a different perspective: we consider the problem of testing for transfer learning sufficiency, which we take as the null hypothesis. Transfer learning sufficiency refers to the situation in which, with the help of the source data, the useful information contained in the feature vectors of the target data can be sufficiently extracted for predicting the target response of interest. Rejecting the null hypothesis therefore implies that information useful for prediction remains in the feature vectors of the target data, which calls for further exploration. To this end, we develop a novel testing procedure and a centralized and standardized test statistic, whose asymptotic null distribution is derived analytically. Simulation studies demonstrate the finite-sample performance of the proposed method, and a real-data example related to deep learning is presented for illustration.

Suggested Citation

  • Lin, Ziqian & Gao, Yuan & Wang, Feifei & Wang, Hansheng, 2025. "Testing sufficiency for transfer learning," Computational Statistics & Data Analysis, Elsevier, vol. 203(C).
  • Handle: RePEc:eee:csdana:v:203:y:2025:i:c:s0167947324001592
    DOI: 10.1016/j.csda.2024.108075

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324001592
    Download Restriction: Full text for ScienceDirect subscribers only.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    2. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2019. "Valid Post-Selection Inference in High-Dimensional Approximately Sparse Quantile Regression Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 114(526), pages 749-758, April.
    3. Kato, Kengo & F. Galvao, Antonio & Montes-Rojas, Gabriel V., 2012. "Asymptotics for panel quantile regression models with individual effects," Journal of Econometrics, Elsevier, vol. 170(1), pages 76-91.
    4. repec:hal:wpspec:info:hdl:2441/5rkqqmvrn4tl22s9mc4b6ga2g is not listed on IDEAS
    5. Adam C. Sales & Ben B. Hansen, 2020. "Limitless Regression Discontinuity," Journal of Educational and Behavioral Statistics, vol. 45(2), pages 143-174, April.
    6. HAFNER, Christian & LINTON, Oliver B. & TANG, Haihan, 2016. "Estimation of a Multiplicative Covariance Structure in the Large Dimensional Case," LIDAM Discussion Papers CORE 2016044, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    7. T. Tony Cai & Zijian Guo & Yin Xia, 2023. "Rejoinder on: statistical inference and large-scale multiple testing for high-dimensional regression models," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 32(4), pages 1187-1194, December.
    8. Hafner, Christian M. & Linton, Oliver B. & Tang, Haihan, 2020. "Estimation of a multiplicative correlation structure in the large dimensional case," Journal of Econometrics, Elsevier, vol. 217(2), pages 431-470.
    9. Ji-Yeon Yang & Xuming He, 2011. "A Multistep Protein Lysate Array Quantification Method and its Statistical Properties," Biometrics, The International Biometric Society, vol. 67(4), pages 1197-1205, December.
    10. Alexandre Belloni & Victor Chernozhukov & Kengo Kato, 2013. "Uniform Post Selection Inference for LAD Regression and Other Z-estimation problems," Papers 1304.0282, arXiv.org, revised Oct 2020.
    11. Arun G. Chandrasekhar & Victor Chernozhukov & Francesca Molinari & Paul Schrimpf, 2019. "Best Linear Approximations to Set Identified Functions: With an Application to the Gender Wage Gap," NBER Working Papers 25593, National Bureau of Economic Research, Inc.
    12. Demian Pouzo, 2014. "Bootstrap Consistency for Quadratic Forms of Sample Averages with Increasing Dimension," Papers 1411.2701, arXiv.org, revised Aug 2015.
    13. Fan, Jianqing & Guo, Yongyi & Jiang, Bai, 2022. "Adaptive Huber regression on Markov-dependent data," Stochastic Processes and their Applications, Elsevier, vol. 150(C), pages 802-818.
    14. Calhoun, Gray, 2011. "Hypothesis testing in linear regression when k/n is large," Journal of Econometrics, Elsevier, vol. 165(2), pages 163-174.
    15. Aiai Yu & Yujie Zhong & Xingdong Feng & Ying Wei, 2023. "Quantile regression for nonignorable missing data with its application of analyzing electronic medical records," Biometrics, The International Biometric Society, vol. 79(3), pages 2036-2049, September.
    16. Chaohua Dong & Jiti Gao & Bin Peng & Yundong Tu, 2021. "Multiple-index Nonstationary Time Series Models: Robust Estimation Theory and Practice," Papers 2111.02023, arXiv.org.
    17. Victor Chernozhukov & Roberto Rigobon & Thomas M. Stoker, 2010. "Set identification and sensitivity analysis with Tobin regressors," Quantitative Economics, Econometric Society, vol. 1(2), pages 255-277, November.
    18. V. Chernozhukov & I. Fernández-Val & A. Galichon, 2009. "Improving point and interval estimators of monotone functions by rearrangement," Biometrika, Biometrika Trust, vol. 96(3), pages 559-575.
    19. Christian M. Hafner & Oliver Linton & Haihan Tang, 2016. "Estimation of a Multiplicative Covariance Structure," CeMMAP working papers CWP23/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    20. Han, Dongxiao & Huang, Jian & Lin, Yuanyuan & Shen, Guohao, 2022. "Robust post-selection inference of high-dimensional mean regression with heavy-tailed asymmetric or heteroskedastic errors," Journal of Econometrics, Elsevier, vol. 230(2), pages 416-431.
    21. Zongwu Cai & Xiyuan Liu, 2020. "A Functional-Coefficient VAR Model for Dynamic Quantiles with Constructing Financial Network," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202017, University of Kansas, Department of Economics, revised Oct 2020.
