
Assumption-lean covariate adjustment under covariate adaptive randomization when $p = o (n)$

Authors
  • Yujia Gu
  • Lin Liu
  • Wei Ma

Abstract

Adjusting for (baseline) covariates with working regression models has become standard practice in the analysis of randomized clinical trials (RCTs). When the dimension $p$ of the covariates is large relative to the sample size $n$, specifically $p = o (n)$, adjusting for covariates even in a linear working model by ordinary least squares can yield overly large bias, defeating the purpose of improving efficiency. This issue arises when no structural assumptions are imposed on the outcome model, a scenario that we refer to as the assumption-lean setting. Several new estimators have been proposed to address this issue. However, they focus mainly on simple randomization under the finite-population model and do not cover covariate adaptive randomization (CAR) schemes under the superpopulation model. Because it improves covariate balance between treatment groups, CAR is more widely adopted in RCTs, and the superpopulation model is a better fit when subjects are enrolled sequentially or when generalizing to a larger population is of interest. Thus, there is an urgent need to develop procedures for these settings, as current regulatory guidance provides little concrete direction. In this paper, we fill this gap by demonstrating that an adjusted estimator based on second-order $U$-statistics can estimate the average treatment effect almost unbiasedly and enjoys a guaranteed efficiency gain if $p = o (n)$. In our analysis, we generalize the coupling technique commonly used in the CAR literature to $U$-statistics and also obtain several useful results for analyzing inverse sample Gram matrices by a delicate leave-$m$-out analysis, which may be of independent interest. Both synthetic and semi-synthetic experiments are conducted to demonstrate the superior finite-sample performance of our new estimator compared to popular benchmarks.
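For readers unfamiliar with the setting, the sketch below illustrates the two baseline ingredients the abstract refers to: a simple CAR scheme and ordinary-least-squares covariate adjustment with a linear working model. It is not the $U$-statistic estimator proposed in the paper. The data-generating process, the choice of stratifying on the sign of the first covariate, the values of `n`, `p`, and `tau`, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def stratified_block_assign(strata, rng):
    """Assign treatment 1:1 within each stratum by permuting balanced labels.

    This is one simple covariate adaptive scheme (stratified randomization);
    arm sizes within a stratum differ by at most one.
    """
    a = np.zeros(len(strata), dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        labels = np.arange(len(idx)) % 2  # balanced 0/1 labels
        a[idx] = rng.permutation(labels)
    return a

def diff_in_means(y, a):
    """Unadjusted estimate: difference of arm-specific sample means."""
    return float(y[a == 1].mean() - y[a == 0].mean())

def ols_adjusted(y, a, x):
    """OLS-adjusted estimate with a separate linear working model per arm.

    Each arm's fitted model predicts outcomes for all subjects; the estimate
    is the average difference between the two sets of predictions.
    """
    design = np.column_stack([np.ones(len(y)), x])
    preds = []
    for arm in (1, 0):
        beta, *_ = np.linalg.lstsq(design[a == arm], y[a == arm], rcond=None)
        preds.append(design @ beta)
    return float(np.mean(preds[0] - preds[1]))

# Hypothetical simulation settings (not taken from the paper).
rng = np.random.default_rng(0)
n, p, tau = 400, 20, 1.0
x = rng.normal(size=(n, p))
strata = (x[:, 0] > 0).astype(int)          # assumed stratification variable
a = stratified_block_assign(strata, rng)
# Nonlinear outcome model, so the linear working model is misspecified.
y = tau * a + np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2 + rng.normal(size=n)

est_dm = diff_in_means(y, a)
est_ols = ols_adjusted(y, a, x)
print(f"difference in means: {est_dm:.3f}")
print(f"OLS-adjusted:        {est_ols:.3f}")
```

Raising `p` toward `n` in this sketch is one way to observe how the per-arm OLS fits degrade as the covariate dimension grows, which is the regime the abstract is concerned with.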

Suggested Citation

  • Yujia Gu & Lin Liu & Wei Ma, 2025. "Assumption-lean covariate adjustment under covariate adaptive randomization when $p = o (n)$," Papers 2512.20046, arXiv.org.
  • Handle: RePEc:arx:papers:2512.20046

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2512.20046
    File Function: Latest version
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Xingyu Chen & Lin Liu & Rajarshi Mukherjee, 2024. "Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics," Papers 2408.06103, arXiv.org, revised May 2025.
    2. Liu, Lin & Mukherjee, Rajarshi & Robins, James M., 2024. "Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators," Journal of Econometrics, Elsevier, vol. 240(2).
    3. Lin Liu & Chang Li, 2023. "New $\sqrt{n}$-consistent, numerically stable higher-order influence function estimators," Papers 2302.08097, arXiv.org.
    4. Jikai Jin & Vasilis Syrgkanis, 2024. "Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation," Papers 2402.14264, arXiv.org, revised Jun 2025.
    5. Sihui Zhao & Xinbo Wang & Lin Liu & Xin Zhang, 2024. "Covariate Adjustment in Randomized Experiments Motivated by Higher-Order Influence Functions," Papers 2411.08491, arXiv.org, revised Nov 2025.
    6. Jikai Jin & Vasilis Syrgkanis, 2025. "Sharp Structure-Agnostic Lower Bounds for General Linear Functional Estimation," Papers 2512.17341, arXiv.org, revised Jan 2026.
    7. Ivan A Canay & Vishal Kamat, 2018. "Approximate Permutation Tests and Induced Order Statistics in the Regression Discontinuity Design," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 85(3), pages 1577-1608.
    8. Vishal Kamat, 2017. "Identifying the Effects of a Program Offer with an Application to Head Start," Papers 1711.02048, arXiv.org, revised Aug 2023.
    9. Chen, Canyi & Xu, Wangli & Zhu, Liping, 2022. "Distributed estimation in heterogeneous reduced rank regression: With application to order determination in sufficient dimension reduction," Journal of Multivariate Analysis, Elsevier, vol. 190(C).
    10. Jun Wang & Yahe Yu, 2024. "Improved estimation of average treatment effects under covariate‐adaptive randomization methods," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 78(2), pages 310-333, May.
    11. Tong Wang & Wei Ma, 2021. "The impact of misclassification on covariate‐adaptive randomized clinical trials," Biometrics, The International Biometric Society, vol. 77(2), pages 451-464, June.
    12. Jiang, Liang & Phillips, Peter C.B. & Tao, Yubo & Zhang, Yichong, 2023. "Regression-adjusted estimation of quantile treatment effects under covariate-adaptive randomizations," Journal of Econometrics, Elsevier, vol. 234(2), pages 758-776.
    13. Tymon Sloczynski, 2018. "A General Weighted Average Representation of the Ordinary and Two-Stage Least Squares Estimands," Working Papers 125, Brandeis University, Department of Economics and International Business School.
    14. Shi, Chengchun & Xu, Tianlin & Bergsma, Wicher & Li, Lexin, 2021. "Double generative adversarial networks for conditional independence testing," LSE Research Online Documents on Economics 112550, London School of Economics and Political Science, LSE Library.
    15. John A. List & Azeem M. Shaikh & Yang Xu, 2019. "Multiple hypothesis testing in experimental economics," Experimental Economics, Springer;Economic Science Association, vol. 22(4), pages 773-793, December.
    16. Chao, Shih-Kang & Härdle, Wolfgang K. & Yuan, Ming, 2021. "Factorisable Multitask Quantile Regression," Econometric Theory, Cambridge University Press, vol. 37(4), pages 794-816, August.
    17. Guiteras, Raymond P. & Levine, David I. & Polley, Thomas H., 2016. "The pursuit of balance in sequential randomized trials," Development Engineering, Elsevier, vol. 1(C), pages 12-25.
    18. Federico A. Bugni & Ivan A. Canay & Azeem M. Shaikh, 2019. "Inference under covariate‐adaptive randomization with multiple treatments," Quantitative Economics, Econometric Society, vol. 10(4), pages 1747-1785, November.
    19. Zhao, Anqi & Ding, Peng, 2024. "No star is good news: A unified look at rerandomization based on p-values from covariate balance tests," Journal of Econometrics, Elsevier, vol. 241(1).
    20. Federico A. Bugni & Ivan A. Canay & Azeem M. Shaikh, 2018. "Inference Under Covariate-Adaptive Randomization," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(524), pages 1784-1796, October.
