Printed from https://ideas.repec.org/p/arx/papers/2602.19705.html

Model Selection in High-Dimensional Linear Regression using Boosting with Multiple Testing

Authors
  • George Kapetanios
  • Vasilis Sarafidis
  • Alexia Ventouri

Abstract

High-dimensional regression specification and analysis is a complex and active area of research in statistics, machine learning, and econometrics. This paper proposes a new approach, Boosting with Multiple Testing (BMT), which combines forward stepwise variable selection with the One Covariate at a Time Multiple Testing (OCMT) framework of Chudik et al. (2018). At each stage, the model is updated by adding only the most significant regressor conditional on those already included, while a family-wise multiple testing filter is applied to the remaining candidates. In this way, the method retains the strong screening properties of Chudik et al. (2018) while operating in a less greedy manner with respect to proxy and noise variables. Using sharp probability inequalities for heterogeneous strongly mixing processes from Dendramis et al. (2022), we show that BMT enjoys oracle-type properties relative to an approximating model that includes all true signals and excludes pure noise variables: this model is selected with probability tending to one, and the resulting estimator achieves standard parametric rates for prediction error and coefficient estimation. Additional results establish conditions under which BMT recovers the exact true model and avoids selection of proxy signals. Monte Carlo experiments indicate that BMT performs very well relative to OCMT and Lasso-type procedures, delivering higher model selection accuracy and smaller RMSE for the estimated coefficients, especially under strong multicollinearity of the regressors. Two empirical illustrations, based on a large set of macro-financial indicators as covariates, show that BMT yields sparse, interpretable specifications with favourable out-of-sample performance.
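
The stagewise idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes (i) a critical value of the form Phi^{-1}(1 - p / (2 n^delta)), a common way to write an OCMT-style multiple-testing threshold for n candidates, (ii) homoskedastic OLS t-statistics, and (iii) a simple stopping rule that ends the search when no remaining candidate passes the threshold. The names `critical_value`, `conditional_t_stats`, `bmt_sketch`, `p`, `delta`, and `max_steps` are hypothetical and chosen for illustration only.

```python
from statistics import NormalDist

import numpy as np


def critical_value(n_candidates, p=0.05, delta=1.0):
    """Assumed multiple-testing threshold: Phi^{-1}(1 - p / (2 * n^delta))."""
    return NormalDist().inv_cdf(1.0 - p / (2.0 * n_candidates ** delta))


def conditional_t_stats(X, y, active, candidates):
    """OLS t-statistic of each candidate when added to [intercept, active set].

    Uses the Frisch-Waugh-Lovell result: partial the included regressors out
    of y and of every candidate, then run one-regressor regressions.
    No guard against exact collinearity is included in this sketch.
    """
    T = X.shape[0]
    Z = np.column_stack([np.ones(T)] + [X[:, j] for j in active])

    def residualize(A):
        coef, *_ = np.linalg.lstsq(Z, A, rcond=None)
        return A - Z @ coef

    y_r = residualize(y)
    X_r = residualize(X[:, candidates])
    sxx = np.sum(X_r ** 2, axis=0)
    sxy = X_r.T @ y_r
    rss = y_r @ y_r - sxy ** 2 / sxx   # residual sum of squares per candidate
    dof = T - Z.shape[1] - 1           # intercept + active set + candidate
    return sxy / np.sqrt(sxx * rss / dof)


def bmt_sketch(X, y, p=0.05, delta=1.0, max_steps=None):
    """Add, one at a time, the most significant remaining regressor,
    conditional on those already selected, while it passes the threshold."""
    T, n = X.shape
    threshold = critical_value(n, p, delta)
    if max_steps is None:
        max_steps = min(n, T - 2)      # keeps residual degrees of freedom >= 1
    active, candidates = [], list(range(n))
    while candidates and len(active) < max_steps:
        t_abs = np.abs(conditional_t_stats(X, y, active, candidates))
        best = int(np.argmax(t_abs))
        if t_abs[best] < threshold:    # assumed stopping rule
            break
        active.append(candidates.pop(best))
    return active
```

Calling `bmt_sketch(X, y)` on a `T x n` NumPy design matrix returns the column indices in the order they were selected. Adding only one regressor per stage, rather than every candidate that passes the test, is what makes the procedure less greedy than a one-shot screen: a proxy variable that is marginally significant can lose its significance once the true signal it proxies for has entered the model.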

Suggested Citation

  • George Kapetanios & Vasilis Sarafidis & Alexia Ventouri, 2026. "Model Selection in High-Dimensional Linear Regression using Boosting with Multiple Testing," Papers 2602.19705, arXiv.org.
  • Handle: RePEc:arx:papers:2602.19705

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2602.19705
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Michael W. McCracken & Serena Ng, 2021. "FRED-QD: A Quarterly Database for Macroeconomic Research," Review, Federal Reserve Bank of St. Louis, vol. 103(1), pages 1-44, January.
    2. James H. Stock & Mark W. Watson, 2008. "Phillips curve inflation forecasts," Conference Series ; [Proceedings], Federal Reserve Bank of Boston.
    3. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    4. Jiahua Chen & Zehua Chen, 2008. "Extended Bayesian information criteria for model selection with large model spaces," Biometrika, Biometrika Trust, vol. 95(3), pages 759-771.
    5. Yingying Fan & Cheng Yong Tang, 2013. "Tuning parameter selection in high dimensional penalized likelihood," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(3), pages 531-552, June.
    6. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    7. Dendramis, Yiannis & Giraitis, Liudas & Kapetanios, George, 2021. "Estimation of Time-Varying Covariance Matrices for Large Datasets," Econometric Theory, Cambridge University Press, vol. 37(6), pages 1100-1134, December.
    8. Yingying Fan & Jinchi Lv, 2013. "Asymptotic Equivalence of Regularization Methods in Thresholded Parameter Space," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(503), pages 1044-1061, September.
    9. Olivier Blanchard, 2016. "The Phillips Curve: Back to the '60s?," American Economic Review, American Economic Association, vol. 106(5), pages 31-34, May.
    10. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    11. Kapetanios, George & Blake, Andrew P., 2010. "Tests of the Martingale Difference Hypothesis Using Boosting and RBF Neural Network Approximations," Econometric Theory, Cambridge University Press, vol. 26(5), pages 1363-1397, October.
    12. Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
    13. Andrew Atkeson & Lee E. Ohanian, 2001. "Are Phillips curves useful for forecasting inflation?," Quarterly Review, Federal Reserve Bank of Minneapolis, vol. 25(Win), pages 2-11.
    14. Antoniadis A. & Fan J., 2001. "Regularization of Wavelet Approximations," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 939-967, September.
    15. Stock, James H & Watson, Mark W, 2002. "Macroeconomic Forecasting Using Diffusion Indexes," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(2), pages 147-162, April.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shu, Lei & Hao, Yifan & Chen, Yu & Yang, Qing, 2025. "SFQRA: Scaled factor-augmented quantile regression with aggregation in conditional mean forecasting," Journal of Multivariate Analysis, Elsevier, vol. 207(C).
    2. Wang, Zhenzhong & Zhu, Zhengyuan & Yu, Cindy, 2025. "Variable Selection in Macroeconomic Forecasting with Many Predictors," Econometrics and Statistics, Elsevier, vol. 36(C), pages 19-36.
    3. Marzia Freo & Alessandra Luati, 2024. "Lasso-based variable selection methods in text regression: the case of short texts," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(1), pages 69-99, March.
    4. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.
    5. Ruggieri, Eric & Lawrence, Charles E., 2012. "On efficient calculations for Bayesian variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1319-1332.
    6. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    7. Huiwen Wang & Ruiping Liu & Shanshan Wang & Zhichao Wang & Gilbert Saporta, 2020. "Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization," Computational Statistics, Springer, vol. 35(3), pages 1153-1170, September.
    8. Yongxia Zhang & Qi Wang & Maozai Tian, 2022. "Smoothed Quantile Regression with Factor-Augmented Regularized Variable Selection for High Correlated Data," Mathematics, MDPI, vol. 10(16), pages 1-30, August.
    9. She, Yiyuan, 2012. "An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors," Computational Statistics & Data Analysis, Elsevier, vol. 56(10), pages 2976-2990.
    10. Chen Xu & Jiahua Chen, 2014. "The Sparse MLE for Ultrahigh-Dimensional Feature Screening," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(507), pages 1257-1269, September.
    11. Wei Sun & Lexin Li, 2012. "Multiple Loci Mapping via Model-free Variable Selection," Biometrics, The International Biometric Society, vol. 68(1), pages 12-22, March.
    12. Philippe Goulet Coulombe, 2022. "A Neural Phillips Curve and a Deep Output Gap," Papers 2202.04146, arXiv.org, revised Oct 2024.
    13. Zhihua Sun & Yi Liu & Kani Chen & Gang Li, 2022. "Broken adaptive ridge regression for right-censored survival data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 74(1), pages 69-91, February.
    14. Wang, Tao & Zhu, Lixing, 2011. "Consistent tuning parameter selection in high dimensional sparse linear regression," Journal of Multivariate Analysis, Elsevier, vol. 102(7), pages 1141-1151, August.
    15. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for $\ell_0$ regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.
    16. Tong Wu & Jiawen Hu & Zhi-Sheng Ye & Nan Chen, 2026. "Deep tobit model: an integrated framework for high-dimensional censored regression with variable selection," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 32(1), pages 1-28, March.
    17. Xiangyu Wang & Chenlei Leng, 2016. "High dimensional ordinary least squares projection for screening variables," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(3), pages 589-611, June.
    18. Mehmet Caner & Anders Bredahl Kock, 2016. "Oracle Inequalities for Convex Loss Functions with Nonlinear Targets," Econometric Reviews, Taylor & Francis Journals, vol. 35(8-10), pages 1377-1411, December.
    19. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    20. Joseph, Andreas & Potjagailo, Galina & Chakraborty, Chiranjit & Kapetanios, George, 2024. "Forecasting UK inflation bottom up," International Journal of Forecasting, Elsevier, vol. 40(4), pages 1521-1538.
