IDEAS home Printed from https://ideas.repec.org/a/eee/csdana/v71y2014icp725-742.html

Using random subspace method for prediction and variable importance assessment in linear regression

Author

Listed:
  • Mielniczuk, Jan
  • Teisseyre, Paweł

Abstract

A random subset method (RSM) with a new weighting scheme is proposed and investigated for linear regression with a large number of features. Weights of variables are defined as averages of squared values of pertaining t-statistics over fitted models with randomly chosen features. It is argued that such weighting is advisable as it incorporates two factors: a measure of importance of the variable within the considered model and a measure of goodness-of-fit of the model itself. Asymptotic weights assigned by such a scheme are determined as well as assumptions under which the method leads to consistent choice of significant variables in the model. Numerical experiments indicate that the proposed method behaves promisingly when its prediction errors are compared with errors of penalty-based methods such as the lasso and it has much smaller false discovery rate than the other methods considered.

Suggested Citation

  • Mielniczuk, Jan & Teisseyre, Paweł, 2014. "Using random subspace method for prediction and variable importance assessment in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 725-742.
  • Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:725-742
    DOI: 10.1016/j.csda.2012.09.018
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947312003477
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2012.09.018?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to

    for a different version of it.

    References listed on IDEAS

    as
    1. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    2. Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
    3. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    4. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    5. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Łukasz Smaga & Hidetoshi Matsui, 2018. "A note on variable selection in functional regression via random subspace method," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(3), pages 455-477, August.
    2. Aneiros, Germán & Novo, Silvia & Vieu, Philippe, 2022. "Variable selection in functional regression models: A review," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Thulin, Måns, 2014. "A high-dimensional two-sample test for the mean using random subspaces," Computational Statistics & Data Analysis, Elsevier, vol. 74(C), pages 26-38.
    4. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    2. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for $$\ell _0$$ ℓ 0 regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.
    3. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    4. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    5. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    6. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    7. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    8. Lee, Ji Hyung & Shi, Zhentao & Gao, Zhan, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    9. Christidis, Anthony-Alexander & Van Aelst, Stefan & Zamar, Ruben, 2025. "Multi-model subset selection," Computational Statistics & Data Analysis, Elsevier, vol. 203(C).
    10. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    11. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    12. Zander S. Venter & Adam Sadilek & Charlotte Stanton & David N. Barton & Kristin Aunan & Sourangsu Chowdhury & Aaron Schneider & Stefano Maria Iacus, 2021. "Mobility in Blue-Green Spaces Does Not Predict COVID-19 Transmission: A Global Analysis," IJERPH, MDPI, vol. 18(23), pages 1-12, November.
    13. Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
    14. Chen, Shi & Härdle, Wolfgang Karl & López Cabrera, Brenda, 2018. "Regularization Approach for Network Modeling of German Energy Market," IRTG 1792 Discussion Papers 2018-017, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    15. Zakariya Yahya Algamal & Muhammad Hisyam Lee, 2019. "A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 753-771, September.
    16. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    17. Marzia Freo & Alessandra Luati, 2024. "Lasso-based variable selection methods in text regression: the case of short texts," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 108(1), pages 69-99, March.
    18. Minerva Mukhopadhyay & David B. Dunson, 2020. "Targeted Random Projection for Prediction From High-Dimensional Features," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(532), pages 1998-2010, December.
    19. Clément Cariou & Amélie Charles & Olivier Darné, 2024. "Are national or regional surveys useful for nowcasting regional jobseekers? The case of the French region of Pays‐de‐la‐Loire," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 43(6), pages 2341-2357, September.
    20. Dai, Linlin & Chen, Kani & Sun, Zhihua & Liu, Zhenqiu & Li, Gang, 2018. "Broken adaptive ridge regression and its asymptotic properties," Journal of Multivariate Analysis, Elsevier, vol. 168(C), pages 334-351.

    More about this item

    Keywords

    ;
    ;
    ;
    ;
    ;
    ;

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:725-742. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.