
Using random subspace method for prediction and variable importance assessment in linear regression

Authors

  • Mielniczuk, Jan
  • Teisseyre, Paweł

Abstract

A random subspace method (RSM) with a new weighting scheme is proposed and investigated for linear regression with a large number of features. Weights of variables are defined as averages of the squared values of the pertaining t-statistics over fitted models with randomly chosen features. It is argued that such weighting is advisable because it incorporates two factors: a measure of the importance of the variable within the considered model and a measure of goodness-of-fit of the model itself. The asymptotic weights assigned by this scheme are determined, as are the assumptions under which the method leads to a consistent choice of significant variables in the model. Numerical experiments indicate that the proposed method performs promisingly when its prediction errors are compared with those of penalty-based methods such as the lasso, and that it has a much smaller false discovery rate than the other methods considered.
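The weighting step described in the abstract can be illustrated with a short Python sketch. It is a minimal reconstruction from the abstract alone, not the authors' code: the function name `rsm_weights`, the parameters `subspace_size` and `n_draws`, the inclusion of an intercept, uniform sampling of features without replacement, skipping rank-deficient draws, and averaging each variable's squared t-statistic only over the draws in which it was selected are all assumptions. Because t_j² = β̂_j² / (σ̂² [(ZᵀZ)⁻¹]_jj) and σ̂² is the residual variance of the fitted submodel, the same quantity grows both with the coefficient's size relative to its standard error and with how well the submodel fits, which matches the two-factor property the abstract describes.

```python
import numpy as np


def rsm_weights(X, y, subspace_size, n_draws=1000, rng=None):
    """Hypothetical sketch of random-subspace weights for linear regression.

    For each draw, an OLS model with an intercept (assumption) is fitted on
    `subspace_size` randomly chosen columns of X, and the squared t-statistic
    of every chosen column is recorded. A variable's weight is the mean of
    its squared t-statistics over the draws in which it was selected
    (assumption); variables never drawn get NaN.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    if subspace_size + 1 >= n:
        raise ValueError("need n > subspace_size + 1 for residual degrees of freedom")
    sums = np.zeros(p)
    counts = np.zeros(p, dtype=int)
    for _ in range(n_draws):
        # Uniformly sample a feature subset without replacement (assumption).
        cols = rng.choice(p, size=subspace_size, replace=False)
        Z = np.column_stack([np.ones(n), X[:, cols]])
        coef, _, rank, _ = np.linalg.lstsq(Z, y, rcond=None)
        if rank < Z.shape[1]:
            continue  # skip rank-deficient submodels (assumption)
        resid = y - Z @ coef
        dof = n - Z.shape[1]
        sigma2 = resid @ resid / dof  # residual variance of this submodel
        cov = sigma2 * np.linalg.inv(Z.T @ Z)
        # t-statistics of the selected features (intercept excluded).
        t = coef[1:] / np.sqrt(np.diag(cov)[1:])
        sums[cols] += t ** 2
        counts[cols] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)


if __name__ == "__main__":
    # Toy data: only the first three of 50 features affect y.
    gen = np.random.default_rng(0)
    X = gen.standard_normal((200, 50))
    beta = np.zeros(50)
    beta[:3] = [3.0, -2.0, 1.5]
    y = X @ beta + gen.standard_normal(200)
    w = rsm_weights(X, y, subspace_size=10, n_draws=2000, rng=1)
    print("top-ranked features:", np.argsort(w)[::-1][:5])
```

Ranking features by these weights is only the first step. How many top-ranked features to keep for prediction (for example, chosen on a validation set or by an information criterion) is a separate decision that this sketch does not attempt to reproduce.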

Suggested Citation

  • Mielniczuk, Jan & Teisseyre, Paweł, 2014. "Using random subspace method for prediction and variable importance assessment in linear regression," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 725-742.
  • Handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:725-742
    DOI: 10.1016/j.csda.2012.09.018

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947312003477
    Download Restriction: Full text for ScienceDirect subscribers only.

    References listed on IDEAS

    1. Kuhn, Max, 2008. "Building Predictive Models in R Using the caret Package," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 28(i05).
    2. Frommlet, Florian & Ruhaltinger, Felix & Twaróg, Piotr & Bogdan, Małgorzata, 2012. "Modified versions of Bayesian Information Criterion for genome-wide association studies," Computational Statistics & Data Analysis, Elsevier, vol. 56(5), pages 1038-1051.
    3. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    4. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Citations

    Citations are extracted by the CitEc Project.

    Cited by:

    1. Aneiros, Germán & Novo, Silvia & Vieu, Philippe, 2022. "Variable selection in functional regression models: A review," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    2. Łukasz Smaga & Hidetoshi Matsui, 2018. "A note on variable selection in functional regression via random subspace method," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 27(3), pages 455-477, August.
    3. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    4. Thulin, Måns, 2014. "A high-dimensional two-sample test for the mean using random subspaces," Computational Statistics & Data Analysis, Elsevier, vol. 74(C), pages 26-38.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Štefan Lyócsa & Petra Vašaničová & Branka Hadji Misheva & Marko Dávid Vateha, 2022. "Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets," Financial Innovation, Springer;Southwestern University of Finance and Economics, vol. 8(1), pages 1-21, December.
    2. Zander S. Venter & Adam Sadilek & Charlotte Stanton & David N. Barton & Kristin Aunan & Sourangsu Chowdhury & Aaron Schneider & Stefano Maria Iacus, 2021. "Mobility in Blue-Green Spaces Does Not Predict COVID-19 Transmission: A Global Analysis," IJERPH, MDPI, vol. 18(23), pages 1-12, November.
    3. Yagli, Gokhan Mert & Yang, Dazhi & Srinivasan, Dipti, 2019. "Automatic hourly solar forecasting using machine learning models," Renewable and Sustainable Energy Reviews, Elsevier, vol. 105(C), pages 487-498.
    4. Paweł Teisseyre & Robert A. Kłopotek & Jan Mielniczuk, 2016. "Random Subspace Method for high-dimensional regression with the R package regRSM," Computational Statistics, Springer, vol. 31(3), pages 943-972, September.
    5. Satre-Meloy, Aven & Diakonova, Marina & Grünewald, Philipp, 2020. "Cluster analysis and prediction of residential peak demand profiles using occupant activity data," Applied Energy, Elsevier, vol. 260(C).
    6. Merlijn Breugel & Cancan Qi & Zhongli Xu & Casper-Emil T. Pedersen & Ilya Petoukhov & Judith M. Vonk & Ulrike Gehring & Marijn Berg & Marnix Bügel & Orestes A. Carpaij & Erick Forno & Andréanne Morin, 2022. "Nasal DNA methylation at three CpG sites predicts childhood allergic disease," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    7. Anurag Satpathi & Parul Setiya & Bappa Das & Ajeet Singh Nain & Prakash Kumar Jha & Surendra Singh & Shikha Singh, 2023. "Comparative Analysis of Statistical and Machine Learning Techniques for Rice Yield Forecasting for Chhattisgarh, India," Sustainability, MDPI, vol. 15(3), pages 1-18, February.
    8. Jian Huang & Yuling Jiao & Lican Kang & Jin Liu & Yanyan Liu & Xiliang Lu, 2022. "GSDAR: a fast Newton algorithm for ℓ0 regularized generalized linear models with statistical guarantee," Computational Statistics, Springer, vol. 37(1), pages 507-533, March.
    9. Vera Wendler-Bosco & Charles Nicholson, 2022. "Modeling the economic impact of incoming tropical cyclones using machine learning," Natural Hazards: Journal of the International Society for the Prevention and Mitigation of Natural Hazards, Springer;International Society for the Prevention and Mitigation of Natural Hazards, vol. 110(1), pages 487-518, January.
    10. A. Jiran Meitei & Akanksha Saini & Bibhuti Bhusan Mohapatra & Kh. Jitenkumar Singh, 2022. "Predicting child anaemia in the North-Eastern states of India: a machine learning approach," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 13(6), pages 2949-2962, December.
    11. Schroeders, Ulrich & Watrin, Luc & Wilhelm, Oliver, 2021. "Age-related nuances in knowledge assessment," Intelligence, Elsevier, vol. 85(C).
    12. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    13. Oxana Babecka Kucharcukova & Jan Bruha, 2016. "Nowcasting the Czech Trade Balance," Working Papers 2016/11, Czech National Bank.
    14. Carstensen, Kai & Heinrich, Markus & Reif, Magnus & Wolters, Maik H., 2020. "Predicting ordinary and severe recessions with a three-state Markov-switching dynamic factor model," International Journal of Forecasting, Elsevier, vol. 36(3), pages 829-850.
    15. Hou-Tai Chang & Ping-Huai Wang & Wei-Fang Chen & Chen-Ju Lin, 2022. "Risk Assessment of Early Lung Cancer with LDCT and Health Examinations," IJERPH, MDPI, vol. 19(8), pages 1-12, April.
    16. Margherita Giuzio, 2017. "Genetic algorithm versus classical methods in sparse index tracking," Decisions in Economics and Finance, Springer;Associazione per la Matematica, vol. 40(1), pages 243-256, November.
    17. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    18. Wang, Qiao & Zhou, Wei & Cheng, Yonggang & Ma, Gang & Chang, Xiaolin & Miao, Yu & Chen, E, 2018. "Regularized moving least-square method and regularized improved interpolating moving least-square method with nonsingular moment matrices," Applied Mathematics and Computation, Elsevier, vol. 325(C), pages 120-145.
    19. Dmitriy Drusvyatskiy & Adrian S. Lewis, 2018. "Error Bounds, Quadratic Growth, and Linear Convergence of Proximal Methods," Mathematics of Operations Research, INFORMS, vol. 43(3), pages 919-948, August.
    20. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:csdana:v:71:y:2014:i:c:p:725-742. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/locate/csda .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.