
A penalized variable selection ensemble algorithm for high-dimensional group-structured data

Author

Listed:
  • Dongsheng Li
  • Chunyan Pan
  • Jing Zhao
  • Anfei Luo

Abstract

This paper presents StackingGroup, a multi-algorithm fusion model built on the Stacking ensemble learning framework, to address variable selection in high-dimensional group-structured data. The proposed algorithm accounts for the fact that different algorithms observe the data and are trained according to different principles. It leverages the strengths of each model by combining Stacking ensemble learning with multiple group-structured regularization methods. The data set is divided into K equal parts, more than 10 algorithms are evaluated as candidate base learners, and the base learners are chosen for low correlation, strong predictive ability, and small model error. On this basis, grSubset + grLasso, grLasso, and grSCAD were selected as the base learners, and the Lasso algorithm was used as the meta-learner; the resulting combined algorithm, StackingGroup, is designed to handle high-dimensional group-structured data. In simulation experiments, the proposed method outperformed the other prediction methods in terms of R2, RMSE, and MAE. Finally, the algorithm was applied to investigate the risk factors for low birth weight in infants and young children, where it achieved a mean absolute error (MAE) of 0.508 and a root mean square error (RMSE) of 0.668. Both values are smaller than those obtained from a single model, indicating that the proposed method surpasses the other algorithms in prediction accuracy.
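
The stacking procedure described in the abstract (K-fold out-of-fold predictions from several penalized base learners, combined by a Lasso meta-learner and scored with MAE and RMSE) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: grSubset, grLasso, and grSCAD are not implemented here, and plain Lasso and ridge fits obtained by coordinate descent are used as stand-in base learners. All function names, penalty values, and the synthetic data are assumptions made for the example.

```python
import random
from math import sqrt

def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal step for an L1 penalty)."""
    return z - t if z > t else z + t if z < -t else 0.0

def fit_penalized(X, y, l1=0.0, l2=0.0, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - Xb||^2 + l1*|b|_1 + (l2/2)*|b|^2."""
    n = len(y)
    cols = list(zip(*X))                      # column-wise view of X
    b0, b = sum(y) / n, [0.0] * len(cols)
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new = soft_threshold(rho, l1) / (sum(c * c for c in col) / n + l2)
            resid = [r - c * (new - b[j]) for c, r in zip(col, resid)]
            b[j] = new
        shift = sum(resid) / n                # re-centre the unpenalized intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, b

def predict(model, x):
    b0, b = model
    return b0 + sum(bj * xj for bj, xj in zip(b, x))

def stack_fit(X, y, base_params, meta_l1=0.01, k=5, seed=0):
    """Fit base learners with K-fold out-of-fold predictions, then a Lasso meta-learner."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]     # K roughly equal parts
    Z = [[0.0] * len(base_params) for _ in range(n)]
    for fold in folds:
        held = set(fold)
        train = [i for i in range(n) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for m, params in enumerate(base_params):
            model = fit_penalized(X_tr, y_tr, **params)
            for i in fold:                    # predict only on the held-out part
                Z[i][m] = predict(model, X[i])
    meta = fit_penalized(Z, y, l1=meta_l1)    # Lasso on the base predictions
    bases = [fit_penalized(X, y, **p) for p in base_params]  # refit on all data
    return bases, meta

def stack_predict(bases, meta, x):
    return predict(meta, [predict(b, x) for b in bases])

def mae(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def demo(seed=0):
    """Synthetic data: 6 predictors, 3 of them irrelevant (all values are made up)."""
    rng = random.Random(seed)
    beta = [2.0, -1.0, 0.0, 0.0, 0.5, 0.0]
    X = [[rng.gauss(0, 1) for _ in beta] for _ in range(120)]
    y = [sum(bj * xj for bj, xj in zip(beta, x)) + rng.gauss(0, 0.5) for x in X]
    X_tr, y_tr, X_te, y_te = X[:90], y[:90], X[90:], y[90:]
    base_params = [{"l1": 0.05}, {"l1": 0.2}, {"l2": 0.5}]  # stand-in base learners
    bases, meta = stack_fit(X_tr, y_tr, base_params)
    preds = [stack_predict(bases, meta, x) for x in X_te]
    return mae(y_te, preds), rmse(y_te, preds)

print("MAE, RMSE:", demo())
```

Fitting the meta-learner on out-of-fold predictions, instead of on in-sample fits, keeps it from rewarding base learners that merely overfit their training data. A group-structured version would presumably replace the stand-in base learners with group-penalized fits.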

Suggested Citation

  • Dongsheng Li & Chunyan Pan & Jing Zhao & Anfei Luo, 2024. "A penalized variable selection ensemble algorithm for high-dimensional group-structured data," PLOS ONE, Public Library of Science, vol. 19(2), pages 1-19, February.
  • Handle: RePEc:plo:pone00:0296748
    DOI: 10.1371/journal.pone.0296748

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0296748
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0296748&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0296748?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tutz, Gerhard & Pößnecker, Wolfgang & Uhlmann, Lorenz, 2015. "Variable selection in general multinomial logit models," Computational Statistics & Data Analysis, Elsevier, vol. 82(C), pages 207-222.
    2. Yang, Yanlin & Hu, Xuemei & Jiang, Huifeng, 2022. "Group penalized logistic regressions predict up and down trends for stock prices," The North American Journal of Economics and Finance, Elsevier, vol. 59(C).
    3. Zeng, Qing & Lu, Xinjie & Xu, Jin & Lin, Yu, 2024. "Macro-Driven Stock Market Volatility Prediction: Insights from a New Hybrid Machine Learning Approach," International Review of Financial Analysis, Elsevier, vol. 96(PB).
    4. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    5. Mogliani, Matteo & Simoni, Anna, 2021. "Bayesian MIDAS penalized regressions: Estimation, selection, and prediction," Journal of Econometrics, Elsevier, vol. 222(1), pages 833-860.
    6. Mike K. P. So & Wing Ki Liu & Amanda M. Y. Chu, 2018. "Bayesian Shrinkage Estimation Of Time-Varying Covariance Matrices In Financial Time Series," Advances in Decision Sciences, Asia University, Taiwan, vol. 22(1), pages 369-404, December.
    7. Kaida Cai & Xuewen Lu & Hua Shen, 2025. "Group sparse sufficient dimension reduction: a model-free group variable selection method," Computational Statistics, Springer, vol. 40(5), pages 2323-2366, June.
    8. Kaida Cai & Hua Shen & Xuewen Lu, 2022. "Adaptive bi-level variable selection for multivariate failure time model with a diverging number of covariates," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(4), pages 968-993, December.
    9. Zhong, Yan & Sang, Huiyan & Cook, Scott J. & Kellstedt, Paul M., 2023. "Sparse spatially clustered coefficient model via adaptive regularization," Computational Statistics & Data Analysis, Elsevier, vol. 177(C).
    10. Devriendt, Sander & Antonio, Katrien & Reynkens, Tom & Verbelen, Roel, 2021. "Sparse regression with Multi-type Regularized Feature modeling," Insurance: Mathematics and Economics, Elsevier, vol. 96(C), pages 248-261.
    11. Justin B. Post & Howard D. Bondell, 2013. "Factor Selection and Structural Identification in the Interaction ANOVA Model," Biometrics, The International Biometric Society, vol. 69(1), pages 70-79, March.
    12. Jonathan Boss & Alexander Rix & Yin‐Hsiu Chen & Naveen N. Narisetty & Zhenke Wu & Kelly K. Ferguson & Thomas F. McElrath & John D. Meeker & Bhramar Mukherjee, 2021. "A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures," Environmetrics, John Wiley & Sons, Ltd., vol. 32(8), December.
    13. Caiya Zhang & Yanbiao Xiang, 2016. "On the oracle property of adaptive group Lasso in high-dimensional linear models," Statistical Papers, Springer, vol. 57(1), pages 249-265, March.
    14. Xianyi Wu & Xian Zhou, 2019. "On Hodges’ superefficiency and merits of oracle property in model selection," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(5), pages 1093-1119, October.
    15. Diego Vidaurre & Concha Bielza & Pedro Larrañaga, 2013. "A Survey of L1 Regression," International Statistical Review, International Statistical Institute, vol. 81(3), pages 361-387, December.
    16. Kristoffer Pons Bertelsen, 2022. "The Prior Adaptive Group Lasso and the Factor Zoo," CREATES Research Papers 2022-05, Department of Economics and Business Economics, Aarhus University.
    17. Zhixuan Fu & Chirag R. Parikh & Bingqing Zhou, 2017. "Penalized variable selection in competing risks regression," Lifetime Data Analysis: An International Journal Devoted to Statistical Methods and Applications for Time-to-Event Data, Springer, vol. 23(3), pages 353-376, July.
    18. Mkhadri, Abdallah & Ouhourane, Mohamed, 2013. "An extended variable inclusion and shrinkage algorithm for correlated variables," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 631-644.
    19. Chuliá, Helena & Garrón, Ignacio & Uribe, Jorge M., 2024. "Daily growth at risk: Financial or real drivers? The answer is not always the same," International Journal of Forecasting, Elsevier, vol. 40(2), pages 762-776.
    20. Christopher J Greenwood & George J Youssef & Primrose Letcher & Jacqui A Macdonald & Lauryn J Hagg & Ann Sanson & Jenn Mcintosh & Delyse M Hutchinson & John W Toumbourou & Matthew Fuller-Tyszkiewicz &, 2020. "A comparison of penalised regression methods for informing the selection of predictive markers," PLOS ONE, Public Library of Science, vol. 15(11), pages 1-14, November.
