Printed from https://ideas.repec.org/a/gam/jmathe/v10y2022i8p1283-d792189.html

Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review

Author

Listed:
  • Jireh Yi-Le Chan

    (Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
    These authors contributed equally to this work.)

  • Steven Mun Hong Leow

    (Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia
    These authors contributed equally to this work.)

  • Khean Thye Bea

    (Faculty of Business and Finance, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia)

  • Wai Khuen Cheng

    (Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia)

  • Seuk Wai Phoong

    (Department of Management, Faculty of Business and Economics, Universiti Malaya, Kuala Lumpur 50603, Malaysia)

  • Zeng-Wei Hong

    (Department of Information Engineering and Computer Science, Feng Chia University, Taichung 407102, Taiwan)

  • Yen-Lin Chen

    (Department of Computer Science and Information Engineering, National Taipei University of Technology, Taipei 106344, Taiwan)

Abstract

Advances in technology have driven large-scale data collection across many fields, such as genomics and business intelligence, significantly increasing the number of variables and data points (observations) collected and stored. Although this presents opportunities to better model the relationship between predictors and response variables, it also causes serious problems during data analysis, one of which is multicollinearity. The two main approaches used to mitigate multicollinearity are variable selection methods and modified estimator methods. However, variable selection methods may negate efforts to collect more data, as new data may eventually be dropped from modeling, while recent studies suggest that optimization approaches via machine learning handle data with multicollinearity better than statistical estimators. Therefore, this study details the chronological development of methods for mitigating the effects of multicollinearity and offers up-to-date recommendations for doing so.

Suggested Citation

  • Jireh Yi-Le Chan & Steven Mun Hong Leow & Khean Thye Bea & Wai Khuen Cheng & Seuk Wai Phoong & Zeng-Wei Hong & Yen-Lin Chen, 2022. "Mitigating the Multicollinearity Problem and Its Machine Learning Approach: A Review," Mathematics, MDPI, vol. 10(8), pages 1-17, April.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:8:p:1283-:d:792189

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/8/1283/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/8/1283/
    Download Restriction: no

    References listed on IDEAS

    1. H. C. Hamaker, 1962. "On multiple regression analysis," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 16(1), pages 31-56, March.
    2. C.K. Chandrasekhar & H. Bagyalakshmi & M.R. Srinivasan & M. Gallo, 2016. "Partial ridge regression under multicollinearity," Journal of Applied Statistics, Taylor & Francis Journals, vol. 43(13), pages 2462-2473, October.
    3. Van Cuong Nguyen & Chi Tim Ng, 2020. "Variable selection under multicollinearity using modified log penalty," Journal of Applied Statistics, Taylor & Francis Journals, vol. 47(2), pages 201-230, January.
    4. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.
    5. Taewook Kim & Ha Young Kim, 2019. "Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data," PLOS ONE, Public Library of Science, vol. 14(2), pages 1-23, February.
    6. Jianqing Fan & Jinchi Lv, 2008. "Sure independence screening for ultrahigh dimensional feature space," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 70(5), pages 849-911, November.
    7. Hui Zou & Trevor Hastie, 2005. "Addendum: Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(5), pages 768-768, November.
    8. Raehyun Kim & Chan Ho So & Minbyul Jeong & Sanghoon Lee & Jinkyu Kim & Jaewoo Kang, 2019. "HATS: A Hierarchical Graph Attention Network for Stock Movement Prediction," Papers 1908.07999, arXiv.org, revised Nov 2019.
    9. Hui Zou & Trevor Hastie, 2005. "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(2), pages 301-320, April.

    Citations



    Cited by:

    1. Hoxha, Julian & Çodur, Muhammed Yasin & Mustafaraj, Enea & Kanj, Hassan & El Masri, Ali, 2023. "Prediction of transportation energy demand in Türkiye using stacking ensemble models: Methodology and comparative analysis," Applied Energy, Elsevier, vol. 350(C).
    2. Tran Ngoc Mai, 2023. "Renewable Energy, GDP (Gross Domestic Product), FDI (Foreign Direct Investment) and CO2 Emissions in Southeast Asia Countries," International Journal of Energy Economics and Policy, Econjournals, vol. 13(2), pages 284-289, March.
    3. Nagwan Abdel Samee & Ghada Atteia & Souham Meshoul & Mugahed A. Al-antari & Yasser M. Kadah, 2022. "Deep Learning Cascaded Feature Selection Framework for Breast Cancer Classification: Hybrid CNN with Univariate-Based Approach," Mathematics, MDPI, vol. 10(19), pages 1-27, October.
    4. Wai Khuen Cheng & Khean Thye Bea & Steven Mun Hong Leow & Jireh Yi-Le Chan & Zeng-Wei Hong & Yen-Lin Chen, 2022. "A Review of Sentiment, Semantic and Event-Extraction-Based Approaches in Stock Forecasting," Mathematics, MDPI, vol. 10(14), pages 1-20, July.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Shuichi Kawano, 2014. "Selection of tuning parameters in bridge regression models via Bayesian information criterion," Statistical Papers, Springer, vol. 55(4), pages 1207-1223, November.
    2. Wang, Christina Dan & Chen, Zhao & Lian, Yimin & Chen, Min, 2022. "Asset selection based on high frequency Sharpe ratio," Journal of Econometrics, Elsevier, vol. 227(1), pages 168-188.
    3. Borup, Daniel & Christensen, Bent Jesper & Mühlbach, Nicolaj Søndergaard & Nielsen, Mikkel Slot, 2023. "Targeting predictors in random forest regression," International Journal of Forecasting, Elsevier, vol. 39(2), pages 841-868.
    4. Caroline Jardet & Baptiste Meunier, 2022. "Nowcasting world GDP growth with high‐frequency data," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 41(6), pages 1181-1200, September.
    5. Peter Bühlmann & Jacopo Mandozzi, 2014. "High-dimensional variable screening and bias in subsequent inference, with an empirical comparison," Computational Statistics, Springer, vol. 29(3), pages 407-430, June.
    6. Loann David Denis Desboulets, 2018. "A Review on Variable Selection in Regression Analysis," Econometrics, MDPI, vol. 6(4), pages 1-27, November.
    7. Lee, Ji Hyung & Shi, Zhentao & Gao, Zhan, 2022. "On LASSO for predictive regression," Journal of Econometrics, Elsevier, vol. 229(2), pages 322-349.
    8. Ian W. McKeague & Min Qian, 2015. "An Adaptive Resampling Test for Detecting the Presence of Significant Predictors," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 110(512), pages 1422-1433, December.
    9. Victor Chernozhukov & Christian Hansen & Yuan Liao, 2015. "A lava attack on the recovery of sums of dense and sparse signals," CeMMAP working papers CWP56/15, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    10. Jingxuan Luo & Lili Yue & Gaorong Li, 2023. "Overview of High-Dimensional Measurement Error Regression Models," Mathematics, MDPI, vol. 11(14), pages 1-22, July.
    11. Zeng, Yaohui & Yang, Tianbao & Breheny, Patrick, 2021. "Hybrid safe–strong rules for efficient optimization in lasso-type problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    12. Tan, Xin Lu, 2019. "Optimal estimation of slope vector in high-dimensional linear transformation models," Journal of Multivariate Analysis, Elsevier, vol. 169(C), pages 179-204.
    13. Hojin Yang & Hongtu Zhu & Joseph G. Ibrahim, 2018. "MILFM: Multiple index latent factor model based on high‐dimensional features," Biometrics, The International Biometric Society, vol. 74(3), pages 834-844, September.
    14. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    15. Kimia Keshanian & Daniel Zantedeschi & Kaushik Dutta, 2022. "Features Selection as a Nash-Bargaining Solution: Applications in Online Advertising and Information Systems," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2485-2501, September.
    16. Lai, Peng & Song, Fengli & Chen, Kaiwen & Liu, Zhi, 2017. "Model free feature screening with dependent variable in ultrahigh dimensional binary classification," Statistics & Probability Letters, Elsevier, vol. 125(C), pages 141-148.
    17. Ricardo P. Masini & Marcelo C. Medeiros & Eduardo F. Mendes, 2023. "Machine learning advances for time series forecasting," Journal of Economic Surveys, Wiley Blackwell, vol. 37(1), pages 76-111, February.
    18. Chen, Shi & Härdle, Wolfgang Karl & López Cabrera, Brenda, 2018. "Regularization Approach for Network Modeling of German Energy Market," IRTG 1792 Discussion Papers 2018-017, Humboldt University of Berlin, International Research Training Group 1792 "High Dimensional Nonstationary Time Series".
    19. Zakariya Yahya Algamal & Muhammad Hisyam Lee, 2019. "A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 753-771, September.
    20. Soyeon Kim & Veerabhadran Baladandayuthapani & J. Jack Lee, 2017. "Prediction-Oriented Marker Selection (PROMISE): With Application to High-Dimensional Regression," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 9(1), pages 217-245, June.
