
Mixed integer second-order cone programming formulations for variable selection in linear regression

Author

Listed:
  • Miyashiro, Ryuhei
  • Takano, Yuichi

Abstract

This study concerns a method of selecting the best subset of explanatory variables in a multiple linear regression model. Goodness-of-fit measures, for example, adjusted R², AIC, and BIC, are generally used to evaluate a subset regression model. Although variable selection with regard to these measures is usually performed with a stepwise regression method, it does not always provide the best subset of explanatory variables. In this paper, we propose mixed integer second-order cone programming formulations for selecting the best subset of variables with respect to adjusted R², AIC, and BIC. Computational experiments show that, in terms of these measures, the proposed formulations yield better solutions than those provided by common stepwise regression methods.
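
To make the three measures named in the abstract concrete, the following is a minimal sketch that scores every subset of a small set of explanatory variables by brute-force enumeration. It is not the mixed integer second-order cone programming formulation proposed in the paper; exhaustive search is swapped in only because it fits in a few lines, and it is practical only for a handful of variables. The function names (`fit_rss`, `criteria`, `best_subsets`) are hypothetical, and the formulas follow one common Gaussian-likelihood convention with additive constants dropped, which may differ from the convention used in the paper.

```python
from itertools import combinations

import numpy as np


def fit_rss(X, y, subset):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def criteria(X, y, subset):
    """Return (adjusted R^2, AIC, BIC) for one subset of columns.

    Assumed convention: k selected variables plus one intercept,
    AIC = n*log(RSS/n) + 2*(k+1), BIC = n*log(RSS/n) + (k+1)*log(n).
    """
    n, k = len(y), len(subset)
    rss = fit_rss(X, y, subset)
    tss = float(np.sum((y - y.mean()) ** 2))
    adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
    aic = n * np.log(rss / n) + 2 * (k + 1)
    bic = n * np.log(rss / n) + (k + 1) * np.log(n)
    return adj_r2, aic, bic


def best_subsets(X, y):
    """Enumerate all 2^p subsets and return the best one under each measure."""
    p = X.shape[1]
    subsets = [s for k in range(p + 1) for s in combinations(range(p), k)]
    scores = {s: criteria(X, y, s) for s in subsets}
    return {
        "adj_r2": max(subsets, key=lambda s: scores[s][0]),  # larger is better
        "aic": min(subsets, key=lambda s: scores[s][1]),     # smaller is better
        "bic": min(subsets, key=lambda s: scores[s][2]),     # smaller is better
    }


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): only columns 0 and 2 matter.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
    print(best_subsets(X, y))
```

Because the enumeration visits every subset, its result is a best subset under the stated convention by construction, whereas a stepwise search examines only a path through the subsets. The cost grows as 2^p, which is the motivation for optimization-based formulations such as the one the paper proposes.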

Suggested Citation

  • Miyashiro, Ryuhei & Takano, Yuichi, 2015. "Mixed integer second-order cone programming formulations for variable selection in linear regression," European Journal of Operational Research, Elsevier, vol. 247(3), pages 721-731.
  • Handle: RePEc:eee:ejores:v:247:y:2015:i:3:p:721-731
    DOI: 10.1016/j.ejor.2015.06.081


    References listed on IDEAS

    1. Trafalis, Theodore B. & Gilbert, Robin C., 2006. "Robust classification and regression using support vector machines," European Journal of Operational Research, Elsevier, vol. 173(3), pages 893-909, September.
    2. Dimitris Bertsimas & Romy Shioda, 2009. "Algorithm for cardinality-constrained quadratic optimization," Computational Optimization and Applications, Springer, vol. 43(1), pages 1-22, May.
    3. Hofmann, Marc & Gatu, Cristian & Kontoghiorghes, Erricos John, 2007. "Efficient algorithms for computing the best subset regression models for large-scale problems," Computational Statistics & Data Analysis, Elsevier, vol. 52(1), pages 16-29, September.
    4. Peide Shi & Chih‐Ling Tsai, 2002. "Regression model selection—a residual likelihood approach," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(2), pages 237-252, May.
    5. Ming Yuan & Ali Ekici & Zhaosong Lu & Renato Monteiro, 2007. "Dimension reduction and coefficient estimation in multivariate linear regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 69(3), pages 329-346, June.
    6. Hiroshi Konno & Yoshihiro Takaya, 2010. "Multi-step methods for choosing the best set of variables in regression analysis," Computational Optimization and Applications, Springer, vol. 46(3), pages 417-426, July.
    7. Meiri, Ronen & Zahavi, Jacob, 2006. "Using simulated annealing to optimize the feature selection problem in marketing applications," European Journal of Operational Research, Elsevier, vol. 171(3), pages 842-858, June.

    Citations



    Cited by:

    1. Kimia Keshanian & Daniel Zantedeschi & Kaushik Dutta, 2022. "Features Selection as a Nash-Bargaining Solution: Applications in Online Advertising and Information Systems," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2485-2501, September.
    2. Li, Libo, 2018. "Predicting online invitation responses with a competing risk model using privacy-friendly social event data," European Journal of Operational Research, Elsevier, vol. 270(2), pages 698-708.
    3. George Drogalas & Konstantinos Petridis & Nikolaos E. Petridis & Eleni Zografidou, 2020. "Valuation of the internal audit mechanisms in the decision support department of the local government organizations using mathematical programming," Annals of Operations Research, Springer, vol. 294(1), pages 267-280, November.
    4. Young Woong Park & Diego Klabjan, 2020. "Subset selection for multiple linear regression via optimization," Journal of Global Optimization, Springer, vol. 77(3), pages 543-574, July.
    5. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.
    6. Tao Xu & He Meng & Jie Zhu & Wei Wei & He Zhao & Han Yang & Zijin Li & Yuhan Wu, 2021. "Optimal Capacity Allocation of Energy Storage in Distribution Networks Considering Active/Reactive Coordination," Energies, MDPI, vol. 14(6), pages 1-24, March.
    7. Ben-Ameur, Walid & Neto, José, 2022. "New bounds for subset selection from conic relaxations," European Journal of Operational Research, Elsevier, vol. 298(2), pages 425-438.
    8. Noriyoshi Sukegawa & Shohei Suzuki & Yoshiko Ikebe & Yoshito Hirata, 2024. "On Computing Medians of Marked Point Process Data Under Edit Distance," Journal of Optimization Theory and Applications, Springer, vol. 200(1), pages 178-193, January.
    9. Matteo Lapucci & Tommaso Levato & Marco Sciandrone, 2021. "Convergent Inexact Penalty Decomposition Methods for Cardinality-Constrained Problems," Journal of Optimization Theory and Applications, Springer, vol. 188(2), pages 473-496, February.
    10. Zhouchun Huang & Qipeng Phil Zheng & Eduardo Pasiliao & Vladimir Boginski & Tao Zhang, 2019. "A cutting plane method for risk-constrained traveling salesman problem with random arc costs," Journal of Global Optimization, Springer, vol. 74(4), pages 839-859, August.
    11. Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
    12. Leonardo Di Gangi & M. Lapucci & F. Schoen & A. Sortino, 2019. "An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series," Computational Optimization and Applications, Springer, vol. 74(3), pages 919-948, December.
    13. Amir Ahmadi-Javid & Pooya Hoseinpour, 2022. "Convexification of Queueing Formulas by Mixed-Integer Second-Order Cone Programming: An Application to a Discrete Location Problem with Congestion," INFORMS Journal on Computing, INFORMS, vol. 34(5), pages 2621-2633, September.
    14. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
    2. Meisel, Stephan & Mattfeld, Dirk, 2010. "Synergies of Operations Research and Data Mining," European Journal of Operational Research, Elsevier, vol. 206(1), pages 1-10, October.
    3. Lili Pan & Ziyan Luo & Naihua Xiu, 2017. "Restricted Robinson Constraint Qualification and Optimality for Cardinality-Constrained Cone Programming," Journal of Optimization Theory and Applications, Springer, vol. 175(1), pages 104-118, October.
    4. Schlereth, Christian & Stepanchuk, Tanja & Skiera, Bernd, 2010. "Optimization and analysis of the profitability of tariff structures with two-part tariffs," European Journal of Operational Research, Elsevier, vol. 206(3), pages 691-701, November.
    5. Siniksaran, Enis, 2008. "A geometric interpretation of Mallows' Cp statistic and an alternative plot in variable selection," Computational Statistics & Data Analysis, Elsevier, vol. 52(7), pages 3459-3467, March.
    6. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, Diciembre.
    7. Francesco Cesarone & Andrea Scozzari & Fabio Tardella, 2015. "Linear vs. quadratic portfolio selection models with hard real-world constraints," Computational Management Science, Springer, vol. 12(3), pages 345-370, July.
    8. Ximing Wang & Neng Fan & Panos M. Pardalos, 2018. "Robust chance-constrained support vector machines with second-order moment information," Annals of Operations Research, Springer, vol. 263(1), pages 45-68, April.
    9. Matthias Bogaert & Lex Delaere, 2023. "Ensemble Methods in Customer Churn Prediction: A Comparative Analysis of the State-of-the-Art," Mathematics, MDPI, vol. 11(5), pages 1-28, February.
    10. Ricardo M. Lima & Ignacio E. Grossmann, 2017. "On the solution of nonconvex cardinality Boolean quadratic programming problems: a computational study," Computational Optimization and Applications, Springer, vol. 66(1), pages 1-37, January.
    11. Luo, Chongliang & Liang, Jian & Li, Gen & Wang, Fei & Zhang, Changshui & Dey, Dipak K. & Chen, Kun, 2018. "Leveraging mixed and incomplete outcomes via reduced-rank modeling," Journal of Multivariate Analysis, Elsevier, vol. 167(C), pages 378-394.
    12. Andrés Gómez & Oleg A. Prokopyev, 2021. "A Mixed-Integer Fractional Optimization Approach to Best Subset Selection," INFORMS Journal on Computing, INFORMS, vol. 33(2), pages 551-565, May.
    13. Dimitris Bertsimas & Ryan Cory-Wright, 2022. "A Scalable Algorithm for Sparse Portfolio Selection," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1489-1511, May.
    14. Pacheco, Joaquín & Casado, Silvia & Porras, Santiago, 2013. "Exact methods for variable selection in principal component analysis: Guide functions and pre-selection," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 95-111.
    15. Zhi-Long Dong & Fengmin Xu & Yu-Hong Dai, 2020. "Fast algorithms for sparse portfolio selection considering industries and investment styles," Journal of Global Optimization, Springer, vol. 78(4), pages 763-789, December.
    16. A. Garcia-Bernabeu & J. V. Salcedo & A. Hilario & D. Pla-Santamaria & Juan M. Herrero, 2019. "Computing the Mean-Variance-Sustainability Nondominated Surface by ev-MOGA," Complexity, Hindawi, vol. 2019, pages 1-12, December.
    17. Postiglione, Paolo & Benedetti, Roberto & Lafratta, Giovanni, 2010. "A regression tree algorithm for the identification of convergence clubs," Computational Statistics & Data Analysis, Elsevier, vol. 54(11), pages 2776-2785, November.
    18. Peter C.B. Phillips & Ye Chen, "undated". "Restricted Likelihood Ratio Tests in Predictive Regression," Cowles Foundation Discussion Papers 1968, Cowles Foundation for Research in Economics, Yale University.
    19. Ricardo M. Lima & Antonio J. Conejo & Loïc Giraldi & Olivier Le Maître & Ibrahim Hoteit & Omar M. Knio, 2022. "Risk-Averse Stochastic Programming vs. Adaptive Robust Optimization: A Virtual Power Plant Application," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1795-1818, May.
    20. Yang, Guijun & Wang, Zhigang & Deng, Wei, 2010. "Unbiased generalized quasi-regression," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 779-789, March.
