IDEAS home Printed from https://ideas.repec.org/a/spr/jglopt/v77y2020i3d10.1007_s10898-020-00876-1.html
   My bibliography  Save this article

Subset selection for multiple linear regression via optimization

Author

Listed:
  • Young Woong Park

    (Iowa State University)

  • Diego Klabjan

    (Northwestern University)

Abstract

Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and minimal-redundancy–maximal-relevance criteria. The proposed models are tested using a linear-program-based branch-and-bound algorithm with tailored valid inequalities and big M values and are compared against the algorithms in the literature. For high dimensional cases, an iterative heuristic algorithm is proposed based on the mathematical programming models and a core set concept, and a randomized version of the algorithm is derived to guarantee convergence to the global optimum. From the computational experiments, we find that our models quickly find a quality solution while the rest of the time is spent to prove optimality; the iterative algorithms find solutions in a relatively short time and are competitive compared to state-of-the-art algorithms; using ad-hoc big M values is not recommended.

Suggested Citation

  • Young Woong Park & Diego Klabjan, 2020. "Subset selection for multiple linear regression via optimization," Journal of Global Optimization, Springer, vol. 77(3), pages 543-574, July.
  • Handle: RePEc:spr:jglopt:v:77:y:2020:i:3:d:10.1007_s10898-020-00876-1
    DOI: 10.1007/s10898-020-00876-1
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10898-020-00876-1
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10898-020-00876-1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. A. Charnes & W. W. Cooper & R. O. Ferguson, 1955. "Optimal Estimation of Executive Compensation by Linear Programming," Management Science, INFORMS, vol. 1(2), pages 138-151, January.
    2. Kyoungmi Hwang & Dohyun Kim & Kyungsik Lee & Chungmok Lee & Sungsoo Park, 2017. "Embedded variable selection method using signomial classification," Annals of Operations Research, Springer, vol. 254(1), pages 89-109, July.
    3. Fred Glover, 1975. "Improved Linear Integer Programming Formulations of Nonlinear Integer Problems," Management Science, INFORMS, vol. 22(4), pages 455-460, December.
    4. Dimitris Bertsimas & Romy Shioda, 2009. "Algorithm for cardinality-constrained quadratic optimization," Computational Optimization and Applications, Springer, vol. 43(1), pages 1-22, May.
    5. P. S. Bradley & O. L. Mangasarian & W. N. Street, 1998. "Feature Selection via Mathematical Programming," INFORMS Journal on Computing, INFORMS, vol. 10(2), pages 209-217, May.
    6. Miyashiro, Ryuhei & Takano, Yuichi, 2015. "Mixed integer second-order cone programming formulations for variable selection in linear regression," European Journal of Operational Research, Elsevier, vol. 247(3), pages 721-731.
    7. Dimitris Bertsimas & Angela King, 2016. "OR Forum—An Algorithmic Approach to Linear Regression," Operations Research, INFORMS, vol. 64(1), pages 2-16, February.
    8. DE FARIAS, Ismael R. & NEMHAUSER, Georges L., 2003. "A polyhedral study of the cardinality constrained knapsack problem," LIDAM Reprints CORE 1634, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE).
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ricardo M. Lima & Ignacio E. Grossmann, 2017. "On the solution of nonconvex cardinality Boolean quadratic programming problems: a computational study," Computational Optimization and Applications, Springer, vol. 66(1), pages 1-37, January.
    2. Andrés Gómez & Oleg A. Prokopyev, 2021. "A Mixed-Integer Fractional Optimization Approach to Best Subset Selection," INFORMS Journal on Computing, INFORMS, vol. 33(2), pages 551-565, May.
    3. Dimitris Bertsimas & Ryan Cory-Wright, 2022. "A Scalable Algorithm for Sparse Portfolio Selection," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1489-1511, May.
    4. Leonardo Di Gangi & M. Lapucci & F. Schoen & A. Sortino, 2019. "An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series," Computational Optimization and Applications, Springer, vol. 74(3), pages 919-948, December.
    5. Joey Huchette & Joey Huchette, 2019. "A Combinatorial Approach for Small and Strong Formulations of Disjunctive Constraints," Mathematics of Operations Research, INFORMS, vol. 44(3), pages 793-820, August.
    6. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.
    7. Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
    8. Lili Pan & Ziyan Luo & Naihua Xiu, 2017. "Restricted Robinson Constraint Qualification and Optimality for Cardinality-Constrained Cone Programming," Journal of Optimization Theory and Applications, Springer, vol. 175(1), pages 104-118, October.
    9. Wojtek Michalowski & Włodzimierz Ogryczak, 2001. "Extending the MAD portfolio optimization model to incorporate downside risk aversion," Naval Research Logistics (NRL), John Wiley & Sons, vol. 48(3), pages 185-200, April.
    10. Ye, Ya-Fen & Shao, Yuan-Hai & Deng, Nai-Yang & Li, Chun-Na & Hua, Xiang-Yu, 2017. "Robust Lp-norm least squares support vector regression with feature selection," Applied Mathematics and Computation, Elsevier, vol. 305(C), pages 32-52.
    11. Kadziński, MiŁosz & Greco, Salvatore & SŁowiński, Roman, 2012. "Extreme ranking analysis in robust ordinal regression," Omega, Elsevier, vol. 40(4), pages 488-501.
    12. Thomas L. Saaty, 2013. "The Modern Science of Multicriteria Decision Making and Its Practical Applications: The AHP/ANP Approach," Operations Research, INFORMS, vol. 61(5), pages 1101-1118, October.
    13. Yanagida, John F. & Book, Don N., 1984. "Application of the Least Absolute Value Technique as a Data Filter for Detecting Structural Change in the Demand for Meat," Northeastern Journal of Agricultural and Resource Economics, Northeastern Agricultural and Resource Economics Association, vol. 0(Number 1), pages 1-5, April.
    14. Yokoyama, Ryohei & Kitano, Hiroyuki & Wakui, Tetsuya, 2017. "Optimal operation of heat supply systems with piping network," Energy, Elsevier, vol. 137(C), pages 888-897.
    15. Tian, Xueyu & You, Fengqi, 2019. "Carbon-neutral hybrid energy systems with deep water source cooling, biomass heating, and geothermal heat and power," Applied Energy, Elsevier, vol. 250(C), pages 413-432.
    16. Dima, Bogdan & Dincă, Marius Sorin & Spulbăr, Cristi, 2014. "Financial nexus: Efficiency and soundness in banking and capital markets," Journal of International Money and Finance, Elsevier, vol. 47(C), pages 100-124.
    17. Hamalainen, Raimo P. & Mantysaari, Juha, 2002. "Dynamic multi-objective heating optimization," European Journal of Operational Research, Elsevier, vol. 142(1), pages 1-15, October.
    18. Mostafa Rezaei & Ivor Cribben & Michele Samorani, 2021. "A clustering-based feature selection method for automatically generated relational attributes," Annals of Operations Research, Springer, vol. 303(1), pages 233-263, August.
    19. Jize Zhang & Tim Leung & Aleksandr Aravkin, 2018. "A Relaxed Optimization Approach for Cardinality-Constrained Portfolio Optimization," Papers 1810.10563, arXiv.org.
    20. Longinidis, Pantelis & Georgiadis, Michael C., 2014. "Integration of sale and leaseback in the optimal design of supply chain networks," Omega, Elsevier, vol. 47(C), pages 73-89.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jglopt:v:77:y:2020:i:3:d:10.1007_s10898-020-00876-1. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.