IDEAS home Printed from https://ideas.repec.org/a/inm/orijoc/v33y2021i2p551-565.html
   My bibliography  Save this article

A Mixed-Integer Fractional Optimization Approach to Best Subset Selection

Author

Listed:
  • Andrés Gómez

    (Department of Industrial and Systems Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, California 90089)

  • Oleg A. Prokopyev

    (Department of Industrial Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261)

Abstract

We consider the best subset selection problem in linear regression—that is, finding a parsimonious subset of the regression variables that provides the best fit to the data according to some predefined criterion. We are primarily concerned with alternatives to cross-validation methods that do not require data partitioning and involve a range of information criteria extensively studied in the statistical literature. We show that the problem of interest can be modeled using fractional mixed-integer optimization, which can be tackled by leveraging recent advances in modern optimization solvers. The proposed algorithms involve solving a sequence of mixed-integer quadratic optimization problems (or their convexifications) and can be implemented with off-the-shelf solvers. We report encouraging results in our computational experiments, with respect to both the optimization and statistical performance. Summary of Contribution: This paper considers feature selection problems with information criteria. We show that by adopting a fractional optimization perspective (a well-known field in nonlinear optimization and operations research), it is possible to leverage recent advances in mixed-integer quadratic optimization technology to tackle traditional statistical problems long considered intractable. We present extensive computational experiments, with both synthetic and real data, illustrating that the new fractional optimization approach is orders of magnitude faster than existing approaches in the literature.

Suggested Citation

  • Andrés Gómez & Oleg A. Prokopyev, 2021. "A Mixed-Integer Fractional Optimization Approach to Best Subset Selection," INFORMS Journal on Computing, INFORMS, vol. 33(2), pages 551-565, May.
  • Handle: RePEc:inm:orijoc:v:33:y:2021:i:2:p:551-565
    DOI: 10.1287/ijoc.2020.1031
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2020.1031
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2020.1031?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Yuichi Takano & Ryuhei Miyashiro, 2020. "Best subset selection via cross-validation criterion," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 475-488, July.
    2. Kumar Abhishek & Sven Leyffer & Jeff Linderoth, 2010. "FilMINT: An Outer Approximation-Based Solver for Convex Mixed-Integer Nonlinear Programs," INFORMS Journal on Computing, INFORMS, vol. 22(4), pages 555-567, November.
    3. Fred Glover, 1975. "Improved Linear Integer Programming Formulations of Nonlinear Integer Problems," Management Science, INFORMS, vol. 22(4), pages 455-460, December.
    4. Dimitris Bertsimas & Romy Shioda, 2009. "Algorithm for cardinality-constrained quadratic optimization," Computational Optimization and Applications, Springer, vol. 43(1), pages 1-22, May.
    5. M. J. Garside, 1971. "Some Computational Procedures for the Best Subset Problem," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 20(1), pages 8-15, March.
    6. Robert Tibshirani, 2011. "Regression shrinkage and selection via the lasso: a retrospective," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 73(3), pages 273-282, June.
    7. Nimrod Megiddo, 1979. "Combinatorial Optimization with Rational Objective Functions," Mathematics of Operations Research, INFORMS, vol. 4(4), pages 414-424, November.
    8. M. J. Garside, 1965. "The Best Sub‐Set in Multiple Regression Analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 14(2-3), pages 196-200, November.
    9. Alper Atamtürk & Carlos Deck & Hyemin Jeon, 2020. "Successive Quadratic Upper-Bounding for Discrete Mean-Risk Minimization and Network Interdiction," INFORMS Journal on Computing, INFORMS, vol. 32(2), pages 346-355, April.
    10. Alper Atamtürk & Hyemin Jeon, 2019. "Lifted polymatroid inequalities for mean-risk optimization with indicator variables," Journal of Global Optimization, Springer, vol. 73(4), pages 677-699, April.
    11. Juan S. Borrero & Colin Gillen & Oleg A. Prokopyev, 2017. "Fractional 0–1 programming: applications and algorithms," Journal of Global Optimization, Springer, vol. 69(1), pages 255-282, September.
    12. Werner Dinkelbach, 1967. "On Nonlinear Fractional Programming," Management Science, INFORMS, vol. 13(7), pages 492-498, March.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan S. Borrero & Colin Gillen & Oleg A. Prokopyev, 2017. "Fractional 0–1 programming: applications and algorithms," Journal of Global Optimization, Springer, vol. 69(1), pages 255-282, September.
    2. Ricardo M. Lima & Ignacio E. Grossmann, 2017. "On the solution of nonconvex cardinality Boolean quadratic programming problems: a computational study," Computational Optimization and Applications, Springer, vol. 66(1), pages 1-37, January.
    3. Dimitris Bertsimas & Ryan Cory-Wright, 2022. "A Scalable Algorithm for Sparse Portfolio Selection," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1489-1511, May.
    4. Weerasena, Lakmali & Shier, Douglas & Tonkyn, David & McFeaters, Mark & Collins, Christopher, 2023. "A sequential approach to reserve design with compactness and contiguity considerations," Ecological Modelling, Elsevier, vol. 478(C).
    5. Leonardo Di Gangi & M. Lapucci & F. Schoen & A. Sortino, 2019. "An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series," Computational Optimization and Applications, Springer, vol. 74(3), pages 919-948, December.
    6. Samyukta Sethuraman & Sergiy Butenko, 2015. "The maximum ratio clique problem," Computational Management Science, Springer, vol. 12(1), pages 197-218, January.
    7. Thompson, Ryan, 2022. "Robust subset selection," Computational Statistics & Data Analysis, Elsevier, vol. 169(C).
    8. Denoyel, Victoire & Alfandari, Laurent & Thiele, Aurélie, 2017. "Optimizing healthcare network design under reference pricing and parameter uncertainty," European Journal of Operational Research, Elsevier, vol. 263(3), pages 996-1006.
    9. Pinheiro, Rian G.S. & Martins, Ivan C. & Protti, Fábio & Ochi, Luiz S. & Simonetti, Luidi G. & Subramanian, Anand, 2016. "On solving manufacturing cell formation via Bicluster Editing," European Journal of Operational Research, Elsevier, vol. 254(3), pages 769-779.
    10. Billionnet, Alain, 2013. "Mathematical optimization ideas for biodiversity conservation," European Journal of Operational Research, Elsevier, vol. 231(3), pages 514-534.
    11. Young Woong Park & Diego Klabjan, 2020. "Subset selection for multiple linear regression via optimization," Journal of Global Optimization, Springer, vol. 77(3), pages 543-574, July.
    12. Erfan Mehmanchi & Andrés Gómez & Oleg A. Prokopyev, 2021. "Solving a class of feature selection problems via fractional 0–1 programming," Annals of Operations Research, Springer, vol. 303(1), pages 265-295, August.
    13. Gerelt Tserenjigmid, 2021. "The Order-Dependent Luce Model," Management Science, INFORMS, vol. 67(11), pages 6915-6933, November.
    14. Tunjo Perić & Josip Matejaš & Zoran Babić, 2023. "Advantages, sensitivity and application efficiency of the new iterative method to solve multi-objective linear fractional programming problem," Central European Journal of Operations Research, Springer;Slovak Society for Operations Research;Hungarian Operational Research Society;Czech Society for Operations Research;Österr. Gesellschaft für Operations Research (ÖGOR);Slovenian Society Informatika - Section for Operational Research;Croatian Operational Research Society, vol. 31(3), pages 751-767, September.
    15. Lili Pan & Ziyan Luo & Naihua Xiu, 2017. "Restricted Robinson Constraint Qualification and Optimality for Cardinality-Constrained Cone Programming," Journal of Optimization Theory and Applications, Springer, vol. 175(1), pages 104-118, October.
    16. Lucian Belascu & Alexandra Horobet & Georgiana Vrinceanu & Consuela Popescu, 2021. "Performance Dissimilarities in European Union Manufacturing: The Effect of Ownership and Technological Intensity," Sustainability, MDPI, vol. 13(18), pages 1-19, September.
    17. Tien Mai & Arunesh Sinha, 2022. "Safe Delivery of Critical Services in Areas with Volatile Security Situation via a Stackelberg Game Approach," Papers 2204.11451, arXiv.org.
    18. Park, Chong Hyun & Lim, Heejong, 2021. "A parametric approach to integer linear fractional programming: Newton’s and Hybrid-Newton methods for an optimal road maintenance problem," European Journal of Operational Research, Elsevier, vol. 289(3), pages 1030-1039.
    19. Steffen Rebennack & Ashwin Arulselvan & Lily Elefteriadou & Panos M. Pardalos, 2010. "Complexity analysis for maximum flow problems with arc reversals," Journal of Combinatorial Optimization, Springer, vol. 19(2), pages 200-216, February.
    20. Yong Xia & Longfei Wang & Xiaohui Wang, 2020. "Globally minimizing the sum of a convex–concave fraction and a convex function based on wave-curve bounds," Journal of Global Optimization, Springer, vol. 77(2), pages 301-318, June.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijoc:v:33:y:2021:i:2:p:551-565. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.