
Feature subset selection for logistic regression via mixed integer optimization

Author

Listed:
  • Toshiki Sato

    (University of Tsukuba)

  • Yuichi Takano

    (Senshu University)

  • Ryuhei Miyashiro

    (Tokyo University of Agriculture and Technology)

  • Akiko Yoshise

    (University of Tsukuba)

Abstract

This paper concerns a method of selecting a subset of features for a logistic regression model. Information criteria, such as the Akaike information criterion and Bayesian information criterion, are employed as a goodness-of-fit measure. The purpose of our work is to establish a computational framework for selecting a subset of features with an optimality guarantee. For this purpose, we devise mixed integer optimization formulations for feature subset selection in logistic regression. Specifically, we pose the problem as a mixed integer linear optimization problem, which can be solved with standard mixed integer optimization software, by making a piecewise linear approximation of the logistic loss function. The computational results demonstrate that when the number of candidate features was less than 40, our method successfully provided a feature subset that was sufficiently close to an optimal one in a reasonable amount of time. Furthermore, even if there were more candidate features, our method often found a better subset of features than the stepwise methods did in terms of information criteria.
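The piecewise linear approximation mentioned in the abstract can be illustrated with a small sketch. It assumes the logistic loss is written as f(v) = log(1 + exp(-v)) and is approximated from below by the pointwise maximum of a few tangent lines, which is one common way to make such a loss representable by linear constraints. The tangent points, the function names, and the parameter count (selected features plus an intercept) are illustrative assumptions, not taken from the paper.

```python
import math


def logistic_loss(v):
    """f(v) = log(1 + exp(-v)), evaluated in a numerically stable way."""
    if v >= 0:
        return math.log1p(math.exp(-v))
    return -v + math.log1p(math.exp(v))


def tangent(vk):
    """Return (slope, intercept) of the tangent line to f at the point vk."""
    slope = -1.0 / (1.0 + math.exp(vk))  # f'(v) = -1 / (1 + exp(v))
    intercept = logistic_loss(vk) - slope * vk
    return slope, intercept


def piecewise_linear_loss(v, points):
    """Pointwise maximum of tangent lines at `points`.

    Because f is convex, every tangent lies on or below f, so this
    maximum is a convex piecewise linear under-estimator of f.
    """
    return max(s * v + c for s, c in (tangent(p) for p in points))


def information_criterion(losses, n_selected, penalty):
    """2 * (sum of losses) + penalty * (number of parameters).

    penalty = 2 gives an AIC-style value and penalty = log(n) a BIC-style
    value. Counting n_selected + 1 parameters (features plus intercept)
    is an assumption of this sketch.
    """
    return 2.0 * sum(losses) + penalty * (n_selected + 1)


# Hypothetical tangent points; more points give a tighter approximation.
POINTS = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]

if __name__ == "__main__":
    for v in (-3.0, -0.5, 0.0, 0.7, 3.0):
        exact = logistic_loss(v)
        approx = piecewise_linear_loss(v, POINTS)
        print(f"v={v:+.1f}  exact={exact:.4f}  approx={approx:.4f}")
```

In a mixed integer linear model, each tangent would appear as a constraint of the form t >= slope * v + intercept on an auxiliary variable t, and binary variables would indicate which features are selected. That part is omitted here because it depends on the solver interface.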

Suggested Citation

  • Toshiki Sato & Yuichi Takano & Ryuhei Miyashiro & Akiko Yoshise, 2016. "Feature subset selection for logistic regression via mixed integer optimization," Computational Optimization and Applications, Springer, vol. 64(3), pages 865-880, July.
  • Handle: RePEc:spr:coopap:v:64:y:2016:i:3:d:10.1007_s10589-016-9832-2
    DOI: 10.1007/s10589-016-9832-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10589-016-9832-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations

    Cited by:

    1. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2023. "Sparse optimization via vector k-norm and DC programming with an application to feature selection for support vector machines," Computational Optimization and Applications, Springer, vol. 86(2), pages 745-766, November.
    2. Enrico Civitelli & Matteo Lapucci & Fabio Schoen & Alessio Sortino, 2021. "An effective procedure for feature subset selection in logistic regression based on information criteria," Computational Optimization and Applications, Springer, vol. 80(1), pages 1-32, September.
    3. Ryuta Tamura & Ken Kobayashi & Yuichi Takano & Ryuhei Miyashiro & Kazuhide Nakata & Tomomi Matsui, 2019. "Mixed integer quadratic optimization formulations for eliminating multicollinearity based on variance inflation factor," Journal of Global Optimization, Springer, vol. 73(2), pages 431-446, February.
    4. Yuichi Takano & Ryuhei Miyashiro, 2020. "Best subset selection via cross-validation criterion," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 475-488, July.
    5. Matteo Lapucci & Tommaso Levato & Marco Sciandrone, 2021. "Convergent Inexact Penalty Decomposition Methods for Cardinality-Constrained Problems," Journal of Optimization Theory and Applications, Springer, vol. 188(2), pages 473-496, February.
    6. Danuta Zawadzka & Agnieszka Strzelecka & Ewa Szafraniec-Siluta, 2021. "Debt as a Source of Financial Energy of the Farm—What Causes the Use of External Capital in Financing Agricultural Activity? A Model Approach," Energies, MDPI, vol. 14(14), pages 1-17, July.
    7. Paz, Alexander & Arteaga, Cristian & Cobos, Carlos, 2019. "Specification of mixed logit models assisted by an optimization framework," Journal of choice modelling, Elsevier, vol. 30(C), pages 50-60.
    8. Leonardo Di Gangi & M. Lapucci & F. Schoen & A. Sortino, 2019. "An efficient optimization approach for best subset selection in linear regression, with application to model selection and fitting in autoregressive time-series," Computational Optimization and Applications, Springer, vol. 74(3), pages 919-948, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Mohammad Mahdi Mousavi & Jamal Ouenniche & Kaoru Tone, 2023. "A dynamic performance evaluation of distress prediction models," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 42(4), pages 756-784, July.
    2. Casado Yusta, Silvia & Núñez Letamendía, Laura & Pacheco Bonrostro, Joaquín Antonio, 2018. "Predicting Corporate Failure: The GRASP-LOGIT Model || Predicción de la quiebra empresarial: el modelo GRASP-LOGIT," Revista de Métodos Cuantitativos para la Economía y la Empresa = Journal of Quantitative Methods for Economics and Business Administration, Universidad Pablo de Olavide, Department of Quantitative Methods for Economics and Business Administration, vol. 26(1), pages 294-314, Diciembre.
    3. Shaikh, Ibrahim A. & O'Brien, Jonathan Paul & Peters, Lois, 2018. "Inside directors and the underinvestment of financial slack towards R&D-intensity in high-technology firms," Journal of Business Research, Elsevier, vol. 82(C), pages 192-201.
    4. Mikel Bedayo & Gabriel Jiménez & José-Luis Peydró & Raquel Vegas, 2020. "Screening and Loan Origination Time: Lending Standards, Loan Defaults and Bank Failures," Working Papers 1215, Barcelona School of Economics.
    5. Ruey-Ching Hwang, 2013. "Forecasting credit ratings with the varying-coefficient model," Quantitative Finance, Taylor & Francis Journals, vol. 13(12), pages 1947-1965, December.
    6. Giordani, Paolo & Jacobson, Tor & Schedvin, Erik von & Villani, Mattias, 2014. "Taking the Twists into Account: Predicting Firm Bankruptcy Risk with Splines of Financial Ratios," Journal of Financial and Quantitative Analysis, Cambridge University Press, vol. 49(4), pages 1071-1099, August.
    7. Li, Chunyu & Lou, Chenxin & Luo, Dan & Xing, Kai, 2021. "Chinese corporate distress prediction using LASSO: The role of earnings management," International Review of Financial Analysis, Elsevier, vol. 76(C).
    8. Suzan Hol, 2006. "The influence of the business cycle on bankruptcy probability," Discussion Papers 466, Statistics Norway, Research Department.
    9. Pavol Durana & Lucia Michalkova & Andrej Privara & Josef Marousek & Milos Tumpach, 2021. "Does the life cycle affect earnings management and bankruptcy?," Oeconomia Copernicana, Institute of Economic Research, vol. 12(2), pages 425-461, June.
    10. Dinh, K. & Kleimeier, S., 2006. "Credit scoring for Vietnam's retail banking market : implementation and implications for transactional versus relationship lending," Research Memorandum 012, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
    11. Elizaveta Danilova & Evgeny Rumyantsev & Ivan Shevchuk, 2018. "Review of the Bank of Russia – IMF Workshop 'Recent Developments in Macroprudential Stress Testing'," Russian Journal of Money and Finance, Bank of Russia, vol. 77(4), pages 60-83, December.
    12. Alderson, Michael J. & Betker, Brian L. & Halford, Joseph T., 2021. "Fictitious dividend cuts in the CRSP data," Journal of Corporate Finance, Elsevier, vol. 71(C).
    13. Pesaran, M. Hashem & Schuermann, Til & Treutler, Bjorn-Jakob & Weiner, Scott M., 2006. "Macroeconomic Dynamics and Credit Risk: A Global Perspective," Journal of Money, Credit and Banking, Blackwell Publishing, vol. 38(5), pages 1211-1261, August.
    14. Nabeel Al-Milli & Amjad Hudaib & Nadim Obeid, 2021. "Population Diversity Control of Genetic Algorithm Using a Novel Injection Method for Bankruptcy Prediction Problem," Mathematics, MDPI, vol. 9(8), pages 1-18, April.
    15. Sandrine Lardic & Claire Gauthier, 2003. "Un modèle multifactoriel des spreads de crédit : estimation sur panels complets et incomplets," Économie et Prévision, Programme National Persée, vol. 159(3), pages 53-69.
    16. Hu, Xiaolu & Shi, Jing & Wang, Lafang & Yu, Jing, 2020. "Foreign ownership in Chinese credit ratings industry: Information revelation or certification?," Journal of Banking & Finance, Elsevier, vol. 118(C).
    17. Lauren Stagnol, 2015. "Designing a corporate bond index on solvency criteria," EconomiX Working Papers 2015-39, University of Paris Nanterre, EconomiX.
    18. Nadiah Amirah Nor Azhari & Suhaily Hasnan & Zuraidah Mohd Sanusi, 2020. "The Relationships Between Managerial Overconfidence, Audit Committee, CEO Duality and Audit Quality and Accounting Misstatements," International Journal of Financial Research, International Journal of Financial Research, Sciedu Press, vol. 11(3), pages 18-30, June.
    19. Nagar, Neerav & Sen, Kaustav, 2016. "Earnings management in India: Managers’ fixation on operating profits," Journal of International Accounting, Auditing and Taxation, Elsevier, vol. 26(C), pages 1-12.
    20. Lin, Hsiou-Wei William & Lo, Huai-Chun & Wu, Ruei-Shian, 2016. "Modeling default prediction with earnings management," Pacific-Basin Finance Journal, Elsevier, vol. 40(PB), pages 306-322.
