Efficient feature selection for logical analysis of large-scale multi-class datasets

Authors

  • Kedong Yan (School of Computer Science and Engineering, Nanjing University of Science and Technology)
  • Dongjing Miao (Faculty of Computing, Harbin Institute of Technology)
  • Cui Guo (Business School, Shantou University)
  • Chanying Huang (School of Computer Science and Engineering, Nanjing University of Science and Technology)

Abstract

Feature selection in logical analysis of data (LAD) can be cast as a set covering problem. In this paper, extending earlier results on feature selection for binary classification using LAD, we present a mathematical model that selects a minimum set of necessary features for multi-class datasets, and we develop a heuristic algorithm for this model that is efficient in both memory and time. The utility of the algorithm is illustrated on a small example, and its superiority is demonstrated through experiments on six real-life multi-class datasets from the UCI repository.
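
To make the set covering view concrete, the sketch below treats every pair of observations from different classes as an element to be covered, and says a binary feature covers a pair when the two observations differ on it. It then applies the classic greedy set covering rule (pick the feature that covers the most still-uncovered pairs). This is only an illustration under those assumptions, not the heuristic developed in the paper: the function name greedy_support_set and the toy data are hypothetical, the input is assumed to be already binarized, and the sketch stores every pair explicitly, so it makes no attempt at the memory efficiency the abstract describes.

```python
from itertools import combinations


def greedy_support_set(rows, labels):
    """Greedily pick binary features that separate all differently labeled rows.

    rows   : list of equal-length tuples of 0/1 values (assumed binarized data)
    labels : list of class labels, one per row
    Returns the selected feature indices in the order they were chosen.
    """
    n_features = len(rows[0])

    # Elements to cover: every pair of observations from different classes.
    pairs = [(i, j) for i, j in combinations(range(len(rows)), 2)
             if labels[i] != labels[j]]

    # Feature f covers a pair when the two observations differ on f.
    covers = [{(i, j) for (i, j) in pairs if rows[i][f] != rows[j][f]}
              for f in range(n_features)]

    uncovered = set(pairs)
    selected = []
    while uncovered:
        # Greedy rule: take the feature covering the most uncovered pairs
        # (ties go to the lowest feature index).
        best = max(range(n_features), key=lambda f: len(covers[f] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            # No feature distinguishes the remaining pairs.
            raise ValueError("identical observations with different labels")
        selected.append(best)
        uncovered -= gained
    return selected


# Toy three-class dataset (made up for illustration).
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
labels = ["a", "a", "b", "b", "c"]
selected = greedy_support_set(rows, labels)
print(selected)
```

On this toy input, feature 0 separates class a from classes b and c, and feature 2 then separates b from c, so two of the three features suffice. A greedy cover is not guaranteed to be minimum in general; it is used here only because it is the simplest set covering heuristic to read.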

Suggested Citation

  • Kedong Yan & Dongjing Miao & Cui Guo & Chanying Huang, 2021. "Efficient feature selection for logical analysis of large-scale multi-class datasets," Journal of Combinatorial Optimization, Springer, vol. 42(1), pages 1-23, July.
  • Handle: RePEc:spr:jcomop:v:42:y:2021:i:1:d:10.1007_s10878-021-00732-2
    DOI: 10.1007/s10878-021-00732-2

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10878-021-00732-2
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lejeune, Miguel & Lozin, Vadim & Lozina, Irina & Ragab, Ahmed & Yacout, Soumaya, 2019. "Recent advances in the theory and practice of Logical Analysis of Data," European Journal of Operational Research, Elsevier, vol. 275(1), pages 1-15.
    2. Guo, Cui & Ryoo, Hong Seo, 2021. "On Pareto-Optimal Boolean Logical Patterns for Numerical Data," Applied Mathematics and Computation, Elsevier, vol. 403(C).
    3. Wang, Yiyuan & Pan, Shiwei & Al-Shihabi, Sameh & Zhou, Junping & Yang, Nan & Yin, Minghao, 2021. "An improved configuration checking-based algorithm for the unicost set covering problem," European Journal of Operational Research, Elsevier, vol. 294(2), pages 476-491.
    4. Gao, Chao & Yao, Xin & Weise, Thomas & Li, Jinlong, 2015. "An efficient local search heuristic with row weighting for the unicost set covering problem," European Journal of Operational Research, Elsevier, vol. 246(3), pages 750-761.
    5. Lan, Guanghui & DePuy, Gail W. & Whitehouse, Gary E., 2007. "An effective and simple heuristic for the set covering problem," European Journal of Operational Research, Elsevier, vol. 176(3), pages 1387-1403, February.
    6. Masoud Yaghini & Mohammad Karimi & Mohadeseh Rahbar, 2015. "A set covering approach for multi-depot train driver scheduling," Journal of Combinatorial Optimization, Springer, vol. 29(3), pages 636-654, April.
    7. Kedong Yan & Hong Seo Ryoo, 2019. "A multi-term, polyhedral relaxation of a 0–1 multilinear function for Boolean logical pattern generation," Journal of Global Optimization, Springer, vol. 74(4), pages 705-735, August.
    8. Victor Reyes & Ignacio Araya, 2021. "A GRASP-based scheme for the set covering problem," Operational Research, Springer, vol. 21(4), pages 2391-2408, December.
    9. Ablanedo-Rosas, José H. & Rego, César, 2010. "Surrogate constraint normalization for the set covering problem," European Journal of Operational Research, Elsevier, vol. 205(3), pages 540-551, September.
    10. Kedong Yan & Hong Seo Ryoo, 2022. "Graph, clique and facet of boolean logical polytope," Journal of Global Optimization, Springer, vol. 82(4), pages 1015-1052, April.
    11. Yagiura, Mutsunori & Kishida, Masahiro & Ibaraki, Toshihide, 2006. "A 3-flip neighborhood local search for the set covering problem," European Journal of Operational Research, Elsevier, vol. 172(2), pages 472-499, July.
    12. Li, Shengyin & Huang, Yongxi, 2014. "Heuristic approaches for the flow-based set covering problem with deviation paths," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 72(C), pages 144-158.
    13. Youngho Lee & Hanif D. Sherali & Ikhyun Kwon & Seongin Kim, 2006. "A new reformulation approach for the generalized partial covering problem," Naval Research Logistics (NRL), John Wiley & Sons, vol. 53(2), pages 170-179, March.
    14. Hernández-Leandro, Noberto A. & Boyer, Vincent & Salazar-Aguilar, M. Angélica & Rousseau, Louis-Martin, 2019. "A matheuristic based on Lagrangian relaxation for the multi-activity shift scheduling problem," European Journal of Operational Research, Elsevier, vol. 272(3), pages 859-867.
    15. Patrizia Beraldi & Andrzej Ruszczyński, 2002. "The Probabilistic Set-Covering Problem," Operations Research, INFORMS, vol. 50(6), pages 956-967, December.
    16. Shane N. Hall & Sheldon H. Jacobson & Edward C. Sewell, 2008. "An Analysis of Pediatric Vaccine Formulary Selection Problems," Operations Research, INFORMS, vol. 56(6), pages 1348-1365, December.
    17. Bautista, Joaquín & Pereira, Jordi, 2006. "Modeling the problem of locating collection areas for urban waste management. An application to the metropolitan area of Barcelona," Omega, Elsevier, vol. 34(6), pages 617-629, December.
    18. Torbjörn Larsson & Michael Patriksson, 2006. "Global Optimality Conditions for Discrete and Nonconvex Optimization---With Applications to Lagrangian Heuristics and Column Generation," Operations Research, INFORMS, vol. 54(3), pages 436-453, June.
    19. Jihong Yan & Wenliang Cheng & Chengyu Wang & Jun Liu & Ming Gao & Aoying Zhou, 2015. "Optimizing word set coverage for multi-event summarization," Journal of Combinatorial Optimization, Springer, vol. 30(4), pages 996-1015, November.
    20. Alberto Caprara & Matteo Fischetti & Paolo Toth, 1999. "A Heuristic Method for the Set Covering Problem," Operations Research, INFORMS, vol. 47(5), pages 730-743, October.
