
Near-optimal feature selection for large databases

Authors

  • J Yang (Chonbuk National University)
  • S Ólafsson (Iowa State University)

Abstract

We analyse a new optimization-based approach for feature selection that uses the nested partitions method for combinatorial optimization as a heuristic search procedure to identify good feature subsets. In particular, we show how to improve the performance of the nested partitions method using random sampling of instances. The new approach uses a two-stage sampling scheme that determines the required sample size to guarantee convergence to a near-optimal solution. This approach therefore also has attractive theoretical characteristics. In particular, when the algorithm terminates in finite time, rigorous statements can be made concerning the quality of the final feature subset. Numerical results are reported to illustrate the key results, and show that the new approach is considerably faster than the original nested partitions method and other feature selection methods.
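The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a simplified nested partitions loop that decides one feature at a time (include or exclude), estimates how promising each region is from a few random feature subsets scored on a random sample of instances, and backtracks one level when the surrounding region looks better. The paper's two-stage rule for choosing the sample size is replaced here by a fixed `n_instances`, and the nearest-centroid `score`, the 0.01 size penalty, and all function and parameter names are hypothetical choices made for illustration.

```python
import random

def score(subset, rows, labels):
    """Nearest-centroid accuracy on `rows` using only `subset`, minus a small size penalty.
    (Hypothetical evaluation function; the abstract does not specify one.)"""
    if not subset:
        return 0.0
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(subset))
        for j, f in enumerate(subset):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(subset, centroids[y])))

    hits = sum(predict(x) == y for x, y in zip(rows, labels))
    return hits / len(rows) - 0.01 * len(subset)

def random_subset(fixed, n_features, rng):
    """Draw a complete feature subset that agrees with the partial decisions in `fixed`."""
    kept = [f for f, keep in enumerate(fixed) if keep]
    free = [f for f in range(len(fixed), n_features) if rng.random() < 0.5]
    return kept + free

def np_feature_selection(rows, labels, n_instances=50, n_samples=5, max_iter=200, seed=0):
    """Simplified nested partitions search; stops at the first singleton region."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    fixed = []  # fixed[f] is True/False once feature f has been decided
    for _ in range(max_iter):
        if len(fixed) == n_features:
            break
        # Random sample of instances (fixed size here, not the paper's two-stage rule).
        idx = rng.sample(range(len(rows)), min(n_instances, len(rows)))
        xs = [rows[i] for i in idx]
        ys = [labels[i] for i in idx]

        def promise(region):
            # Promising index: best score among a few random subsets from the region.
            return max(score(random_subset(region, n_features, rng), xs, ys)
                       for _ in range(n_samples))

        results = {True: promise(fixed + [True]), False: promise(fixed + [False])}
        outside = float("-inf")
        if fixed:
            def outside_region():
                # Any assignment that differs from `fixed` in at least one decided position.
                decisions = [rng.random() < 0.5 for _ in fixed]
                k = rng.randrange(len(fixed))
                decisions[k] = not fixed[k]
                return decisions
            outside = max(score(random_subset(outside_region(), n_features, rng), xs, ys)
                          for _ in range(n_samples))
        best = max(results, key=results.get)
        if outside > results[best]:
            fixed.pop()          # backtrack to the parent region
        else:
            fixed.append(best)   # move into the more promising subregion
    return [f for f, keep in enumerate(fixed) if keep]
```

Calling `np_feature_selection(rows, labels)` on a list of numeric rows and a list of class labels returns the indices of the selected features. Because each iteration scores candidates on a fresh instance sample, the result depends on `seed`, and this sketch carries none of the convergence guarantees the abstract claims for the two-stage scheme.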

Suggested Citation

  • J Yang & S Ólafsson, 2009. "Near-optimal feature selection for large databases," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 60(8), pages 1045-1055, August.
  • Handle: RePEc:pal:jorsoc:v:60:y:2009:i:8:d:10.1057_palgrave.jors.2602651
    DOI: 10.1057/palgrave.jors.2602651

    Download full text from publisher

    File URL: http://link.springer.com/10.1057/palgrave.jors.2602651
    File Function: Abstract
    Download Restriction: Access to full text is restricted to subscribers.


    References listed on IDEAS

    1. Olafsson, Sigurdur & Li, Xiaonan & Wu, Shuning, 2008. "Operations research and data mining," European Journal of Operational Research, Elsevier, vol. 187(3), pages 1429-1448, June.
    2. Leyuan Shi & Sigurdur Ólafsson, 2000. "Nested Partitions Method for Global Optimization," Operations Research, INFORMS, vol. 48(3), pages 390-407, June.
    3. Sigurdur Ólafsson & Jaekyung Yang, 2005. "Intelligent Partitioning for Feature Selection," INFORMS Journal on Computing, INFORMS, vol. 17(3), pages 339-355, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Unler, Alper & Murat, Alper, 2010. "A discrete particle swarm optimization method for feature selection in binary classification problems," European Journal of Operational Research, Elsevier, vol. 206(3), pages 528-539, November.
    2. Meisel, Stephan & Mattfeld, Dirk, 2010. "Synergies of Operations Research and Data Mining," European Journal of Operational Research, Elsevier, vol. 206(1), pages 1-10, October.
    3. Lee, Loo Hay & Chew, Ek Peng & Manikam, Puvaneswari, 2006. "A general framework on the simulation-based optimization under fixed computing budget," European Journal of Operational Research, Elsevier, vol. 174(3), pages 1828-1841, November.
    4. Mark Gilchrist & Deana Lehmann Mooers & Glenn Skrubbeltrang & Francine Vachon, 2012. "Knowledge Discovery in Databases for Competitive Advantage," Journal of Management and Strategy, Journal of Management and Strategy, Sciedu Press, vol. 3(2), pages 2-15, April.
    5. Carrizosa, Emilio & Guerrero, Vanesa & Romero Morales, Dolores, 2018. "On Mathematical Optimization for the visualization of frequencies and adjacencies as rectangular maps," European Journal of Operational Research, Elsevier, vol. 265(1), pages 290-302.
    6. Lingxuan Liu & Leyuan Shi, 2019. "Simulation Optimization on Complex Job Shop Scheduling with Non-Identical Job Sizes," Asia-Pacific Journal of Operational Research (APJOR), World Scientific Publishing Co. Pte. Ltd., vol. 36(05), pages 1-26, October.
    7. Davidson, Ian & Tayi, Giri, 2009. "Data preparation using data quality matrices for classification mining," European Journal of Operational Research, Elsevier, vol. 197(2), pages 764-772, September.
    8. Daniel Gartner & Yiye Zhang & Rema Padman, 2018. "Cognitive workload reduction in hospital information systems," Health Care Management Science, Springer, vol. 21(2), pages 224-243, June.
    9. Tom Pape, 2020. "Prioritising data items for business analytics: Framework and application to human resources," Papers 2012.13813, arXiv.org.
    10. Heydari Majeed & Yousefli Amir, 2017. "A new optimization model for market basket analysis with allocation considerations: A genetic algorithm solution approach," Management & Marketing, Sciendo, vol. 12(1), pages 1-11, March.
    11. Anzanello, Michel J. & Albin, Susan L. & Chaovalitwongse, Wanpracha A., 2012. "Multicriteria variable selection for classification of production batches," European Journal of Operational Research, Elsevier, vol. 218(1), pages 97-105.
    12. Jesse G. Wales & Alexander J. Zolan & William T. Hamilton & Alexandra M. Newman & Michael J. Wagner, 2023. "Combining simulation and optimization to derive operating policies for a concentrating solar power plant," OR Spectrum: Quantitative Approaches in Management, Springer;Gesellschaft für Operations Research e.V., vol. 45(1), pages 119-150, March.
    13. Zhenyuan Liu & Lei Xiao & Jing Tian, 2016. "An activity-list-based nested partitions algorithm for resource-constrained project scheduling," International Journal of Production Research, Taylor & Francis Journals, vol. 54(16), pages 4744-4758, August.
    14. Geuens, Stijn & Coussement, Kristof & De Bock, Koen W., 2018. "A framework for configuring collaborative filtering-based recommendations derived from purchase data," European Journal of Operational Research, Elsevier, vol. 265(1), pages 208-218.
    15. Saridakis, Charalampos & Katsikeas, Constantine S. & Angelidou, Sofia & Oikonomidou, Maria & Pratikakis, Polyvios, 2023. "Mining Twitter lists to extract brand-related associative information for celebrity endorsement," European Journal of Operational Research, Elsevier, vol. 311(1), pages 316-332.
    16. K. Gokbayrak & C.G. Cassandras, 2002. "Generalized Surrogate Problem Methodology for Online Stochastic Discrete Optimization," Journal of Optimization Theory and Applications, Springer, vol. 114(1), pages 97-132, July.
    17. Clarisse Dhaenens & Laetitia Jourdan, 2019. "Metaheuristics for data mining," 4OR, Springer, vol. 17(2), pages 115-139, June.
    18. K. Gokbayrak & C. G. Cassandras, 2001. "Online Surrogate Problem Methodology for Stochastic Discrete Resource Allocation Problems," Journal of Optimization Theory and Applications, Springer, vol. 108(2), pages 349-376, February.
    19. Filom, Siyavash & Amiri, Amir M. & Razavi, Saiedeh, 2022. "Applications of machine learning methods in port operations – A systematic literature review," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 161(C).
    20. Mareček, Jakub & Richtárik, Peter & Takáč, Martin, 2017. "Matrix completion under interval uncertainty," European Journal of Operational Research, Elsevier, vol. 256(1), pages 35-43.
