IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v263y2017i2p367-379.html
   My bibliography  Save this article

New diagonal bundle method for clustering problems in large data sets

Author

Listed:
  • Karmitsa, Napsu
  • Bagirov, Adil M.
  • Taheri, Sona

Abstract

Clustering is one of the most important tasks in data mining. Recent developments in computer hardware allow us to store in random access memory (RAM) and repeatedly read data sets with hundreds of thousands and even millions of data points. This makes it possible to use conventional clustering algorithms in such data sets. However, these algorithms may need prohibitively large computational time and fail to produce accurate solutions. Therefore, it is important to develop clustering algorithms which are accurate and can provide real time clustering in large data sets. This paper introduces one of them. Using nonsmooth optimization formulation of the clustering problem the objective function is represented as a difference of two convex (DC) functions. Then a new diagonal bundle algorithm that explicitly uses this structure is designed and combined with an incremental approach to solve this problem. The method is evaluated using real world data sets with both large number of attributes and large number of data points. The proposed method is compared with two other clustering algorithms using numerical results.

Suggested Citation

  • Karmitsa, Napsu & Bagirov, Adil M. & Taheri, Sona, 2017. "New diagonal bundle method for clustering problems in large data sets," European Journal of Operational Research, Elsevier, vol. 263(2), pages 367-379.
  • Handle: RePEc:eee:ejores:v:263:y:2017:i:2:p:367-379
    DOI: 10.1016/j.ejor.2017.06.010
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221717305283
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2017.06.010?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. A. Bagirov & B. Ordin & G. Ozturk & A. Xavier, 2015. "An incremental clustering algorithm based on hyperbolic smoothing," Computational Optimization and Applications, Springer, vol. 61(1), pages 219-241, May.
    2. Santi, Éverton & Aloise, Daniel & Blanchard, Simon J., 2016. "A model for clustering data from heterogeneous dissimilarities," European Journal of Operational Research, Elsevier, vol. 253(3), pages 659-672.
    3. Rabello, Rômulo Louzada & Mauri, Geraldo Regis & Ribeiro, Glaydston Mattos & Lorena, Luiz Antonio Nogueira, 2014. "A Clustering Search metaheuristic for the Point-Feature Cartographic Label Placement Problem," European Journal of Operational Research, Elsevier, vol. 234(3), pages 802-808.
    4. Bagirov, Adil M. & Yearwood, John, 2006. "A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems," European Journal of Operational Research, Elsevier, vol. 170(2), pages 578-596, April.
    5. Carrizosa, Emilio & Mladenović, Nenad & Todosijević, Raca, 2013. "Variable neighborhood search for minimum sum-of-squares clustering on networks," European Journal of Operational Research, Elsevier, vol. 230(2), pages 356-363.
    6. A. Bagirov & A. Rubinov & N. Soukhoroukova & J. Yearwood, 2003. "Unsupervised and supervised data classification via nonsmooth and global optimization," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 11(1), pages 1-75, June.
    7. Adil Bagirov & Napsu Karmitsa & Marko M. Mäkelä, 2014. "Introduction to Nonsmooth Optimization," Springer Books, Springer, edition 127, number 978-3-319-08114-4, November.
    8. Napsu Karmitsa, 2015. "Diagonal Bundle Method for Nonsmooth Sparse Optimization," Journal of Optimization Theory and Applications, Springer, vol. 166(3), pages 889-905, September.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Veronica Piccialli & Antonio M. Sudoso & Angelika Wiegele, 2022. "SOS-SDP: An Exact Solver for Minimum Sum-of-Squares Clustering," INFORMS Journal on Computing, INFORMS, vol. 34(4), pages 2144-2162, July.
    2. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Gambella, Claudio & Ghaddar, Bissan & Naoum-Sawaya, Joe, 2021. "Optimization problems for machine learning: A survey," European Journal of Operational Research, Elsevier, vol. 290(3), pages 807-828.
    2. Adil Bagirov & Asef Ganjehlou, 2008. "An approximate subgradient algorithm for unconstrained nonsmooth, nonconvex optimization," Mathematical Methods of Operations Research, Springer;Gesellschaft für Operations Research (GOR);Nederlands Genootschap voor Besliskunde (NGB), vol. 67(2), pages 187-206, April.
    3. A. Bagirov & B. Ordin & G. Ozturk & A. Xavier, 2015. "An incremental clustering algorithm based on hyperbolic smoothing," Computational Optimization and Applications, Springer, vol. 61(1), pages 219-241, May.
    4. Gaudioso, Manlio & Giallombardo, Giovanni & Mukhametzhanov, Marat, 2018. "Numerical infinitesimals in a variable metric method for convex nonsmooth optimization," Applied Mathematics and Computation, Elsevier, vol. 318(C), pages 312-320.
    5. Bagirov, Adil M. & Ugon, Julien & Mirzayeva, Hijran, 2013. "Nonsmooth nonconvex optimization approach to clusterwise linear regression problems," European Journal of Operational Research, Elsevier, vol. 229(1), pages 132-142.
    6. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2020. "Essentials of numerical nonsmooth optimization," 4OR, Springer, vol. 18(1), pages 1-47, March.
    7. Chen, Yi-Ting & Sun, Edward W. & Lin, Yi-Bing, 2020. "Merging anomalous data usage in wireless mobile telecommunications: Business analytics with a strategy-focused data-driven approach for sustainability," European Journal of Operational Research, Elsevier, vol. 281(3), pages 687-705.
    8. Kaisa Joki & Adil M. Bagirov & Napsu Karmitsa & Marko M. Mäkelä, 2017. "A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes," Journal of Global Optimization, Springer, vol. 68(3), pages 501-535, July.
    9. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2022. "Essentials of numerical nonsmooth optimization," Annals of Operations Research, Springer, vol. 314(1), pages 213-253, July.
    10. A. M. Bagirov & B. Karasözen & M. Sezer, 2008. "Discrete Gradient Method: Derivative-Free Method for Nonsmooth Optimization," Journal of Optimization Theory and Applications, Springer, vol. 137(2), pages 317-334, May.
    11. Outi Montonen & Kaisa Joki, 2018. "Bundle-based descent method for nonsmooth multiobjective DC optimization with inequality constraints," Journal of Global Optimization, Springer, vol. 72(3), pages 403-429, November.
    12. Zhu, Shan & Hu, Xiangpei & Huang, Kai & Yuan, Yufei, 2021. "Optimization of product category allocation in multiple warehouses to minimize splitting of online supermarket customer orders," European Journal of Operational Research, Elsevier, vol. 290(2), pages 556-571.
    13. Manlio Gaudioso & Giovanni Giallombardo & Giovanna Miglionico, 2018. "Minimizing Piecewise-Concave Functions Over Polyhedra," Mathematics of Operations Research, INFORMS, vol. 43(2), pages 580-597, May.
    14. Welington Oliveira, 2019. "Proximal bundle methods for nonsmooth DC programming," Journal of Global Optimization, Springer, vol. 75(2), pages 523-563, October.
    15. Ville-Pekka Eronen & Jan Kronqvist & Tapio Westerlund & Marko M. Mäkelä & Napsu Karmitsa, 2017. "Method for solving generalized convex nonsmooth mixed-integer nonlinear programming problems," Journal of Global Optimization, Springer, vol. 69(2), pages 443-459, October.
    16. Angel Juan & Javier Faulin & Albert Ferrer & Helena Lourenço & Barry Barrios, 2013. "MIRHA: multi-start biased randomization of heuristics with adaptive local search for solving non-smooth routing problems," TOP: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 21(1), pages 109-132, April.
    17. Olivera Janković & Stefan Mišković & Zorica Stanimirović & Raca Todosijević, 2017. "Novel formulations and VNS-based heuristics for single and multiple allocation p-hub maximal covering problems," Annals of Operations Research, Springer, vol. 259(1), pages 191-216, December.
    18. Nader Kanzi & Majid Soleimani-damaneh, 2020. "Characterization of the weakly efficient solutions in nonsmooth quasiconvex multiobjective optimization," Journal of Global Optimization, Springer, vol. 77(3), pages 627-641, July.
    19. Olivier Morand & Kevin Reffett & Suchismita Tarafdar, 2018. "Generalized Envelope Theorems: Applications to Dynamic Programming," Journal of Optimization Theory and Applications, Springer, vol. 176(3), pages 650-687, March.
    20. Masmoudi, Mohamed Amine & Hosny, Manar & Demir, Emrah & Genikomsakis, Konstantinos N. & Cheikhrouhou, Naoufel, 2018. "The dial-a-ride problem with electric vehicles and battery swapping stations," Transportation Research Part E: Logistics and Transportation Review, Elsevier, vol. 118(C), pages 392-420.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:263:y:2017:i:2:p:367-379. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/eor .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.