IDEAS home Printed from https://ideas.repec.org/a/spr/jcomop/v25y2013i3d10.1007_s10878-011-9382-6.html
   My bibliography  Save this article

Clustering with or without the approximation

Author

Listed:
  • Frans Schalekamp

    (Tsinghua University)

  • Michael Yu

    (MIT)

  • Anke Zuylen

    (Tsinghua University
    Max Planck Institute for Informatics)

Abstract

We study algorithms for clustering data that were recently proposed by Balcan et al. (SODA’09: 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1068–1077, 2009a) and that have already given rise to several follow-up papers. The input for the clustering problem consists of points in a metric space and a number k, specifying the desired number of clusters. The algorithms find a clustering that is provably close to a target clustering, provided that the instance has the “(1+α,ε)-property”, which means that the instance is such that all solutions to the k-median problem for which the objective value is at most (1+α) times the optimal objective value correspond to clusterings that misclassify at most an ε fraction of the points with respect to the target clustering. We investigate the theoretical and practical implications of their results. Our main contributions are as follows. First, we show that instances that have the (1+α,ε)-property and for which, additionally, the clusters in the target clustering are large, are easier than general instances: the algorithm proposed in Balcan et al. (SODA’09: 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1068–1077, 2009a) is a constant factor approximation algorithm with an approximation guarantee that is better than the known hardness of approximation for general instances. Further, we show that it is NP-hard to check if an instance satisfies the (1+α,ε)-property for a given (α,ε); the algorithms in Balcan et al. (SODA’09: 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1068–1077, 2009a) need such α and ε as input parameters, however. We propose ways to use their algorithms even if we do not know values of α and ε for which the assumption holds. Finally, we implement these methods and other popular methods, and test them on real world data sets. We find that on these data sets there are no α and ε so that the dataset has both (1+α,ε)-property and sufficiently large clusters in the target solution. For the general case where there are no assumptions about the cluster sizes, we show that on our data sets the performance guarantee proved by Balcan et a. (SODA’09: 19th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1068–1077, 2009a) is meaningless for the values of α,ε for which the data set has the (1+α,ε)-property. The algorithm nonetheless gives reasonable results, although it is outperformed by other methods.

Suggested Citation

  • Frans Schalekamp & Michael Yu & Anke Zuylen, 2013. "Clustering with or without the approximation," Journal of Combinatorial Optimization, Springer, vol. 25(3), pages 393-429, April.
  • Handle: RePEc:spr:jcomop:v:25:y:2013:i:3:d:10.1007_s10878-011-9382-6
    DOI: 10.1007/s10878-011-9382-6
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10878-011-9382-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s10878-011-9382-6?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Beasley, J. E., 1985. "A note on solving large p-median problems," European Journal of Operational Research, Elsevier, vol. 21(2), pages 270-273, August.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. S Salhi & A Al-Khedhairi, 2010. "Integrating heuristic information into exact methods: The case of the vertex p-centre problem," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 61(11), pages 1619-1631, November.
    2. Patricia Domínguez-Marín & Stefan Nickel & Pierre Hansen & Nenad Mladenović, 2005. "Heuristic Procedures for Solving the Discrete Ordered Median Problem," Annals of Operations Research, Springer, vol. 136(1), pages 145-173, April.
    3. Alberto Ceselli & Federico Liberatore & Giovanni Righini, 2009. "A computational evaluation of a general branch-and-price framework for capacitated network location problems," Annals of Operations Research, Springer, vol. 167(1), pages 209-251, March.
    4. Joshua Q. Hale & Enlu Zhou & Jiming Peng, 2017. "A Lagrangian search method for the P-median problem," Journal of Global Optimization, Springer, vol. 69(1), pages 137-156, September.
    5. Blanco, Víctor & Gázquez, Ricardo & Ponce, Diego & Puerto, Justo, 2023. "A branch-and-price approach for the continuous multifacility monotone ordered median problem," European Journal of Operational Research, Elsevier, vol. 306(1), pages 105-126.
    6. Soheil Davari, 2019. "The incremental cooperative design of preventive healthcare networks," Annals of Operations Research, Springer, vol. 272(1), pages 445-492, January.
    7. Dowsland, Kathryn A. & Soubeiga, Eric & Burke, Edmund, 2007. "A simulated annealing based hyperheuristic for determining shipper sizes for storage and transportation," European Journal of Operational Research, Elsevier, vol. 179(3), pages 759-774, June.
    8. Fathali, J. & Kakhki, H. Taghizadeh, 2006. "Solving the p-median problem with pos/neg weights by variable neighborhood search and some results for special cases," European Journal of Operational Research, Elsevier, vol. 170(2), pages 440-462, April.
    9. Filipe Rodrigues & Agostinho Agra & Lars Magnus Hvattum & Cristina Requejo, 2021. "Weighted proximity search," Journal of Heuristics, Springer, vol. 27(3), pages 459-496, June.
    10. Mauricio Resende & Renato Werneck, 2007. "A fast swap-based local search procedure for location problems," Annals of Operations Research, Springer, vol. 150(1), pages 205-230, March.
    11. Daniel Martins & Gabriel M. Vianna & Isabel Rosseti & Simone L. Martins & Alexandre Plastino, 2018. "Making a state-of-the-art heuristic faster with data mining," Annals of Operations Research, Springer, vol. 263(1), pages 141-162, April.
    12. Mladenovic, Nenad & Brimberg, Jack & Hansen, Pierre & Moreno-Perez, Jose A., 2007. "The p-median problem: A survey of metaheuristic approaches," European Journal of Operational Research, Elsevier, vol. 179(3), pages 927-939, June.
    13. Alcaraz, Javier & Landete, Mercedes & Monge, Juan F., 2012. "Design and analysis of hybrid metaheuristics for the Reliability p-Median Problem," European Journal of Operational Research, Elsevier, vol. 222(1), pages 54-64.
    14. Caruso, C. & Colorni, A. & Aloi, L., 2003. "Dominant, an algorithm for the p-center problem," European Journal of Operational Research, Elsevier, vol. 149(1), pages 53-64, August.
    15. Carrizosa, Emilio & Mladenović, Nenad & Todosijević, Raca, 2013. "Variable neighborhood search for minimum sum-of-squares clustering on networks," European Journal of Operational Research, Elsevier, vol. 230(2), pages 356-363.
    16. Hansen, Pierre & Mladenovic, Nenad, 2001. "Variable neighborhood search: Principles and applications," European Journal of Operational Research, Elsevier, vol. 130(3), pages 449-467, May.
    17. P N Ram Kumar & T T Narendran, 2011. "On the usage of Lagrangean Relaxation for the convoy movement problem," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 62(4), pages 722-728, April.
    18. Caleb C. Fast & Illya V. Hicks, 2017. "A Branch Decomposition Algorithm for the p -Median Problem," INFORMS Journal on Computing, INFORMS, vol. 29(3), pages 474-488, August.
    19. Olender, Paweł & Ogryczak, Włodzimierz, 2019. "A revised Variable Neighborhood Search for the Discrete Ordered Median Problem," European Journal of Operational Research, Elsevier, vol. 274(2), pages 445-465.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:jcomop:v:25:y:2013:i:3:d:10.1007_s10878-011-9382-6. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.