
On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications

Author

Listed:
  • Shutong Chen

    (School of Business and Management, Donghua University, 200051 Shanghai, China)

  • Weijun Xie

    (Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, Virginia 24061)

Abstract

This paper proposes a cluster-aware supervised learning (CluSL) framework, which integrates the clustering analysis with supervised learning. The objective of CluSL is to simultaneously find the best clusters of the data points and minimize the sum of loss functions within each cluster. This framework has many potential applications in healthcare, operations management, manufacturing, and so on. Because CluSL, in general, is nonconvex, we develop a regularized alternating minimization (RAM) algorithm to solve it, where at each iteration, we penalize the distance between the current clustering solution and the one from the previous iteration. By choosing a proper penalty function, we show that each iteration of the RAM algorithm can be computed efficiently. We further prove that the proposed RAM algorithm will always converge to a stationary point within a finite number of iterations. This is the first known convergence result in cluster-aware learning literature. Furthermore, we extend CluSL to the high-dimensional data sets, termed the F-CluSL framework. In F-CluSL, we cluster features and minimize loss function at the same time. Similarly, to solve F-CluSL, a variant of the RAM algorithm (i.e., F-RAM) is developed and proven to be convergent to an ε-stationary point. Our numerical studies demonstrate that the proposed CluSL and F-CluSL can outperform the existing ones such as random forests and support vector classification, both in the interpretability of learning results and in prediction accuracy.

Summary of Contribution: Aligned with the mission and scope of the INFORMS Journal on Computing, this paper proposes a cluster-aware supervised learning (CluSL) framework, which integrates the clustering analysis with supervised learning. Because CluSL is, in general, nonconvex, a regularized alternating projection algorithm is developed to solve it and is proven to always find a stationary solution. We further generalize the framework to the high-dimensional data set, F-CluSL. Our numerical studies demonstrate that the proposed CluSL and F-CluSL can deliver more interpretable learning results and outperform the existing ones such as random forests and support vector classification in computational time and prediction accuracy.
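
The alternating scheme described in the abstract can be illustrated with a minimal sketch. It assumes a squared-error loss with one linear model per cluster and a Hamming-distance penalty (a cost rho for every point that switches cluster between iterations). These choices, and the names fit_clusterwise_regression, clusterwise_loss, rho, and ridge, are illustrative assumptions only; this is not the authors' RAM algorithm, whose loss and penalty function are defined in the full paper.

```python
import numpy as np


def clusterwise_loss(X, y, z, W):
    """Sum of squared errors when point i is predicted by the model of cluster z[i]."""
    preds = np.einsum("ij,ij->i", X, W[z])
    return float(np.sum((preds - y) ** 2))


def fit_clusterwise_regression(X, y, n_clusters=2, rho=0.1, ridge=1e-6,
                               max_iter=100, seed=0):
    """Alternate between fitting per-cluster models and reassigning points.

    Assumption (not from the source): the distance between consecutive
    clusterings is the number of points that change cluster, weighted by rho.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(n_clusters, size=n)      # random initial assignment
    W = np.zeros((n_clusters, d))             # one weight vector per cluster

    for _ in range(max_iter):
        # Model step: ridge-stabilised least squares within each cluster.
        # An empty cluster yields a zero weight vector.
        for k in range(n_clusters):
            Xk, yk = X[z == k], y[z == k]
            W[k] = np.linalg.solve(Xk.T @ Xk + ridge * np.eye(d), Xk.T @ yk)

        # Assignment step: per-point loss under every cluster's model,
        # plus a charge of rho for leaving the previous cluster.
        losses = (X @ W.T - y[:, None]) ** 2  # shape (n, n_clusters)
        penalty = np.full((n, n_clusters), rho)
        penalty[np.arange(n), z] = 0.0        # staying put is free
        z_new = np.argmin(losses + penalty, axis=1)

        if np.array_equal(z_new, z):          # no point moved: stop
            break
        z = z_new

    return z, W
```

In this sketch a point moves only if the move lowers its own loss by at least rho, so every iteration that changes the clustering lowers the (ridge-regularised) total loss by at least rho. Because that total is bounded below by zero, the loop stops after finitely many iterations. This mirrors the flavour of the finite-convergence statement in the abstract but is not a substitute for the paper's proof.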

Suggested Citation

  • Shutong Chen & Weijun Xie, 2022. "On Cluster-Aware Supervised Learning: Frameworks, Convergent Algorithms, and Applications," INFORMS Journal on Computing, INFORMS, vol. 34(1), pages 481-502, January.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:1:p:481-502
    DOI: 10.1287/ijoc.2020.1053



