IDEAS home Printed from https://ideas.repec.org/a/inm/orijoc/v34y2022i3p1587-1605.html
   My bibliography  Save this article

Efficient Algebraic Multigrid Methods for Multilevel Overlapping Coclustering of User-Item Relationships

Author

Listed:
  • Haifeng Xu

    (Department of Computer Science, University of Virginia, Charlottesville, Virginia 22903)

  • Rasha F. Kashef

    (Electrical, Computer, and Biomedical Engineering Department, Ryerson University, Toronto, Ontario M5B 2K3, Canada)

  • Hans De Sterck

    (Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada)

  • Geoffrey Sanders

    (Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, California 94550)

Abstract

Various digital data sets that encode user-item relationships contain a multilevel overlapping cluster structure. The user-item relation can be encoded in a weighted bipartite graph and uncovering these overlapping coclusters of users and items at multiple levels in the bipartite graph can play an important role in analyzing user-item data in many applications. For example, for effective online marketing, such as placing online ads or deploying smart online marketing strategies, identifying co-occurring clusters of users and items can lead to accurately targeted advertisements and better marketing outcomes. In this paper, we propose fast algorithms inspired by algebraic multigrid methods for finding multilevel overlapping cocluster structures of feature matrices that encode user-item relations. Starting from the weighted bipartite graph structure of the feature matrix, the algorithms use agglomeration procedures to recursively coarsen the bipartite graphs that represent the relations between the coclusters on increasingly coarser levels. New fast coarsening routines are described that circumvent the bottleneck of all-to-all similarity computations by exploiting measures of direct connection strength between row and column variables in the feature matrix. Providing accurate coclusters at multiple levels in a manner that can scale to large data sets is a challenging task. In this paper, we propose heuristic algorithms that approximately and recursively minimize normalized cuts to obtain coclusters in the aggregated bipartite graphs on multiple levels of resolution. Whereas the main novelty and focus of the paper lies in algorithmic aspects of reducing computational complexity to obtain scalable methods specifically for large rectangular user-item matrices, the algorithmic variants also define several new models for determining multilevel coclusters that we justify intuitively by relating them to principles that underlie collaborative filtering methods for user-item relationships. Experimental results show that the proposed algorithms successfully uncover the multilevel overlapping cluster structure for artificial and real data sets. Summary of Contribution: This paper develops new and efficient computational methods for finding the multilevel overlapping cocluster structure of feature matrices that encode user-item relationships. We base our approach on the use of pairwise similarity measures between features, seeking clusters of points that are similar to each other and dissimilar from the points outside the cluster. We approximately solve the problem of finding optimal overlapping coclusters on multiple levels by employing a framework that is based on efficient multilevel methods that have been used previously to solve sparse linear systems and to cluster graphs. Our main contribution is that we extend these methods in efficient manners to find coclusters in the bipartite graphs that encode common and important user-item relationships or social network relations. The novel methods that we propose are inherently scalable to large problem sizes and are naturally able to uncover overlapping coclusters at multiple levels, whereas existing methods generally only find coclusters at the fine level. We illustrate the algorithm and its performance on some standard test problems from the literature and on a proof-of-concept real-world data set that relates LinkedIn users to their skills and expertise.

Suggested Citation

  • Haifeng Xu & Rasha F. Kashef & Hans De Sterck & Geoffrey Sanders, 2022. "Efficient Algebraic Multigrid Methods for Multilevel Overlapping Coclustering of User-Item Relationships," INFORMS Journal on Computing, INFORMS, vol. 34(3), pages 1587-1605, May.
  • Handle: RePEc:inm:orijoc:v:34:y:2022:i:3:p:1587-1605
    DOI: 10.1287/ijoc.2021.1137
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijoc.2021.1137
    Download Restriction: no

    File URL: https://libkey.io/10.1287/ijoc.2021.1137?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Gong, Tingnan & Zhang, Weiping & Chen, Yu, 2023. "Uncovering block structures in large rectangular matrices," Journal of Multivariate Analysis, Elsevier, vol. 198(C).

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:orijoc:v:34:y:2022:i:3:p:1587-1605. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.