
On Knowledge-Enhanced Document Clustering

Author

Listed:
  • Manjeet Rege

    (Rochester Institute of Technology, Rochester, NY, USA)

  • Josan Koruthu

    (Rochester Institute of Technology, Rochester, NY, USA)

  • Reynold Bailey

    (Rochester Institute of Technology, Rochester, NY, USA)

Abstract

Document clustering plays an important role in text analytics by finding natural groupings of documents based on their similarity, as determined by the words appearing in them. Many of the clustering algorithms accessible through text analytics tools are completely unsupervised. That is, they cannot incorporate any domain knowledge that might be available about the documents to improve clustering accuracy and relevance. The authors present a graph partitioning based semi-supervised document clustering algorithm. The user provides knowledge about a few of the documents in the form of “must-link” and “cannot-link” constraints between pairs of documents. A “must-link” constraint between two documents expresses that the user feels the two documents must be clustered together irrespective of their dissimilarity. Similarly, a “cannot-link” constraint signifies that the two documents should never be clustered together, no matter how similar they might happen to be. These constraints are then incorporated into a computationally efficient, graph partitioning based document clustering algorithm. The proposed framework is validated through experiments performed on publicly available text datasets.
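To make the idea of pairwise constraints on a document graph concrete, here is a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes a common heuristic instead: write each constraint directly into a word-overlap affinity matrix (must-link sets the edge weight to 1, cannot-link sets it to 0), then split the graph in two with a normalized spectral cut computed by power iteration. The toy documents, the function names (`cosine_similarity`, `constrained_affinity`, `spectral_bipartition`), and the choice of set-based cosine similarity are all assumptions made for illustration.

```python
import math

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two non-empty documents viewed as sets of words."""
    words_a, words_b = set(doc_a.split()), set(doc_b.split())
    return len(words_a & words_b) / math.sqrt(len(words_a) * len(words_b))

def constrained_affinity(docs, must_link, cannot_link):
    """Word-overlap affinity matrix with pairwise constraints written into it."""
    n = len(docs)
    w = [[0.0 if i == j else cosine_similarity(docs[i], docs[j])
          for j in range(n)] for i in range(n)]
    for i, j in must_link:      # assumption: must-link -> maximal edge weight
        w[i][j] = w[j][i] = 1.0
    for i, j in cannot_link:    # assumption: cannot-link -> edge removed
        w[i][j] = w[j][i] = 0.0
    return w

def spectral_bipartition(w, iterations=500):
    """Two-way normalized spectral cut; assumes every node has positive degree."""
    n = len(w)
    degree = [sum(row) for row in w]
    # Shifted normalized affinity (I + D^-1/2 W D^-1/2) / 2, eigenvalues in [0, 1].
    m = [[0.5 * ((1.0 if i == j else 0.0)
                 + w[i][j] / math.sqrt(degree[i] * degree[j]))
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: sqrt(degree), normalized.
    total = math.sqrt(sum(degree))
    top = [math.sqrt(d) / total for d in degree]
    v = [float(i + 1) for i in range(n)]  # fixed, deterministic start vector
    for _ in range(iterations):
        # Project out the leading eigenvector, then apply the matrix once.
        overlap = sum(a * b for a, b in zip(v, top))
        v = [a - overlap * b for a, b in zip(v, top)]
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = math.sqrt(sum(a * a for a in v))
        v = [a / length for a in v]
    # The sign of the second eigenvector assigns each document to a side.
    return [1 if x >= 0 else 0 for x in v]

# Hypothetical toy corpus: 0-1 are finance, 3-5 are sports, 2 is ambiguous.
docs = [
    "stock market shares",
    "stock market investors",
    "market team shares goal",
    "football team goal",
    "football league goal",
    "football team league",
]
affinity = constrained_affinity(docs, must_link=[(0, 2)], cannot_link=[(2, 3)])
labels = spectral_bipartition(affinity)
print(labels)
```

Document 2 shares two words with document 0 and two with document 3, so word overlap alone does not settle where it belongs. The two constraints tie it to the finance group, and on this toy input the partition should separate documents 0 to 2 from documents 3 to 5. The label values themselves (0 or 1) are arbitrary; only the grouping matters.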

Suggested Citation

  • Manjeet Rege & Josan Koruthu & Reynold Bailey, 2012. "On Knowledge-Enhanced Document Clustering," International Journal of Information Retrieval Research (IJIRR), IGI Global, vol. 2(3), pages 72-82, July.
  • Handle: RePEc:igg:jirr00:v:2:y:2012:i:3:p:72-82

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/ijirr.2012070105
    Download Restriction: no