
PDC-Transitive: An Enhanced Heuristic for Document Clustering Based on Relational Analysis Approach and Iterative MapReduce

Authors

  • Yasmine Lamari

    (Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco)

  • Said Chah Slaoui

    (Department of Computer Science, Faculty of Science of Rabat, Mohammed V University, 4 Avenue Ibn Battouta B. P. 1014 RP, Rabat, Morocco)

Abstract

Recently, MapReduce-based implementations of clustering algorithms have been developed to cope with the Big Data phenomenon, and they show promising results, particularly for the document clustering problem. In this paper, we extend PDC-Transitive, an efficient data partitioning method based on the relational analysis (RA) approach and applied to the document clustering problem. The heuristic as originally introduced is parallelised iteratively using the MapReduce model and is designed with a single reducer, which becomes a bottleneck when processing large data. We improve the design of the PDC-Transitive method to avoid the data dependencies and reduce the computation cost. Experimental results on benchmark datasets demonstrate that the enhanced heuristic yields better-quality results and requires less computing time than the original method.

Suggested Citation

  • Yasmine Lamari & Said Chah Slaoui, 2018. "PDC-Transitive: An Enhanced Heuristic for Document Clustering Based on Relational Analysis Approach and Iterative MapReduce," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 17(02), pages 1-18, June.
  • Handle: RePEc:wsi:jikmxx:v:17:y:2018:i:02:n:s0219649218500211
    DOI: 10.1142/S0219649218500211

    Download full text from publisher

    File URL: http://www.worldscientific.com/doi/abs/10.1142/S0219649218500211
    Download Restriction: Access to full text is restricted to subscribers

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Saida Ishak Boushaki & Nadjet Kamel & Omar Bendjeghaba, 2018. "High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing," Journal of Information & Knowledge Management (JIKM), World Scientific Publishing Co. Pte. Ltd., vol. 17(03), pages 1-24, September.
