Algoritmo incremental de agrupamiento con traslape para el procesamiento de grandes colecciones de datos (Overlapping clustering incremental algorithm for large data collections processing)

Authors

  • Lázaro Janier González-Soler

    (Centro de Aplicaciones de Tecnologías de Avanzada)

  • Airel Pérez-Suárez

    (Centro de Aplicaciones de Tecnologías de Avanzada)

  • Leonardo Chang-Fernández

    (Centro de Aplicaciones de Tecnologías de Avanzada)

Abstract

There are several problems in Pattern Recognition and Data Mining that, by their inherent nature, consider that objects can belong to more than one class or cluster. DClustR is a dynamic overlapping clustering algorithm that, in document clustering tasks, has shown the best trade-off between cluster quality and efficiency among the dynamic overlapping clustering algorithms reported in the literature. However, DClustR could be less useful in applications that deal with large data collections, due to its computational complexity and the amount of memory it demands to process them. In this paper, a GPU-based parallel version of DClustR, named CUDA-DClus, is proposed to enhance DClustR's efficiency in applications dealing with large data collections. Experiments conducted over several standard document collections showed that CUDA-DClus performs well in terms of efficiency and memory consumption.
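The abstract hinges on overlapping clustering, where one document may belong to several clusters, and on the cost of comparing documents in large collections. The paper's own DClustR procedure is not reproduced here. As a hedged illustration only, the sketch below swaps in the classic star clustering idea (Aslam et al.) over a thresholded similarity graph, a well-known way to obtain overlapping clusters. The cosine measure, the threshold `beta`, the sparse `Doc` representation, and the names `beta_similarity_graph` and `star_cover` are assumptions for illustration, not taken from the article.

```python
import math
from typing import Dict, List, Set

# Hypothetical sparse document vector: term -> weight (e.g. TF or TF-IDF).
Doc = Dict[str, float]


def cosine(u: Doc, v: Doc) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norms = sum(w * w for w in u.values()) * sum(w * w for w in v.values())
    return dot / math.sqrt(norms) if norms > 0 else 0.0


def beta_similarity_graph(docs: List[Doc], beta: float) -> List[Set[int]]:
    """Link documents i and j whenever cosine(i, j) >= beta (assumed rule).

    Each pair is evaluated independently of the others, which is the kind of
    data-parallel work a GPU version could spread across threads. This sketch
    simply runs the pairs sequentially.
    """
    adj: List[Set[int]] = [set() for _ in docs]
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(docs[i], docs[j]) >= beta:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def star_cover(adj: List[Set[int]]) -> List[Set[int]]:
    """Greedy star cover (star clustering), producing overlapping clusters.

    The highest-degree vertex not yet covered becomes a star center. Its
    cluster is the center plus all of its neighbours, including neighbours
    already covered by an earlier star, so a document can land in several
    clusters.
    """
    order = sorted(range(len(adj)), key=lambda v: (-len(adj[v]), v))
    covered: Set[int] = set()
    clusters: List[Set[int]] = []
    for v in order:
        if v in covered:
            continue
        star = {v} | adj[v]
        clusters.append(star)
        covered |= star
    return clusters


if __name__ == "__main__":
    docs = [
        {"a": 1.0},
        {"a": 1.0, "b": 1.0},
        {"b": 1.0, "c": 1.0},
        {"c": 1.0, "d": 1.0},
        {"d": 1.0},
    ]
    graph = beta_similarity_graph(docs, beta=0.4)
    print(star_cover(graph))  # document 2 ends up in both clusters
```

The quadratic pairwise loop is the step that grows fastest with collection size, which is consistent with the abstract's point that computational cost and memory limit the sequential algorithm on large collections; how CUDA-DClus actually distributes its work is described in the full text.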

Suggested Citation

  • Lázaro Janier González-Soler & Airel Pérez-Suárez & Leonardo Chang-Fernández, 2015. "Algoritmo incremental de agrupamiento con traslape para el procesamiento de grandes colecciones de datos (Overlapping clustering incremental algorithm for large data collections processing)," Revista Internacional de Gestión del Conocimiento y la Tecnología (GECONTEC), vol. 3(2), pages 1-12.
  • Handle: RePEc:rge:journl:v:3:y:2015:i:2:p:1-12
    DOI: 10.5281/zenodo.7080817

    Download full text from publisher

    File URL: https://gecontec.org/index.php/unesco/article/view/76/64
    File Function: Full text
    Download Restriction: no

    File URL: https://libkey.io/10.5281/zenodo.7080817?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    More about this item

    Keywords

    Clustering; Overlapping Clustering; GPU Computing; Data Mining

    JEL classification:

    • L86 - Industrial Organization - - Industry Studies: Services - - - Information and Internet Services; Computer Software
    • M15 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - IT Management
    • O31 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Innovation and Invention: Processes and Incentives
    • O32 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Management of Technological Innovation and R&D
    • D8 - Microeconomics - - Information, Knowledge, and Uncertainty
    • D81 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Criteria for Decision-Making under Risk and Uncertainty
    • D83 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Search; Learning; Information and Knowledge; Communication; Belief; Unawareness

    Corrections

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Dr. Luis Camilo Ortigueira Sánchez. General contact details of the provider: https://www.gecontec.org.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.