
Unsupervised Learning from Multi-Dimensional Data: A Fast Clustering Algorithm Utilizing Canopies and Statistical Information

Author

Listed:
  • Giyasettin Ozcan

    (Department of Computer Engineering, Uludag University, Gorukle Kampusu, Bursa 16059, Turkey)

Abstract

In this study, we consider the problem of unsupervised learning from multi-dimensional datasets. In particular, we consider k-means clustering, which requires long execution times on multi-dimensional datasets. To speed up clustering without sacrificing accuracy, we introduce a new algorithm that we term Canopy+. The algorithm utilizes canopies and statistical techniques. Its efficient initialization and normalization methodologies also contribute to the improvement. Furthermore, we consider early termination of the clustering computation, provided that an intermediate result of the computation is accurate enough. We compared our algorithm with four popular clustering algorithms. The results indicate that our algorithm speeds up the clustering computation by at least 2X. We also analyzed the contribution of early termination. The results show that a further 2X improvement can be obtained while incurring a 0.1% error rate. We also observe that our Canopy+ algorithm benefits from early termination, which yields an extra 1.2X performance improvement.
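
To make the ingredients named in the abstract concrete, the following is a minimal sketch of generic canopy-seeded k-means with an early-termination test. It is not the paper's Canopy+ algorithm, whose statistical and normalization steps are not described in this record. The function names, the thresholds t1 and t2, and the tolerance tol are all assumptions chosen for illustration.

```python
import math


def mean(pts):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))


def canopy_seeds(points, t1, t2):
    """Classic two-threshold canopy pass (t1 > t2), used here only for seeding.

    Hypothetical helper: returns one seed per canopy, namely the mean of
    all points within the loose threshold t1 of the canopy center.
    """
    assert t1 > t2 > 0
    remaining = list(points)
    seeds = []
    while remaining:
        center = remaining[0]  # deterministic pick; a random pick is also common
        members = [p for p in points if math.dist(p, center) <= t1]
        seeds.append(mean(members))
        # Points within the tight threshold t2 can no longer start a canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return seeds


def kmeans(points, centers, tol=1e-3, max_iter=100):
    """Lloyd's k-means that stops early once the relative drop in the
    sum of squared errors (SSE) is at most tol (an assumed criterion)."""
    prev_sse = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        sse = sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))
        # Early termination: the intermediate result is "accurate enough".
        if prev_sse is not None and prev_sse - sse <= tol * prev_sse:
            break
        prev_sse = sse
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, l in zip(points, labels) if l == j]
            new_centers.append(mean(members) if members else c)
        centers = new_centers
    return centers, labels, iteration


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
seeds = canopy_seeds(points, t1=3.0, t2=2.0)
centers, labels, iterations = kmeans(points, seeds)
print(centers, labels, iterations)
```

In this sketch the canopy pass fixes both the number of clusters and their starting positions, so k-means begins close to a good solution, and a larger tol trades a small loss in accuracy for fewer iterations.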

Suggested Citation

  • Giyasettin Ozcan, 2018. "Unsupervised Learning from Multi-Dimensional Data: A Fast Clustering Algorithm Utilizing Canopies and Statistical Information," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 17(03), pages 841-856, May.
  • Handle: RePEc:wsi:ijitdm:v:17:y:2018:i:03:n:s0219622018500141
    DOI: 10.1142/S0219622018500141


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Ginger Saltos & Mihaela Cocea, 2017. "An Exploration of Crime Prediction Using Data Mining on Open Data," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(05), pages 1155-1181, September.
    2. P. D. Mahendhiran & S. Kannimuthu, 2018. "Deep Learning Techniques for Polarity Classification in Multimodal Sentiment Analysis," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 17(03), pages 883-910, May.
    3. Feyzan Arikan & Senay Citak, 2017. "Multiple Criteria Inventory Classification in an Electronics Firm," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(02), pages 315-331, March.
    4. Rahime Ceylan & Hasan Koyuncu, 2016. "A New Breakpoint in Hybrid Particle Swarm-Neural Network Architecture: Individual Boundary Adjustment," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(06), pages 1313-1343, November.
    5. O. H. Salman & A. A. Zaidan & B. B. Zaidan & Naserkalid & M. Hashim, 2017. "Novel Methodology for Triage and Prioritizing Using “Big Data” Patients with Chronic Heart Diseases Through Telemedicine Environmental," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(05), pages 1211-1245, September.
    6. Jianfeng Xu & Yuanjian Zhang & Peng Zhang & Azhar Mahmood & Yu Li & Shaheen Khatoon, 2017. "Data Mining on ICU Mortality Prediction Using Early Temporal Data: A Survey," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(01), pages 117-159, January.
    7. Si He & Nabil Belacel & Alan Chan & Habib Hamam & Yassine Bouslimani, 2016. "A Hybrid Artificial Fish Swarm Simulated Annealing Optimization Algorithm for Automatic Identification of Clusters," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(05), pages 949-974, September.
    8. Fenghua Wen & Xin Yang & Xu Gong & Kin Keung Lai, 2017. "Multi-Scale Volatility Feature Analysis and Prediction of Gold Price," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(01), pages 205-223, January.
    9. Małgorzata Przybyła-Kasperek, 2019. "Three Conflict Methods in Multiple Classifiers that Use Dispersed Knowledge," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(02), pages 555-599, March.
    10. Thierno M. L. Diallo & Sébastien Henry & Yacine Ouzrout & Abdelaziz Bouras, 2018. "Data-Based Fault Diagnosis Model Using a Bayesian Causal Analysis Framework," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 17(02), pages 583-620, March.
    11. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    12. Chun-Hao Chen & Tzung-Pei Hong & Yeong-Chyi Lee & Vincent S. Tseng, 2015. "Finding Active Membership Functions for Genetic-Fuzzy Data Mining," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 14(06), pages 1215-1242, November.
    13. Yen-Hao Hsieh & Soe-Tsyr Yuan, 2016. "Can Customer Expectations be Measured in Real Time?," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(01), pages 119-149, January.
    14. Daji Ergu & Gang Kou, 2012. "Questionnaire design improvement and missing item scores estimation for rapid and efficient decision making," Annals of Operations Research, Springer, vol. 197(1), pages 5-23, August.
    15. Roman Vavrek, 2019. "Evaluation of the Impact of Selected Weighting Methods on the Results of the TOPSIS Technique," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(06), pages 1821-1843, November.
    16. Peide Liu & Peng Wang, 2017. "Some Improved Linguistic Intuitionistic Fuzzy Aggregation Operators and Their Applications to Multiple-Attribute Decision Making," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 16(03), pages 817-850, May.
    17. N. Thillaigovindan & S. Anita Shanthi & J. Vadivel Naidu, 2016. "New Method for Solving a General Multiple Criteria Decision-Making Problem Under Risk in Fuzzy Environment," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(05), pages 1157-1179, September.
    18. Carmen De Maio & Aurelio Tommasetti & Orlando Troisi & Massimiliano Vesci & Giuseppe Fenza & Vincenzo Loia, 2016. "Contextual Fuzzy-Based Decision Support System Through Opinion Analysis: A Case Study at University of the Salerno," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(05), pages 923-948, September.
    19. Gang Kou & Wenshuai Wu, 2014. "Multi-criteria decision analysis for emergency medical service assessment," Annals of Operations Research, Springer, vol. 223(1), pages 239-254, December.
    20. Thomas L. Saaty & Daji Ergu, 2015. "When is a Decision-Making Method Trustworthy? Criteria for Evaluating Multi-Criteria Decision-Making Methods," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 14(06), pages 1171-1187, November.
