
Application of Multivariate-Rank-Based Techniques in Clustering of Big Data

Author

Listed:
  • Pritha Guha

Abstract

Executive Summary: Very large or complex data sets, which are difficult to process or analyse using traditional data handling techniques, are usually referred to as big data. The idea of big data is characterized by the three ‘v’s: volume, velocity, and variety (Liu, McGree, Ge, & Xie, 2015), referring respectively to the volume of data, the velocity at which the data are processed, and the wide variety of forms in which big data are available. Every single day, sectors such as credit risk management, healthcare, media, retail, retail banking, climate prediction, DNA analysis, and sports generate petabytes of data (1 petabyte = 2^50 bytes). Even basic handling of big data therefore poses significant challenges, one of them being to organize the data in such a way that they yield better insights for analysis and decision-making. With the explosion of data in our lives, it has become very important to use statistical tools to analyse them.

Suggested Citation

  • Pritha Guha, 2018. "Application of Multivariate-Rank-Based Techniques in Clustering of Big Data," Vikalpa: The Journal for Decision Makers, vol. 43(4), pages 179-190, December.
  • Handle: RePEc:sae:vikjou:v:43:y:2018:i:4:p:179-190
    DOI: 10.1177/0256090918804385

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/0256090918804385
    Download Restriction: no

    References listed on IDEAS

    1. Robert Tibshirani & Guenther Walther & Trevor Hastie, 2001. "Estimating the number of clusters in a data set via the gap statistic," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 63(2), pages 411-423.
    2. Anil K. Ghosh & Probal Chaudhuri, 2005. "On Maximum Depth and Related Classifiers," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 32(2), pages 327-350, June.
    3. J. Gower & P. Legendre, 1986. "Metric and Euclidean properties of dissimilarity coefficients," Journal of Classification, Springer;The Classification Society, vol. 3(1), pages 5-48, March.
    4. Fraley C. & Raftery A.E., 2002. "Model-Based Clustering, Discriminant Analysis, and Density Estimation," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 611-631, June.

    Citations

    Cited by:

    1. Arezoo Ghazanfari, 2022. "What Drives Petrol Price Dispersion across Australian Cities?," Energies, MDPI, vol. 15(16), pages 1-24, August.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Henner Gimpel & Daniel Rau & Maximilian Röglinger, 2018. "Understanding FinTech start-ups – a taxonomy of consumer-oriented service offerings," Electronic Markets, Springer;IIM University of St. Gallen, vol. 28(3), pages 245-264, August.
    2. Julian Rossbroich & Jeffrey Durieux & Tom F. Wilderjans, 2022. "Model Selection Strategies for Determining the Optimal Number of Overlapping Clusters in Additive Overlapping Partitional Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(2), pages 264-301, July.
    3. Jonathon J. O’Brien & Michael T. Lawson & Devin K. Schweppe & Bahjat F. Qaqish, 2020. "Suboptimal Comparison of Partitions," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 435-461, July.
    4. Gallegos, María Teresa & Ritter, Gunter, 2010. "Using combinatorial optimization in model-based trimmed clustering with cardinality constraints," Computational Statistics & Data Analysis, Elsevier, vol. 54(3), pages 637-654, March.
    5. Thiemo Fetzer & Samuel Marden, 2017. "Take What You Can: Property Rights, Contestability and Conflict," Economic Journal, Royal Economic Society, vol. 0(601), pages 757-783, May.
    6. Guohuan Su & Adam Mertel & Sébastien Brosse & Justin M. Calabrese, 2023. "Species invasiveness and community invasibility of North American freshwater fish fauna revealed via trait-based analysis," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    7. Daniel Agness & Travis Baseler & Sylvain Chassang & Pascaline Dupas & Erik Snowberg, 2022. "Valuing the Time of the Self-Employed," Working Papers 2022-2, Princeton University. Economics Department.
    8. Batool, Fatima & Hennig, Christian, 2021. "Clustering with the Average Silhouette Width," Computational Statistics & Data Analysis, Elsevier, vol. 158(C).
    9. Nicoleta Serban & Huijing Jiang, 2012. "Multilevel Functional Clustering Analysis," Biometrics, The International Biometric Society, vol. 68(3), pages 805-814, September.
    10. Orietta Nicolis & Jean Paul Maidana & Fabian Contreras & Danilo Leal, 2024. "Analyzing the Impact of COVID-19 on Economic Sustainability: A Clustering Approach," Sustainability, MDPI, vol. 16(4), pages 1-30, February.
    11. Li, Pai-Ling & Chiou, Jeng-Min, 2011. "Identifying cluster number for subspace projected functional data clustering," Computational Statistics & Data Analysis, Elsevier, vol. 55(6), pages 2090-2103, June.
    12. Pourahmadi, Mohsen & Daniels, Michael J. & Park, Trevor, 2007. "Simultaneous modelling of the Cholesky decomposition of several covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 98(3), pages 568-587, March.
    13. Yaeji Lim & Hee-Seok Oh & Ying Kuen Cheung, 2019. "Multiscale Clustering for Functional Data," Journal of Classification, Springer;The Classification Society, vol. 36(2), pages 368-391, July.
    14. Forzani, Liliana & Gieco, Antonella & Tolmasky, Carlos, 2017. "Likelihood ratio test for partial sphericity in high and ultra-high dimensions," Journal of Multivariate Analysis, Elsevier, vol. 159(C), pages 18-38.
    15. la Grange, Anthony & le Roux, Niël & Gardner-Lubbe, Sugnet, 2009. "BiplotGUI: Interactive Biplots in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 30(i12).
    16. Victor Chernozhukov & Alfred Galichon & Marc Hallin & Marc Henry, 2014. "Monge-Kantorovich Depth, Quantiles, Ranks, and Signs," Papers 1412.8434, arXiv.org, revised Sep 2015.
    17. Yujia Li & Xiangrui Zeng & Chien‐Wei Lin & George C. Tseng, 2022. "Simultaneous estimation of cluster number and feature sparsity in high‐dimensional cluster analysis," Biometrics, The International Biometric Society, vol. 78(2), pages 574-585, June.
    18. Vojtech Blazek & Michal Petruzela & Tomas Vantuch & Zdenek Slanina & Stanislav Mišák & Wojciech Walendziuk, 2020. "The Estimation of the Influence of Household Appliances on the Power Quality in a Microgrid System," Energies, MDPI, vol. 13(17), pages 1-21, August.
    19. Stefano Tonellato & Andrea Pastore, 2013. "On the comparison of model-based clustering solutions," Working Papers 2013:05, Department of Economics, University of Venice "Ca' Foscari".
    20. Michael Brusco & J Dennis Cradit & Douglas Steinley, 2021. "A comparison of 71 binary similarity coefficients: The effect of base rates," PLOS ONE, Public Library of Science, vol. 16(4), pages 1-19, April.
