IDEAS home Printed from https://ideas.repec.org/a/apb/jaterr/2017p108-116.html

Analysis of real-time data with spark streaming

Author

Listed:
  • Nikitha Johnsirani Venkatesan

    (School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea)

  • Choon Sung Nam

    (School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea)

  • Earl Kim

    (School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea)

  • Dong Ryeol Shin

    (School of Electrical and Computer Engineering, Sungkyunkwan University, Suwon, South Korea)

Abstract

Data analysis in real-world application domains is challenging. For example, a thousand gigabytes of multimedia data is poured into social media every minute. Because social media platforms and most organizations deal with Big Data, tools such as Hadoop and Spark are more appropriate for handling such data. Hadoop and MapReduce analyze data only in batch mode, which increases latency and makes real-time analysis difficult. To solve this problem, we used Spark Streaming for real-time data analysis. Spark Streaming iterates through the data much faster because of its in-memory processing. This paper presents an online machine learning system for real-time data. Using Spark Streaming, data from an online messaging system is streamed into the local system. A streaming K-means algorithm is applied to cluster the different languages of people from various countries. Results show that predictions on the incoming data are accurate and fast when Apache Spark is used. Our results and methods are compared with those of other articles that have used Spark Streaming for real-time data processing. Queries such as total word count and segregation based on keywords are run, and the results are presented. The data are then stored on the local disk for future querying.
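The streaming K-means step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code and does not use the Spark API. It is a plain-Python illustration of a mini-batch centroid update with a decay factor, similar in spirit to the update rule used by streaming K-means implementations. The names `Cluster`, `nearest`, `update`, and the `decay` parameter are assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    center: List[float]
    weight: float = 0.0  # discounted count of points assigned so far


def nearest(clusters: List[Cluster], point: Sequence[float]) -> int:
    """Return the index of the cluster whose center is closest to point."""
    return min(
        range(len(clusters)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(clusters[i].center, point)),
    )


def update(clusters: List[Cluster], batch: List[Sequence[float]], decay: float = 1.0) -> None:
    """Fold one mini-batch into the clusters in place.

    decay=1.0 weighs all history equally; smaller values (an assumption
    of this sketch) let older batches fade so centers track recent data.
    """
    dim = len(clusters[0].center)
    sums = [[0.0] * dim for _ in clusters]
    counts = [0] * len(clusters)

    # Assign every point in the batch to its nearest current center.
    for point in batch:
        i = nearest(clusters, point)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]

    # Move each center to the weighted mean of its history and the new points.
    for i, cluster in enumerate(clusters):
        old = cluster.weight * decay
        total = old + counts[i]
        if counts[i] > 0:
            cluster.center = [
                (cluster.center[d] * old + sums[i][d]) / total for d in range(dim)
            ]
        cluster.weight = total
```

In a streaming setting, `update` would be called once per micro-batch as data arrives, and `nearest` would give the predicted cluster for each incoming record.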

Suggested Citation

  • Nikitha Johnsirani Venkatesan & Choon Sung Nam & Earl Kim & Dong Ryeol Shin, 2017. "Analysis of real-time data with spark streaming," Journal of Advances in Technology and Engineering Research, A/Professor Akbar A. Khatibi, vol. 3(4), pages 108-116.
  • Handle: RePEc:apb:jaterr:2017:p:108-116
    DOI: 10.20474/jater-3.4.1

    Download full text from publisher

    File URL: https://tafpublications.com/platform/Articles/full-jater3.4.1.php
    Download Restriction: no

    File URL: https://tafpublications.com/gip_content/paper/jater-3.4.1.pdf
    Download Restriction: no

    File URL: https://libkey.io/10.20474/jater-3.4.1?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Feng-Jen Yang, 2016. "The User Interface Design of an Intelligent Tutoring System for Relational Database Schema Normalization," International Journal of Technology and Engineering Studies, PROF.IR.DR.Mohid Jailani Mohd Nor, vol. 2(3), pages 70-75.
    2. J. A. Hartigan & M. A. Wong, 1979. "A K‐Means Clustering Algorithm," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 28(1), pages 100-108, March.

    Citations



    Cited by:

    1. Yuan-Hsun Liao & Hsiao-Hui Li & Zong-Han Xie, 2019. "Study of Virtual Reality and IoT for Exploring the Deep Sea," Journal of ICT, Design, Engineering and Technological Science, Juhriyansyah Dalle, vol. 3(1), pages 11-14.
    2. Mohd Hilmi Othman & Arif Izzudin Muhammad & Sulaiman Hassan, 2018. "Investigation of Clothes Recycling as Colouring Agent for Polypropylene-Nanoclay Nanocomposites," International Journal of Technology and Engineering Studies, PROF.IR.DR.Mohid Jailani Mohd Nor, vol. 4(1), pages 1-6.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Carlos Carrasco-Farré, 2022. "The fingerprints of misinformation: how deceptive content differs from reliable sources in terms of cognitive effort and appeal to emotions," Palgrave Communications, Palgrave Macmillan, vol. 9(1), pages 1-18, December.
    2. Felix Mbuga & Cristina Tortora, 2021. "Spectral Clustering of Mixed-Type Data," Stats, MDPI, vol. 5(1), pages 1-11, December.
    3. Zhang, Weibin & Zha, Huazhu & Zhang, Shuai & Ma, Lei, 2023. "Road section traffic flow prediction method based on the traffic factor state network," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 618(C).
    4. Michal Bernardelli & Zbigniew Korzeb & Pawel Niedziolka, 2021. "The banking sector as the absorber of the COVID-19 crisis’ economic consequences: perception of WSE investors," Oeconomia Copernicana, Institute of Economic Research, vol. 12(2), pages 335-374, June.
    5. Jelle R Dalenberg & Luca Nanetti & Remco J Renken & René A de Wijk & Gert J ter Horst, 2014. "Dealing with Consumer Differences in Liking during Repeated Exposure to Food; Typical Dynamics in Rating Behavior," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    6. Custodio João, Igor & Lucas, André & Schaumburg, Julia & Schwaab, Bernd, 2023. "Dynamic clustering of multivariate panel data," Journal of Econometrics, Elsevier, vol. 237(2).
    7. Carlos Fernández-Hernández & Carmelo J. León & Jorge E. Araña & Flora Díaz-Pére, 2016. "Market segmentation, activities and environmental behaviour in rural tourism," Tourism Economics, , vol. 22(5), pages 1033-1054, October.
    8. Juvie Pauline L. Relacion, 2017. "Patient Management Information System for the University of the Immaculate Conception College Department Clinic," International Journal of Technology and Engineering Studies, PROF.IR.DR.Mohid Jailani Mohd Nor, vol. 3(5), pages 213-223.
    9. Hafid Kadi & Mohammed Rebbah & Boudjelal Meftah & Olivier Lézoray, 2021. "A Data Representation Model for Personalized Medicine," International Journal of Healthcare Information Systems and Informatics (IJHISI), IGI Global, vol. 16(4), pages 1-25, October.
    10. Zhang, Tonglin & Lin, Ge, 2021. "Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    11. Andreas Lackner & Michael Müller & Magdalena Gamperl & Delyana Stoeva & Olivia Langmann & Henrieta Papuchova & Elisabeth Roitinger & Gerhard Dürnberger & Richard Imre & Karl Mechtler & Paulina A. Lato, 2023. "The Fgf/Erf/NCoR1/2 repressive axis controls trophoblast cell fate," Nature Communications, Nature, vol. 14(1), pages 1-20, December.
    12. Utkarsh J. Dang & Michael P.B. Gallaugher & Ryan P. Browne & Paul D. McNicholas, 2023. "Model-Based Clustering and Classification Using Mixtures of Multivariate Skewed Power Exponential Distributions," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 145-167, April.
    13. Beibei Yu & Zhonghui Wang & Haowei Mu & Li Sun & Fengning Hu, 2019. "Identification of Urban Functional Regions Based on Floating Car Track Data and POI Data," Sustainability, MDPI, vol. 11(23), pages 1-18, November.
    14. Liguo Fei & Jun Xia & Yuqiang Feng & Luning Liu, 2019. "A novel method to determine basic probability assignment in Dempster–Shafer theory and its application in multi-sensor information fusion," International Journal of Distributed Sensor Networks, , vol. 15(7), pages 15501477198, July.
    15. Bernd Scherer & Diogo Judice & Stephan Kessler, 2010. "Price reversals in global equity markets," Journal of Asset Management, Palgrave Macmillan, vol. 11(5), pages 332-345, December.
    16. Ugofilippo Basellini & Carlo Giovanni Camarda, 2020. "Modelling COVID-19 mortality at the regional level in Italy," Working Papers axq0sudakgkzhr-blecv, French Institute for Demographic Studies.
    17. Andrew Webb, 1997. "Radial basis functions for exploratory data analysis: An iterative majorisation approach for Minkowski distances based on multidimensional scaling," Journal of Classification, Springer;The Classification Society, vol. 14(2), pages 249-267, September.
    18. Jianzhong Ma & Christopher I Amos, 2012. "Investigation of Inversion Polymorphisms in the Human Genome Using Principal Components Analysis," PLOS ONE, Public Library of Science, vol. 7(7), pages 1-12, July.
    19. Annah Vimbai Bengesai & Evelyn Derera, 2021. "The Association Between Women Empowerment and Emotional Violence in Zimbabwe: A Cluster Analysis Approach," SAGE Open, , vol. 11(2), pages 21582440211, June.
    20. Urmeneta, Jon & Izquierdo, Juan & Leturiondo, Urko, 2023. "A methodology for performance assessment at system level—Identification of operating regimes and anomaly detection in wind turbines," Renewable Energy, Elsevier, vol. 205(C), pages 281-292.

