A case study for performance analysis of big data stream classification using spark architecture

Authors

B. Srivani (JNTUH)
N. Sandhya (VNRVJIET)
B. Padmaja Rani (JNTUCEH)

Abstract

A wide variety of data is being produced at very high speed across different sectors. Because computing devices are so widely deployed, the volume of information has grown rapidly in recent decades. A key benefit of big data is that large datasets enable machine learning techniques to produce more accurate results. As the amount of data explodes, it raises both challenges and opportunities for data analytics research in the data mining domain. Massively parallel databases provide not only storage mechanisms but also compute platforms, and this extra capacity makes it possible to run algorithms inside the database and to move data into memory to solve problems. However, big data streams have characteristics such as high dimensionality, sparsity, volume and velocity, and these characteristics pose serious problems for traditional data stream classification methods. For large data collections, effectively selecting features before classifying the data is important for discovering patterns. Recent feature selection strategies use optimization methods to pick a subset of important features that yields good classification results. Therefore, in this case study, feature selection is performed using Dragonfly Moth Search (DMS) optimization. Classification is carried out in two phases, an offline phase and an online phase, using master and slave nodes with a stacked autoencoder (SAE) in the Spark architecture. The performance of the resulting DMS-SAE method is evaluated using accuracy, sensitivity and specificity.
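The abstract names accuracy, sensitivity and specificity as the evaluation metrics without defining them. As a hedged illustration only, assuming a binary task with a designated positive class (the paper's exact evaluation protocol is not described in this record), the three metrics can be computed from confusion-matrix counts as below. The names `evaluate` and `BinaryMetrics` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float     # (TP + TN) / all samples
    sensitivity: float  # TP / (TP + FN), the true positive rate
    specificity: float  # TN / (TN + FP), the true negative rate


def evaluate(y_true, y_pred, positive=1):
    """Hypothetical helper: binary accuracy, sensitivity and specificity."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one sample")

    # Count the four confusion-matrix cells.
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # A rate is undefined (NaN) when its class never occurs in y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return BinaryMetrics(accuracy, sensitivity, specificity)


if __name__ == "__main__":
    m = evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 1, 0, 1])
    print(m)
```

Reporting sensitivity and specificity alongside accuracy matters for imbalanced streams: a classifier that always predicts the majority class can score high accuracy while its sensitivity on the minority class is zero.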

Suggested Citation

B. Srivani & N. Sandhya & B. Padmaja Rani, 2024. "A case study for performance analysis of big data stream classification using spark architecture," International Journal of System Assurance Engineering and Management, Springer;The Society for Reliability, Engineering Quality and Operations Management (SREQOM),India, and Division of Operation and Maintenance, Lulea University of Technology, Sweden, vol. 15(1), pages 253-266, January.

Handle: RePEc:spr:ijsaem:v:15:y:2024:i:1:d:10.1007_s13198-022-01703-4
DOI: 10.1007/s13198-022-01703-4

Keywords

Stream data; Spark framework; Imbalance data; Classification; Stacked auto encoder (SAE)

