
SPMgr: Dynamic workflow manager for sampling and filtering data streams over Apache Storm

Author

Listed:
  • Youngkuk Kim
  • Siwoon Son
  • Yang-Sae Moon

Abstract

In this article, we address dynamic workflow management for sampling and filtering data streams in Apache Storm. Because many sensors generate data streams continuously, we often use sampling to select representative data or filtering to remove unnecessary data. Apache Storm is a real-time distributed processing platform suited to handling large data streams. Storm, however, must halt all processing whenever the input data structure or the processing algorithm changes, because the programs have to be modified, redistributed, and restarted. In addition, for effective data processing, Storm is often used together with Kafka and databases, but it is difficult to use these platforms in an integrated manner. We identify the problems that arise when applying sampling and filtering algorithms to Storm and propose a dynamic workflow management model that solves them. First, we present the concept of a plan, which consists of the input, processing, and output modules of a data stream. Second, we propose Storm Plan Manager, which can operate Storm, Kafka, and a database as a single integrated system. Storm Plan Manager is an integrated workflow manager that dynamically controls the sampling and filtering of data streams through plans. Third, as a key feature, Storm Plan Manager provides a Web client interface for visually creating, executing, and monitoring plans. We show the usefulness of the proposed Storm Plan Manager by presenting its design, implementation, and experimental results in turn.
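
As a rough illustration of the plan concept described in the abstract (an input module, a processing module, and an output module), the following Python sketch wires three callables together and swaps the processing step from filtering to sampling without rebuilding the plan. It is not the Storm Plan Manager implementation and uses no Storm or Kafka API. The names `Plan`, `reservoir_sample`, `source`, `process`, and `sink` are hypothetical, and reservoir sampling is chosen here only as a generic example, since the abstract does not name the specific algorithms.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List


def reservoir_sample(stream: Iterable[int], k: int, rng: random.Random) -> List[int]:
    """Reservoir sampling (Algorithm R): uniform sample of up to k items."""
    reservoir: List[int] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


@dataclass
class Plan:
    """Hypothetical plan: input, processing, and output modules."""
    source: Callable[[], Iterable[int]]
    process: Callable[[Iterable[int]], List[int]]
    sink: Callable[[List[int]], None]

    def run(self) -> None:
        self.sink(self.process(self.source()))


results: List[int] = []
plan = Plan(
    source=lambda: range(100),                        # stand-in for a stream source
    process=lambda s: [x for x in s if x % 2 == 0],   # filtering: keep even values
    sink=results.extend,                              # stand-in for a database writer
)
plan.run()

# Swap the processing module to sampling; source and sink stay untouched.
plan.process = lambda s: reservoir_sample(s, 5, random.Random(0))
plan.run()
```

In this sketch the processing step is replaced on a live object, which mirrors only the idea of changing the algorithm without redeploying; how Storm Plan Manager achieves this inside a running Storm topology is described in the full article, not here.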

Suggested Citation

  • Youngkuk Kim & Siwoon Son & Yang-Sae Moon, 2019. "SPMgr: Dynamic workflow manager for sampling and filtering data streams over Apache Storm," International Journal of Distributed Sensor Networks, vol. 15(7), pages 15501477198, July.
  • Handle: RePEc:sae:intdis:v:15:y:2019:i:7:p:1550147719862206
    DOI: 10.1177/1550147719862206

    Download full text from publisher

    File URL: https://journals.sagepub.com/doi/10.1177/1550147719862206
    Download Restriction: no


