
Autonomous Sensor Data Cleaning in Stream Mining Setting

Author

Listed:
  • Kenda Klemen

    (Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Ljubljana, Slovenia)

  • Mladenić Dunja

    (Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Ljubljana, Slovenia)

Abstract

Background: The Internet of Things (IoT), earth observation and large scientific experiments now generate extensive amounts of sensor data. Measurement costs are low, so the volume of data is large. The standard approach in such cases is stream mining, in which each measurement is examined only once during real-time processing. This requires the methods to be completely autonomous. In the past, very little attention has been given to the most time-consuming part of the data mining process, i.e. data pre-processing.

Objectives: In this paper we propose an algorithm for data cleaning that can be applied to real-world streaming big data.

Methods/Approach: We use a short-term prediction method based on the Kalman filter to determine admissible intervals for future measurements. The model can adapt to concept drift and is useful for detecting random additive outliers in a sensor data stream.

Results: On datasets with low noise, our method performs better than the method currently in common use in batch processing scenarios. On datasets with higher noise, the results are comparable.

Conclusions: We have demonstrated a successful application of the proposed method in real-world scenarios, including groundwater level, server load and smart-grid data.
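
The idea described under Methods/Approach can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a scalar random-walk state model, fixed hypothetical process and measurement variances, and an interval half-width of k standard deviations of the predicted measurement. The names `ScalarKalmanCleaner` and `clean_stream` are invented for this illustration. Each measurement is seen once; if it falls outside the admissible interval it is flagged and replaced by the one-step prediction, otherwise it updates the filter.

```python
import math


class ScalarKalmanCleaner:
    """One-pass outlier flagging with a scalar Kalman filter (illustrative only)."""

    def __init__(self, process_var=1e-3, measurement_var=1e-2, k=3.0):
        self.q = process_var      # assumed process noise variance
        self.r = measurement_var  # assumed measurement noise variance
        self.k = k                # assumed interval width in standard deviations
        self.x = None             # state estimate (None until first measurement)
        self.p = 1.0              # variance of the state estimate

    def admissible_interval(self):
        """Interval in which the next measurement is considered plausible."""
        p_pred = self.p + self.q                       # random-walk prediction
        half = self.k * math.sqrt(p_pred + self.r)     # k * sqrt(innovation variance)
        return self.x - half, self.x + half

    def process(self, z):
        """Return (cleaned_value, is_outlier) for a single measurement z."""
        if self.x is None:
            # The first measurement initialises the filter and is accepted as-is.
            self.x, self.p = z, self.r
            return z, False

        low, high = self.admissible_interval()
        p_pred = self.p + self.q
        if not (low <= z <= high):
            # Outlier: skip the update, keep the prediction, let uncertainty grow.
            self.p = p_pred
            return self.x, True

        gain = p_pred / (p_pred + self.r)              # Kalman gain
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p_pred
        return z, False


def clean_stream(values, **kwargs):
    """Run the cleaner over an iterable once; return (cleaned, flags)."""
    cleaner = ScalarKalmanCleaner(**kwargs)
    cleaned, flags = [], []
    for z in values:
        value, is_outlier = cleaner.process(z)
        cleaned.append(value)
        flags.append(is_outlier)
    return cleaned, flags


if __name__ == "__main__":
    readings = [10.0, 10.1, 9.9, 10.05, 25.0, 10.0, 9.95]
    cleaned, flags = clean_stream(readings)
    for raw, value, flag in zip(readings, cleaned, flags):
        print(f"raw={raw:6.2f}  cleaned={value:6.2f}  outlier={flag}")
```

In this sketch, skipping the update for a rejected measurement makes the estimate variance grow by the process variance at each rejection, so the interval widens over time. That is only a crude stand-in for the concept-drift adaptation mentioned in the abstract, whose actual mechanism is not described here.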

Suggested Citation

  • Kenda Klemen & Mladenić Dunja, 2018. "Autonomous Sensor Data Cleaning in Stream Mining Setting," Business Systems Research, Sciendo, vol. 9(2), pages 69-79, July.
  • Handle: RePEc:bit:bsrysr:v:9:y:2018:i:2:p:69-79:n:7
    DOI: 10.2478/bsrj-2018-0020

    Download full text from publisher

    File URL: https://doi.org/10.2478/bsrj-2018-0020
    Download Restriction: no


    References listed on IDEAS

    1. Marczak, Martyna & Proietti, Tommaso & Grassi, Stefano, 2018. "A data-cleaning augmented Kalman filter for robust estimation of state space models," Econometrics and Statistics, Elsevier, vol. 5(C), pages 107-123.
    2. Al Quhtani Masoud, 2017. "Data Mining Usage in Corporate Information Security: Intrusion Detection Applications," Business Systems Research, Sciendo, vol. 8(1), pages 51-59, March.
    3. Zekić-Sušac Marijana & Has Adela, 2015. "Data Mining as Support to Knowledge Management in Marketing," Business Systems Research, Sciendo, vol. 6(2), pages 18-30, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Aleksandar Grubor & Olja Jaksa, 2018. "Internet Marketing as a Business Necessity," Interdisciplinary Description of Complex Systems - scientific journal, Croatian Interdisciplinary Society, vol. 16(2), pages 265-274.
    2. Rombouts, Jeroen V.K. & Stentoft, Lars & Violante, Francesco, 2020. "Variance swap payoffs, risk premia and extreme market conditions," Econometrics and Statistics, Elsevier, vol. 13(C), pages 106-124.
    3. Mirjana Pejić Bach & Živko Krstić & Sanja Seljan & Lejla Turulja, 2019. "Text Mining for Big Data Analysis in Financial Sector: A Literature Review," Sustainability, MDPI, vol. 11(5), pages 1-27, February.
    4. Barbarino, Alessandro & Bura, Efstathia, 2024. "Forecasting Near-equivalence of Linear Dimension Reduction Methods in Large Panels of Macro-variables," Econometrics and Statistics, Elsevier, vol. 31(C), pages 1-18.
    5. Chini, Emilio Zanetti, 2023. "Can we estimate macroforecasters’ mis-behavior?," Journal of Economic Dynamics and Control, Elsevier, vol. 149(C).
    6. Opiła Janusz, 2019. "Role of Visualization in a Knowledge Transfer Process," Business Systems Research, Sciendo, vol. 10(1), pages 164-179, April.
