An Automated Big Data Quality Anomaly Correction Framework Using Predictive Analysis

Authors

  • Widad Elouataoui

    (Laboratory of Engineering Sciences, National School of Applied Sciences, Ibn Tofail University, Kenitra 14000, Morocco)

  • Saida El Mendili

    (Laboratory of Engineering Sciences, National School of Applied Sciences, Ibn Tofail University, Kenitra 14000, Morocco)

  • Youssef Gahi

    (Laboratory of Engineering Sciences, National School of Applied Sciences, Ibn Tofail University, Kenitra 14000, Morocco)

Abstract

Big data has become a fundamental component in many domains, enabling organizations to extract valuable insights and make informed decisions. Its effective use, however, depends on data quality, which has attracted growing attention from researchers and practitioners in recent years because of its significant impact on decision-making. Existing studies of data quality anomalies often have a limited scope, concentrating on specific aspects such as outliers or inconsistencies. Moreover, many approaches are context-specific and do not offer a generic solution applicable across domains. To the best of our knowledge, no existing framework addresses quality anomalies automatically, comprehensively, and generically while considering all aspects of data quality. To fill this gap, we propose a framework that automatically corrects big data quality anomalies using an intelligent predictive model. The framework covers the main aspects of data quality through six key quality dimensions: Accuracy, Completeness, Conformity, Uniqueness, Consistency, and Readability. It is not tied to a specific field and is designed to be applicable across various areas. The proposed framework was implemented on two datasets and achieved an accuracy of 98.22%. The results also show that the framework raised the data quality score, reaching 99%, with an improvement of up to 14.76% in the quality score.
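
As an illustration of what scoring a quality dimension can look like, the following minimal Python sketch measures two of the six dimensions named in the abstract, Completeness and Uniqueness, on a toy record set. It is not the authors' framework: the function names, the sample records, the formulas (share of non-empty values, share of distinct records), and the unweighted average are all assumptions made for illustration, and the predictive correction step is not shown.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def completeness(records: List[Record], fields: Sequence[str]) -> float:
    """Share of expected values that are present (assumed definition)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


def uniqueness(records: List[Record], fields: Sequence[str]) -> float:
    """Share of records that are distinct on the given fields (assumed definition)."""
    if not records:
        return 1.0
    distinct = {tuple(r.get(f) for f in fields) for r in records}
    return len(distinct) / len(records)


def quality_score(records: List[Record], fields: Sequence[str]) -> float:
    """Unweighted mean of the two dimension scores (hypothetical aggregation)."""
    return (completeness(records, fields) + uniqueness(records, fields)) / 2


# Hypothetical sample data: one missing value and one duplicated record.
FIELDS = ["id", "city"]
SAMPLE = [
    {"id": 1, "city": "Kenitra"},
    {"id": 2, "city": None},
    {"id": 1, "city": "Kenitra"},
    {"id": 3, "city": "Rabat"},
]

if __name__ == "__main__":
    print(completeness(SAMPLE, FIELDS))   # 0.875 (7 of 8 values present)
    print(uniqueness(SAMPLE, FIELDS))     # 0.75 (3 distinct records out of 4)
    print(quality_score(SAMPLE, FIELDS))  # 0.8125
```

A correction step would aim to raise such scores, for example by filling the missing value and dropping the duplicate; how the paper's predictive model does this is not described in the abstract.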

Suggested Citation

  • Widad Elouataoui & Saida El Mendili & Youssef Gahi, 2023. "An Automated Big Data Quality Anomaly Correction Framework Using Predictive Analysis," Data, MDPI, vol. 8(12), pages 1-22, December.
  • Handle: RePEc:gam:jdataj:v:8:y:2023:i:12:p:182-:d:1292300

    Download full text from publisher

    File URL: https://www.mdpi.com/2306-5729/8/12/182/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2306-5729/8/12/182/
    Download Restriction: no
