Printed from https://ideas.repec.org/a/ajp/edwast/v9y2025i6p846-863id7970.html

Optimizing data quality in big data through unsupervised record linkage techniques

Authors
  • Aissam Bendida
  • Amar Bensaber Djamel
  • Réda Adjoudj
  • Yahia Atig

Abstract

In today's era of Big Data, maintaining high-quality data is crucial for effective data management. One key aspect of this is record linkage, which involves identifying, comparing, and merging records from different sources that refer to the same real-world entity. However, traditional record linkage methods struggle to keep up with the rapidly increasing volume and diversity of data. These methods often rely on labeled data, which can be expensive and difficult to obtain. To overcome these challenges, unsupervised blocking techniques have emerged as a promising alternative, allowing large-scale datasets to be managed efficiently without the need for pre-labeled data. In this article, we introduce a novel approach that integrates the Firefly Algorithm for optimized feature selection, Locality-Sensitive Hashing (LSH) for dimensionality reduction, and Length-based Feature Weighting (LFW) for improved data representation. Our methodology aims to enhance both the accuracy and scalability of record linkage in Big Data environments. Experimental results show that our approach is highly effective, demonstrating its potential to significantly improve data quality in large-scale datasets.
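To make the idea of unsupervised blocking concrete, here is a minimal sketch of LSH-based blocking for record linkage. It is not the authors' implementation: the abstract names only LSH, so the MinHash variant, the character 3-gram tokenization, the banding parameters, and all function names below are assumptions chosen for illustration. The Firefly feature selection and LFW weighting steps are not shown.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Character k-grams of a lowercased, whitespace-normalized record string."""
    s = " ".join(text.lower().split())
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash; each seed plays the role of one MinHash permutation."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_perm=32):
    """MinHash signature: the minimum hash value of the token set under each seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_perm))


def lsh_blocks(records, num_perm=32, bands=16):
    """Group record ids into blocks; records sharing any full band land together."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for rec_id, text in records.items():
        sig = minhash_signature(shingles(text), num_perm)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(rec_id)
    return buckets


def candidate_pairs(records, num_perm=32, bands=16):
    """Candidate pairs produced by blocking, with no labeled data required."""
    pairs = set()
    for ids in lsh_blocks(records, num_perm, bands).values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidates after blocking."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical records, invented for this example.
    records = {
        1: "John Smith, 12 Main St",
        2: "john  smith, 12 main st",
        3: "Jon Smith, 12 Main Street",
        4: "Maria Garcia, 48 Oak Avenue",
    }
    for a, b in sorted(candidate_pairs(records)):
        sim = jaccard(shingles(records[a]), shingles(records[b]))
        print(a, b, round(sim, 2))
```

Only records that share a bucket are compared, so the number of pairwise comparisons can drop well below the quadratic all-pairs count. Raising `bands` (with fewer rows per band) admits more candidates and favors recall; lowering it favors precision.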

Suggested Citation

  • Aissam Bendida & Amar Bensaber Djamel & Réda Adjoudj & Yahia Atig, 2025. "Optimizing data quality in big data through unsupervised record linkage techniques," Edelweiss Applied Science and Technology, Learning Gate, vol. 9(6), pages 846-863.
  • Handle: RePEc:ajp:edwast:v:9:y:2025:i:6:p:846-863:id:7970

    Download full text from publisher

    File URL: https://learning-gate.com/index.php/2576-8484/article/view/7970/2709
    Download Restriction: no
