Printed from https://ideas.repec.org/a/igg/jitwe0/v16y2021i2p45-57.html

Towards Efficient Big Data Storage With MapReduce Deduplication System

Author

Listed:
  • Vijesh Joe

    (VV College of Engineering, India)

  • Jennifer S. Raj

    (Gnanamani College of Technology, India)

  • Smys S.

    (RVS Technical Campus, India)

Abstract

In the big data era, demand for data storage and processing is high. Conventional approaches struggle to meet it, and de-duplication is an effective way to reduce both storage space and computation time. Many existing approaches, however, take a long time to identify similar data. A MapReduce de-duplication system is proposed to attain a high de-duplication ratio. MapReduce is a parallel processing approach that helps to process a large number of files in less time. The proposed system uses the two thresholds, two divisors with switch (TTTD-S) algorithm for chunking; the switch is the average parameter that TTTD-S uses to minimize chunk size variance. SHA-3 is used for hashing and a fractal tree index for indexing; in a fractal tree index, reads and writes take place at the same time. The evaluation parameters are data size after de-duplication, de-duplication ratio, throughput, hash time, chunk time, and de-duplication time. The system is tested on the College Scorecard and ZCTA datasets. The experimental results show that the proposed system can reduce both duplication and processing time.
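
To make the pipeline in the abstract more concrete, the following is a minimal single-process sketch of chunking, SHA-3 fingerprinting, and duplicate elimination. It is not the authors' implementation: the parameter values, the gear-style rolling hash, the rule that both divisors are halved once a chunk grows past the switch point, and the names `chunk`, `deduplicate`, and `dedup_ratio` are all assumptions made for illustration. A plain dictionary stands in for the fractal tree index, and the MapReduce parallelism is omitted. The de-duplication ratio is taken here to be original bytes divided by stored bytes, which is a common definition but is not confirmed by the abstract.

```python
import hashlib
import random

# Hypothetical parameters, chosen only to keep the demo small.
T_MIN, T_MAX = 64, 512       # the two thresholds: minimum and maximum chunk size
D_MAIN, D_BACKUP = 128, 64   # the two divisors: main and backup
SWITCH = 256                 # assumed switch point, near the target average size

# Fixed pseudo-random table for a gear-style rolling hash (an assumption;
# the abstract does not say which rolling hash the system uses).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split data into content-defined chunks, TTTD style with a switch."""
    chunks = []
    start = 0    # first byte of the current chunk
    backup = 0   # end position of the last backup breakpoint, 0 if none
    h = 0
    i = 0
    while i < len(data):
        h = ((h << 1) + GEAR[data[i]]) & 0xFFFFFFFF
        size = i - start + 1
        cut = 0
        if size >= T_MIN:
            # Assumed switch rule: past SWITCH, halve both divisors so that
            # a breakpoint becomes more likely and oversized chunks are rarer.
            if size < SWITCH:
                d, d2 = D_MAIN, D_BACKUP
            else:
                d, d2 = D_MAIN // 2, D_BACKUP // 2
            if h % d2 == d2 - 1:
                backup = i + 1        # remember a fallback breakpoint
            if h % d == d - 1:
                cut = i + 1           # main divisor matched
            elif size >= T_MAX:
                cut = backup or i + 1  # forced cut, preferring the fallback
        if cut:
            chunks.append(data[start:cut])
            start, i, h, backup = cut, cut, 0, 0
        else:
            i += 1
    if start < len(data):
        chunks.append(data[start:])   # trailing chunk may be shorter than T_MIN
    return chunks


def deduplicate(files):
    """Return (store, recipes): unique chunks by SHA-3 digest, and per-file digest lists."""
    store = {}     # digest -> chunk bytes; a dict stands in for the index
    recipes = []
    for data in files:
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha3_256(c).hexdigest()
            store.setdefault(fp, c)   # keep only the first copy of each chunk
            recipe.append(fp)
        recipes.append(recipe)
    return store, recipes


def dedup_ratio(files, store) -> float:
    """Original bytes divided by stored bytes (assumed definition)."""
    stored = sum(len(c) for c in store.values())
    return sum(len(f) for f in files) / stored
```

In a MapReduce setting, one plausible split would be to run `chunk` and the SHA-3 hashing in the map phase, one file per mapper, and to group by digest in the reduce phase so that each distinct chunk is stored once; the abstract does not spell out how the authors divide the work.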

Suggested Citation

  • Vijesh Joe & Jennifer S. Raj & Smys S., 2021. "Towards Efficient Big Data Storage With MapReduce Deduplication System," International Journal of Information Technology and Web Engineering (IJITWE), IGI Global, vol. 16(2), pages 45-57, April.
  • Handle: RePEc:igg:jitwe0:v:16:y:2021:i:2:p:45-57

    Download full text from publisher

    File URL: http://services.igi-global.com/resolvedoi/resolve.aspx?doi=10.4018/IJITWE.2021040103
    Download Restriction: no