
DOMe: A deduplication optimization method for the NewSQL database backups

Author

Listed:
  • Longxiang Wang
  • Zhengdong Zhu
  • Xingjun Zhang
  • Xiaoshe Dong
  • Yinfeng Wang

Abstract

Reducing duplicated data in database backups is an important application scenario for data deduplication technology. NewSQL is an emerging class of database systems that is being used more and more widely. NewSQL systems improve data reliability by periodically backing up in-memory data, which produces a large amount of duplicated data. Traditional deduplication methods are not optimized for NewSQL server systems and cannot take full advantage of hardware resources to optimize deduplication performance. Recent research points out that future NewSQL servers will have thousands of CPU cores, large DRAM, and huge NVRAM. How to use these hardware resources to optimize deduplication performance is therefore an important issue. To solve this problem, we propose a deduplication optimization method (DOMe) for NewSQL system backups. To take advantage of the large number of CPU cores in a NewSQL server, DOMe parallelizes the deduplication method based on the fork-join framework. The fingerprint index, which is the key data structure in the deduplication process, is implemented as a pure in-memory hash table; this makes full use of the large DRAM in NewSQL systems and eliminates the fingerprint-index performance bottleneck of traditional deduplication methods. DOMe is implemented on H-Store, used as a typical NewSQL database system, and is evaluated experimentally on two representative backup datasets. The experimental results show that: 1) DOMe can reduce duplicated NewSQL backup data; 2) DOMe significantly improves deduplication performance by parallelizing content-defined chunking (CDC) algorithms: when the theoretical speedup ratio of the server is 20.8, the speedup ratio of DOMe reaches up to 18; 3) DOMe improves deduplication throughput by 1.5 times through the pure in-memory index optimization method.
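
The pipeline the abstract describes (content-defined chunking, parallel processing in a fork-join style, and a pure in-memory hash-table fingerprint index) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: a Gear-style rolling hash stands in for whatever CDC hash DOMe uses, SHA-1 is an assumed fingerprint function, the chunk-size parameters are arbitrary, and the stream is cut into fixed-size segments that are chunked independently, which forces a chunk boundary at each segment edge. All function and constant names are hypothetical. A thread pool is used only to show the fork and join structure; in CPython it will not reproduce the multi-core speedup reported above.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor

# Gear table: one pseudo-random 32-bit value per byte value (fixed seed for repeatability).
# Assumption: a Gear-style hash is used here only as a simple stand-in for a CDC hash.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]

MASK = (1 << 8) - 1   # cut when the low 8 bits are zero (about 256-byte average chunks)
MIN_CHUNK = 64        # illustrative limits, not values from the paper
MAX_CHUNK = 1024


def cdc_chunks(data: bytes) -> list:
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF  # rolling hash over the most recent bytes
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_and_fingerprint(segment: bytes) -> list:
    """Worker task: chunk one segment and fingerprint each chunk (SHA-1 is an assumption)."""
    return [(hashlib.sha1(c).digest(), c) for c in cdc_chunks(segment)]


def dedup_backup(data: bytes, index: dict, segment_size: int = 1 << 16, workers: int = 4):
    """Deduplicate one backup against an in-memory fingerprint index (a plain dict)."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fork: one task per segment. Join: results are collected in segment order.
        per_segment = list(pool.map(chunk_and_fingerprint, segments))
    recipe, new_bytes = [], 0
    for results in per_segment:
        for fp, chunk in results:
            if fp not in index:        # hash-table lookup in memory, no on-disk index
                index[fp] = chunk
                new_bytes += len(chunk)
            recipe.append(fp)          # the recipe records chunk order for restore
    return recipe, new_bytes


def restore(recipe: list, index: dict) -> bytes:
    """Rebuild a backup from its recipe and the stored chunks."""
    return b"".join(index[fp] for fp in recipe)
```

In this sketch, backing up the same data twice against the same `index` stores no new chunk bytes the second time, and `restore` reproduces the original stream from the recipe. Keeping the index as a dictionary mirrors the in-memory hash table idea, at the cost of holding every fingerprint in DRAM.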

Suggested Citation

  • Longxiang Wang & Zhengdong Zhu & Xingjun Zhang & Xiaoshe Dong & Yinfeng Wang, 2017. "DOMe: A deduplication optimization method for the NewSQL database backups," PLOS ONE, Public Library of Science, vol. 12(10), pages 1-17, October.
  • Handle: RePEc:plo:pone00:0185189
    DOI: 10.1371/journal.pone.0185189

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0185189
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0185189&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Rosalie L Tung & Günter K Stahl, 2018. "The tortuous evolution of the role of culture in IB research: What we know, what we don’t know, and where we are headed," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 49(9), pages 1167-1189, December.
    2. Laurent, Catherine E. & Berriet-Solliec, Marielle & Kirsch, Marc & Labarthe, Pierre & Trouve, Aurelie, 2010. "Multifunctionality Of Agriculture, Public Policies And Scientific Evidences: Some Critical Issues Of Contemporary Controversies," APSTRACT: Applied Studies in Agribusiness and Commerce, AGRIMBA, vol. 4(1-2), pages 1-6.
    3. Hsu, Dan K. & Burmeister-Lamp, Katrin & Simmons, Sharon A. & Foo, Maw-Der & Hong, Michelle C. & Pipes, Jesse D., 2019. "“I know I can, but I don't fit”: Perceived fit, self-efficacy, and entrepreneurial intention," Journal of Business Venturing, Elsevier, vol. 34(2), pages 311-326.
    4. Lude, Maximilian & Prügl, Reinhard, 2021. "Experimental studies in family business research," Journal of Family Business Strategy, Elsevier, vol. 12(1).
    5. Batistič, Saša & Černe, Matej & Kaše, Robert & Zupic, Ivan, 2016. "The role of organizational context in fostering employee proactive behavior: The interplay between HR system configurations and relational climates," European Management Journal, Elsevier, vol. 34(5), pages 579-588.
    6. Dan K. Hsu & Johan Wiklund & Richard D. Cotton, 2017. "Success, Failure, and Entrepreneurial Reentry: An Experimental Assessment of the Veracity of Self–Efficacy and Prospect Theory," Entrepreneurship Theory and Practice, vol. 41(1), pages 19-47, January.
    7. Krueger, Norris & Bogers, Marcel L.A.M. & Labaki, Rania & Basco, Rodrigo, 2021. "Advancing family business science through context theorizing: The case of the Arab world," Journal of Family Business Strategy, Elsevier, vol. 12(1).
    8. Jana Schmutzler & Edward Lorenz, 2018. "Tolerance, agglomeration, and enterprise innovation performance: a multilevel analysis of Latin American regions," Industrial and Corporate Change, Oxford University Press and the Associazione ICC, vol. 27(2), pages 243-268.
    9. Choi, James J. & Haisley, Emily & Kurkoski, Jennifer & Massey, Cade, 2017. "Small cues change savings choices," Journal of Economic Behavior & Organization, Elsevier, vol. 142(C), pages 378-395.
    10. Catherine Welch & Eriikka Paavilainen-Mäntymäki & Rebecca Piekkari & Emmanuella Plakoyiannaki, 2022. "Reconciling theory and context: How the case study can set a new agenda for international business research," Journal of International Business Studies, Palgrave Macmillan;Academy of International Business, vol. 53(1), pages 4-26, February.
    11. Sirola, Nina & Pitesa, Marko, 2018. "The macroeconomic environment and the psychology of work evaluation," Organizational Behavior and Human Decision Processes, Elsevier, vol. 144(C), pages 11-24.
    12. Weber, Ellen & Büttgen, Marion & Bartsch, Silke, 2022. "How to take employees on the digital transformation journey: An experimental study on complementary leadership behaviors in managing organizational change," Journal of Business Research, Elsevier, vol. 143(C), pages 225-238.
    13. Sturt W Manning & Brita Lorentzen & Lynn Welton & Stephen Batiuk & Timothy P Harrison, 2020. "Beyond megadrought and collapse in the Northern Levant: The chronology of Tell Tayinat and two historical inflection episodes, around 4.2ka BP, and following 3.2ka BP," PLOS ONE, Public Library of Science, vol. 15(10), pages 1-38, October.
    14. Milazzo, M.F. & Spina, F. & Primerano, P. & Bart, J.C.J., 2013. "Soy biodiesel pathways: Global prospects," Renewable and Sustainable Energy Reviews, Elsevier, vol. 26(C), pages 579-624.
    15. Ravi KANBUR & Lucas RONCONI, 2018. "Enforcement matters: The effective regulation of labour," International Labour Review, International Labour Organization, vol. 157(3), pages 331-356, September.
    16. Meuleman, Miguel & Wright, Mike, 2011. "Cross-border private equity syndication: Institutional context and learning," Journal of Business Venturing, Elsevier, vol. 26(1), pages 35-48, January.
    17. Ferreira, Manuel Portugal & Li, Dan & Guisinger, Stephen & Serra, Fernando A. Ribeiro, 2009. "Será o ambiente internacional de negócios o contexto efetivo para a pesquisa em negócios internacionais?," RAE - Revista de Administração de Empresas, FGV-EAESP Escola de Administração de Empresas de São Paulo (Brazil), vol. 49(3), July.
    18. Stephanie B Linek & Benedikt Fecher & Sascha Friesike & Marcel Hebing, 2017. "Data sharing as social dilemma: Influence of the researcher’s personality," PLOS ONE, Public Library of Science, vol. 12(8), pages 1-24, August.
    19. Louis Sevitenyi Nkwatoh & Yahya Zakari Abdullahi & Chika Usman Aliyu, 2019. "Past and Current European Monetary Union Crises: Lessons for the Envisaged West African Monetary Union," International Journal of Economics and Financial Issues, Econjournals, vol. 9(4), pages 50-59.
    20. Robert E. Marks, 2010. "Welcome to SAGE Publications," Australian Journal of Management, Australian School of Business, vol. 35(1), pages 3-5, April.

