IDEAS home Printed from https://ideas.repec.org/a/eee/ejores/v252y2016i3p985-994.html
   My bibliography  Save this article

An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters

Author

Listed:
  • Yang, Xinan
  • Vernitski, Alexei
  • Carrea, Laura

Abstract

Bloom filters are a data structure for storing data in a compressed form. They offer excellent space and time efficiency at the cost of some loss of accuracy (so-called lossy compression). This work presents a yes–no Bloom filter, which as a data structure consisting of two parts: the yes-filter which is a standard Bloom filter and the no-filter which is another Bloom filter whose purpose is to represent those objects that were recognized incorrectly by the yes-filter (that is, to recognize the false positives of the yes-filter). By querying the no-filter after an object has been recognized by the yes-filter, we get a chance of rejecting it, which improves the accuracy of data recognition in comparison with the standard Bloom filter of the same total length. A further increase in accuracy is possible if one chooses objects to include in the no-filter so that the no-filter recognizes as many as possible false positives but no true positives, thus producing the most accurate yes–no Bloom filter among all yes–no Bloom filters. This paper studies how optimization techniques can be used to maximize the number of false positives recognized by the no-filter, with the constraint being that it should recognize no true positives. To achieve this aim, an Integer Linear Program (ILP) is proposed for the optimal selection of false positives. In practice the problem size is normally large leading to intractable optimal solution. Considering the similarity of the ILP with the Multidimensional Knapsack Problem, an Approximate Dynamic Programming (ADP) model is developed making use of a reduced ILP for the value function approximation. Numerical results show the ADP model works best comparing with a number of heuristics as well as the CPLEX built-in solver (B&B), and this is what can be recommended for use in yes–no Bloom filters. In a wider context of the study of lossy compression algorithms, our research is an example showing how the arsenal of optimization methods can be applied to improving the accuracy of compressed data.

Suggested Citation

  • Yang, Xinan & Vernitski, Alexei & Carrea, Laura, 2016. "An approximate dynamic programming approach for improving accuracy of lossy data compression by Bloom filters," European Journal of Operational Research, Elsevier, vol. 252(3), pages 985-994.
  • Handle: RePEc:eee:ejores:v:252:y:2016:i:3:p:985-994
    DOI: 10.1016/j.ejor.2016.01.042
    as

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0377221716000874
    Download Restriction: Full text for ScienceDirect subscribers only

    File URL: https://libkey.io/10.1016/j.ejor.2016.01.042?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. James H. Lorie & Leonard J. Savage, 1955. "Three Problems in Rationing Capital," The Journal of Business, University of Chicago Press, vol. 28, pages 229-229.
    2. Dimitris Bertsimas & Ramazan Demir, 2002. "An Approximate Dynamic Programming Approach to Multidimensional Knapsack Problems," Management Science, INFORMS, vol. 48(4), pages 550-565, April.
    3. Grothey, Andreas & Yang, Xinan, 2012. "Approximate dynamic programming with Bézier Curves/Surfaces for Top-percentile Traffic Routing," European Journal of Operational Research, Elsevier, vol. 218(3), pages 698-707.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. José García & José Lemus-Romani & Francisco Altimiras & Broderick Crawford & Ricardo Soto & Marcelo Becerra-Rozas & Paola Moraga & Alex Paz Becerra & Alvaro Peña Fritz & Jose-Miguel Rubio & Gino Astor, 2021. "A Binary Machine Learning Cuckoo Search Algorithm Improved by a Local Search Operator for the Set-Union Knapsack Problem," Mathematics, MDPI, vol. 9(20), pages 1-19, October.
    2. Yanhong Feng & Haizhong An & Xiangyun Gao, 2018. "The Importance of Transfer Function in Solving Set-Union Knapsack Problem Based on Discrete Moth Search Algorithm," Mathematics, MDPI, vol. 7(1), pages 1-25, December.
    3. José García & Andres Leiva-Araos & Broderick Crawford & Ricardo Soto & Hernan Pinto, 2023. "Exploring Initialization Strategies for Metaheuristic Optimization: Case Study of the Set-Union Knapsack Problem," Mathematics, MDPI, vol. 11(12), pages 1-20, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Hanafi, Said & Freville, Arnaud, 1998. "An efficient tabu search approach for the 0-1 multidimensional knapsack problem," European Journal of Operational Research, Elsevier, vol. 106(2-3), pages 659-675, April.
    2. E A Silver, 2004. "An overview of heuristic solution methods," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 55(9), pages 936-956, September.
    3. D. Babusiaux, 1988. "Financing investment and calculations of profitability [Financement des investissements et calculs de rentabilité]," Working Papers hal-01534450, HAL.
    4. Osborne, Michael J., 2010. "A resolution to the NPV-IRR debate?," The Quarterly Review of Economics and Finance, Elsevier, vol. 50(2), pages 234-239, May.
    5. Bertazzi, Luca & Bosco, Adamo & Laganà, Demetrio, 2015. "Managing stochastic demand in an Inventory Routing Problem with transportation procurement," Omega, Elsevier, vol. 56(C), pages 112-121.
    6. Yuji Nakagawa & Ross J. W. James & César Rego & Chanaka Edirisinghe, 2014. "Entropy-Based Optimization of Nonlinear Separable Discrete Decision Models," Management Science, INFORMS, vol. 60(3), pages 695-707, March.
    7. Kannapiran C. Arjunan, 2019. "Non‐monotonic NPV Function Leads to Spurious NPVs and Multiple IRR Problems: A New Method that Resolves these Problems," Economic Papers, The Economic Society of Australia, vol. 38(1), pages 56-69, March.
    8. Xi, Haoning & Liu, Wei & Waller, S. Travis & Hensher, David A. & Kilby, Philip & Rey, David, 2023. "Incentive-compatible mechanisms for online resource allocation in Mobility-as-a-Service systems," Transportation Research Part B: Methodological, Elsevier, vol. 170(C), pages 119-147.
    9. Wilbaut, Christophe & Todosijevic, Raca & Hanafi, Saïd & Fréville, Arnaud, 2023. "Heuristic and exact reduction procedures to solve the discounted 0–1 knapsack problem," European Journal of Operational Research, Elsevier, vol. 304(3), pages 901-911.
    10. Enrique Garza-Escalante & Arturo de la Torre, 2015. "Nacional Monte de Piedad Uses a Novel Social-Value Measure for Allocating Grants Among Charities," Interfaces, INFORMS, vol. 45(6), pages 514-528, December.
    11. Pérez, Fátima & Gómez, Trinidad & Caballero, Rafael & Liern, Vicente, 2018. "Project portfolio selection and planning with fuzzy constraints," Technological Forecasting and Social Change, Elsevier, vol. 131(C), pages 117-129.
    12. Ulmer, Marlin W. & Soeffker, Ninja & Mattfeld, Dirk C., 2018. "Value function approximation for dynamic multi-period vehicle routing," European Journal of Operational Research, Elsevier, vol. 269(3), pages 883-899.
    13. Xingmei Li & Yaxian Wang & Qingyou Yan & Xinchao Zhao, 2019. "Uncertain mean-variance model for dynamic project portfolio selection problem with divisibility," Fuzzy Optimization and Decision Making, Springer, vol. 18(1), pages 37-56, March.
    14. Deane, Jason & Agarwal, Anurag, 2012. "Scheduling online advertisements to maximize revenue under variable display frequency," Omega, Elsevier, vol. 40(5), pages 562-570.
    15. Goodson, Justin C. & Thomas, Barrett W. & Ohlmann, Jeffrey W., 2017. "A rollout algorithm framework for heuristic solutions to finite-horizon stochastic dynamic programs," European Journal of Operational Research, Elsevier, vol. 258(1), pages 216-229.
    16. Petr Marek, 2007. "Notes about Study on History of Theoretical Approaches to Investment Decision [Poznámky ke studiu historie teoretických přístupů k investičnímu rozhodování]," Český finanční a účetní časopis, Prague University of Economics and Business, vol. 2007(4), pages 23-29.
    17. Jean-Pierre van Zyl & Andries Petrus Engelbrecht, 2023. "Set-Based Particle Swarm Optimisation: A Review," Mathematics, MDPI, vol. 11(13), pages 1-36, July.
    18. Yuri Biondi, 2006. "The double emergence of the Modified Internal Rate of Return: The neglected financial work of Duvillard (1755 - 1832) in a comparative perspective," The European Journal of the History of Economic Thought, Taylor & Francis Journals, vol. 13(3), pages 311-335.
    19. Freville, Arnaud, 2004. "The multidimensional 0-1 knapsack problem: An overview," European Journal of Operational Research, Elsevier, vol. 155(1), pages 1-21, May.
    20. Ye Tian & Miao Sun & Zuoliang Ye & Wei Yang, 2016. "Expanded models of the project portfolio selection problem with loss in divisibility," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 67(8), pages 1097-1107, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:ejores:v:252:y:2016:i:3:p:985-994. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu (email available below). General contact details of provider: http://www.elsevier.com/locate/eor .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.