
Using Probabilistic Models for Data Compression

Author

Listed:
  • Iuliana Iatan

    (Department of Mathematics and Computer Science, Technical University of Civil Engineering, 020396 Bucharest, Romania)

  • Mihăiţă Drăgan

    (Faculty of Mathematics and Computer Science, University of Bucharest, 010014 Bucharest, Romania)

  • Silvia Dedu

    (Department of Applied Mathematics, Bucharest University of Economic Studies, 010734 Bucharest, Romania)

  • Vasile Preda

    (Faculty of Mathematics and Computer Science, University of Bucharest, 010014 Bucharest, Romania
    “Gheorghe Mihoc-Caius Iacob” Institute of Mathematical Statistics and Applied Mathematics, 050711 Bucharest, Romania
    “Costin C. Kiriţescu” National Institute of Economic Research, 050711 Bucharest, Romania)

Abstract

Our research objective is to improve Huffman coding efficiency by adjusting the data using a Poisson distribution, which also avoids undefined entropies. The scientific value added by our paper lies in minimizing the average length of the code words, which is greater when the Poisson distribution is not applied. Huffman coding is an error-free compression method designed to remove coding redundancy by yielding the smallest number of code symbols per source symbol, which in practice can represent the intensity of an image or the output of a mapping operation. We use images from the PASCAL Visual Object Classes (VOC) data sets to evaluate our methods: 10,102 randomly chosen images, half for training and half for testing. The VOC data sets display significant variability in object size, orientation, pose, illumination, position and occlusion. They comprise 20 object classes: aeroplane, bicycle, bird, boat, bottle, bus, car, motorbike, train, sofa, table, chair, tv/monitor, potted plant, person, cat, cow, dog, horse and sheep. The descriptors of different objects can be compared to measure their similarity. Image similarity is an important concept in many applications. This paper focuses on similarity measures in the computer science domain, specifically information retrieval and data mining. Our approach uses 64 descriptors for each image in the training and test sets, so the number of symbols is 64. Our information source differs from a finite-memory (Markov) source, whose output depends on a finite number of previous outputs. When dealing with large volumes of data, an effective way to increase information retrieval speed is to use neural networks as an artificial intelligence technique.
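The relationship between a Poisson model, entropy and average Huffman code length can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rate `lam = 8.0`, the truncation of the Poisson distribution to 64 symbols, and all function names are assumptions chosen for illustration. Because the model assigns a strictly positive probability to every symbol, each term p log2 p in the entropy is defined, which is one plausible reading of how a fitted distribution avoids undefined entropies.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return the Huffman code length (in bits) for each symbol probability."""
    if len(probs) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def truncated_poisson(n_symbols, lam):
    """Poisson(lam) probabilities for k = 0..n_symbols-1, renormalised to sum to 1."""
    # Work in log space so that k! does not overflow for large k.
    weights = [
        math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        for k in range(n_symbols)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_length(probs, lengths):
    """Expected code word length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, lengths))


# Hypothetical setting: 64 symbols (one per descriptor) and an assumed rate.
probs = truncated_poisson(64, lam=8.0)
lengths = huffman_code_lengths(probs)
H = entropy_bits(probs)
L = average_length(probs, lengths)
print(f"entropy = {H:.4f} bits, average code length = {L:.4f} bits")
```

For any source distribution, the average Huffman code length L satisfies H <= L < H + 1, where H is the Shannon entropy, so the printed values show how close the code comes to the entropy bound under the assumed model.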

Suggested Citation

  • Iuliana Iatan & Mihăiţă Drăgan & Silvia Dedu & Vasile Preda, 2022. "Using Probabilistic Models for Data Compression," Mathematics, MDPI, vol. 10(20), pages 1-29, October.
  • Handle: RePEc:gam:jmathe:v:10:y:2022:i:20:p:3847-:d:945132

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/10/20/3847/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/10/20/3847/
    Download Restriction: no

    References listed on IDEAS

    1. Masaki Ishikawa & Hajime Kawakami, 2013. "Compression-based distance between string data and its application to literary work classification based on authorship," Computational Statistics, Springer, vol. 28(2), pages 851-873, April.
    2. Athanasios Sachlas & Takis Papaioannou, 2014. "Residual and Past Entropy in Actuarial Science and Survival Models," Methodology and Computing in Applied Probability, Springer, vol. 16(1), pages 79-99, March.
    3. Enchakudiyil Ibrahim Abdul-Sathar & Glory Sathyanesan Sathyareji, 2018. "Estimation Of Dynamic Cumulative Past Entropy For Power Function Distribution," Statistica, Department of Statistics, University of Bologna, vol. 78(4), pages 319-334.

    Citations

    Cited by:

    1. Helio M. de Oliveira & Raydonal Ospina & Carlos Martin-Barreiro & Víctor Leiva & Christophe Chesneau, 2023. "On the Use of Variability Measures to Analyze Source Coding Data Based on the Shannon Entropy," Mathematics, MDPI, vol. 11(2), pages 1-16, January.
    2. Cristina-Liliana Pripoae & Iulia-Elena Hirica & Gabriel-Teodor Pripoae & Vasile Preda, 2023. "Holonomic and Non-Holonomic Geometric Models Associated to the Gibbs–Helmholtz Equation," Mathematics, MDPI, vol. 11(18), pages 1-20, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Antonio Di Crescenzo & Patrizia Di Gironimo, 2018. "Stochastic Comparisons and Dynamic Information of Random Lifetimes in a Replacement Model," Mathematics, MDPI, vol. 6(10), pages 1-13, October.
    2. Asok K. Nanda & Shovan Chowdhury, 2021. "Shannon's Entropy and Its Generalisations Towards Statistical Inference in Last Seven Decades," International Statistical Review, International Statistical Institute, vol. 89(1), pages 167-185, April.
    3. Jiamin Yu, 2021. "Three fundamental problems in risk modeling on big data: an information theory view," Papers 2109.03541, arXiv.org.
    4. Maya, R. & Abdul-Sathar, E.I. & Rajesh, G., 2014. "Non-parametric estimation of the generalized past entropy function with censored dependent data," Statistics & Probability Letters, Elsevier, vol. 90(C), pages 129-135.
    5. Răzvan-Cornel Sfetcu & Vasile Preda, 2024. "Order Properties Concerning Tsallis Residual Entropy," Mathematics, MDPI, vol. 12(3), pages 1-16, January.
    6. Aswathy S. Krishnan & S. M. Sunoj & P. G. Sankaran, 2019. "Quantile-based reliability aspects of cumulative Tsallis entropy in past lifetime," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 82(1), pages 17-38, January.
