Printed from https://ideas.repec.org/a/gam/jmathe/v8y2020i7p1090-d379907.html

Optimizing the Estimation of a Histogram-Bin Width—Application to the Multivariate Mixture-Model Estimation

Authors
  • Branislav Panić

    (Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva ulica 6, 1000 Ljubljana, Slovenia)

  • Jernej Klemenc

    (Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva ulica 6, 1000 Ljubljana, Slovenia)

  • Marko Nagode

    (Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva ulica 6, 1000 Ljubljana, Slovenia)

Abstract

Maximum-likelihood estimation of the parameters of a multivariate mixture model is a difficult problem. One approach is to combine the REBMIX and EM algorithms. However, the REBMIX algorithm requires histogram estimation, which is the most rudimentary approach to empirical density estimation and has many drawbacks. Nevertheless, because of its simplicity, it is still one of the most commonly used techniques. The main problem is to estimate the optimum histogram-bin width, which is usually set through the number of non-overlapping, regularly spaced bins. For univariate problems this is a single integer value, i.e., the number of bins. For multivariate problems, however, a regular grid must be formed to obtain a histogram estimate, with a number of bins chosen for each dimension, so finding the optimum histogram requires solving an integer-optimization problem. The aim is therefore to estimate the optimum histogram binning, both on its own and as part of mixture-model parameter estimation with the REBMIX&EM strategy. The Knuth rule was used as the estimator, and an optimization algorithm derived from coordinate-descent optimization was developed. These proposals yielded promising results: the optimization algorithm was efficient and the results were accurate. When applied to multivariate Gaussian-mixture-model parameter estimation, the results were competitive. All the improvements were implemented in the rebmix R package.
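The two ingredients named in the abstract, the Knuth rule as a bin-count estimator and a coordinate-wise search over integer bin counts on a regular grid, can be illustrated with a short Python sketch. This is not the rebmix implementation and may differ from the authors' algorithm. It assumes Knuth's relative log posterior (Jeffreys Dirichlet prior) applied to a regular d-dimensional grid over the data's bounding box with M = M_1 x ... x M_d equal-volume cells, and it uses a plain exhaustive search along one axis at a time. The function names `knuth_log_posterior` and `coordinate_search` are hypothetical.

```python
import math
import random
from collections import Counter


def knuth_log_posterior(data, bins):
    """Relative log posterior of Knuth's rule for a regular grid (assumed form).

    log p(M | data) = N log M + lgamma(M/2) - M lgamma(1/2)
                      - lgamma(N + M/2) + sum_k lgamma(n_k + 1/2)
    where M = prod(bins) and n_k is the count in grid cell k (empty cells included).
    """
    n, d = len(data), len(bins)
    lo = [min(p[j] for p in data) for j in range(d)]
    hi = [max(p[j] for p in data) for j in range(d)]
    counts = Counter()
    for p in data:
        cell = []
        for j in range(d):
            width = (hi[j] - lo[j]) / bins[j]
            k = int((p[j] - lo[j]) / width) if width > 0 else 0
            cell.append(min(k, bins[j] - 1))  # points on the upper edge go to the last bin
        counts[tuple(cell)] += 1
    m = math.prod(bins)  # total number of grid cells
    empty = m - len(counts)  # each empty cell contributes lgamma(1/2)
    log_p = (n * math.log(m) + math.lgamma(m / 2)
             - m * math.lgamma(0.5) - math.lgamma(n + m / 2))
    log_p += sum(math.lgamma(c + 0.5) for c in counts.values())
    log_p += empty * math.lgamma(0.5)
    return log_p


def coordinate_search(data, start, max_bins=100, max_sweeps=20):
    """Maximise the posterior one axis at a time over integers 1..max_bins."""
    bins = list(start)
    best = knuth_log_posterior(data, bins)
    for _ in range(max_sweeps):
        improved = False
        for j in range(len(bins)):
            # exhaustive 1-D search along axis j with the other axes held fixed
            for b in range(1, max_bins + 1):
                if b == bins[j]:
                    continue
                trial = bins.copy()
                trial[j] = b
                score = knuth_log_posterior(data, trial)
                if score > best:
                    bins, best = trial, score
                    improved = True
        if not improved:  # no axis can be improved: coordinate-wise optimum
            break
    return bins, best


# Usage on synthetic 2-D data with different spreads along the two axes
rng = random.Random(0)
sample = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0)) for _ in range(300)]
bins, score = coordinate_search(sample, [1, 1], max_bins=20)
print(bins, score)
```

Searching each axis exhaustively keeps the sketch simple and guarantees that, once a full sweep brings no improvement, no single-axis change can raise the posterior. It does not guarantee a global optimum over the whole grid of bin-count vectors.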

Suggested Citation

  • Branislav Panić & Jernej Klemenc & Marko Nagode, 2020. "Optimizing the Estimation of a Histogram-Bin Width—Application to the Multivariate Mixture-Model Estimation," Mathematics, MDPI, vol. 8(7), pages 1-30, July.
  • Handle: RePEc:gam:jmathe:v:8:y:2020:i:7:p:1090-:d:379907

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/8/7/1090/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/8/7/1090/
    Download Restriction: no

    References listed on IDEAS

    1. Luca Scrucca & Adrian Raftery, 2015. "Improved initialisation of model-based clustering using Gaussian hierarchical partitions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 447-460, December.
    2. Bergé, Laurent & Bouveyron, Charles & Girard, Stéphane, 2012. "HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 46(i06).
    3. Melnykov, Volodymyr & Melnykov, Igor, 2012. "Initializing the EM algorithm in Gaussian mixture models with an unknown number of components," Computational Statistics & Data Analysis, Elsevier, vol. 56(6), pages 1381-1395.
    4. Scrucca, Luca, 2013. "GA: A Package for Genetic Algorithms in R," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 53(i04).
    5. Chris Fraley & Adrian E. Raftery, 2007. "Bayesian Regularization for Normal Mixture Estimation and Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 24(2), pages 155-181, September.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Nagode, Marko & Oman, Simon & Klemenc, Jernej & Panić, Branislav, 2023. "Gumbel mixture modelling for multiple failure data," Reliability Engineering and System Safety, Elsevier, vol. 230(C).
    2. Branislav Panić & Marko Nagode & Jernej Klemenc & Simon Oman, 2022. "On Methods for Merging Mixture Model Components Suitable for Unsupervised Image Segmentation Tasks," Mathematics, MDPI, vol. 10(22), pages 1-22, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bouveyron, Charles & Brunet-Saumard, Camille, 2014. "Model-based clustering of high-dimensional data: A review," Computational Statistics & Data Analysis, Elsevier, vol. 71(C), pages 52-78.
    2. Bergeaud, Antonin & Raimbault, Juste, 2020. "An empirical analysis of the spatial variability of fuel prices in the United States," Transportation Research Part A: Policy and Practice, Elsevier, vol. 132(C), pages 131-143.
    3. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    4. Zhu, Xuwen & Melnykov, Volodymyr, 2018. "Manly transformation in finite mixture modeling," Computational Statistics & Data Analysis, Elsevier, vol. 121(C), pages 190-208.
    5. Sucharitha, Rahul Srinivas & Lee, Seokcheon, 2022. "GMM clustering for in-depth food accessibility pattern exploration and prediction model of food demand behavior," Socio-Economic Planning Sciences, Elsevier, vol. 83(C).
    6. Xu, Wenjing & Pan, Qing & Gastwirth, Joseph L., 2014. "Cox proportional hazards models with frailty for negatively correlated employment processes," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 295-307.
    7. Lazzari, Florencia & Mor, Gerard & Cipriano, Jordi & Solsona, Francesc & Chemisana, Daniel & Guericke, Daniela, 2023. "Optimizing planning and operation of renewable energy communities with genetic algorithms," Applied Energy, Elsevier, vol. 338(C).
    8. Konon, Alexander, 2016. "Career choice under uncertainty," VfS Annual Conference 2016 (Augsburg): Demographic Change 145583, Verein für Socialpolitik / German Economic Association.
    9. Alessandro Casa & Andrea Cappozzo & Michael Fop, 2022. "Group-Wise Shrinkage Estimation in Penalized Model-Based Clustering," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 648-674, November.
    10. Olgun Aydin & Bartłomiej Igliński & Krzysztof Krukowski & Marek Siemiński, 2022. "Analyzing Wind Energy Potential Using Efficient Global Optimization: A Case Study for the City Gdańsk in Poland," Energies, MDPI, vol. 15(9), pages 1-22, April.
    11. Roberto Di Mari & Roberto Rocci & Stefano Antonio Gattone, 2020. "Scale-constrained approaches for maximum likelihood estimation and model selection of clusterwise linear regression models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(1), pages 49-78, March.
    12. Semhar Michael & Volodymyr Melnykov, 2016. "An effective strategy for initializing the EM algorithm in finite mixture models," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 563-583, December.
    13. Castellares, Fredy & Patrício, Silvio C. & Lemonte, Artur J. & Queiroz, Bernardo L., 2020. "On closed-form expressions to Gompertz–Makeham life expectancy," Theoretical Population Biology, Elsevier, vol. 134(C), pages 53-60.
    14. Dirick, Lore & Claeskens, Gerda & Baesens, Bart, 2015. "An Akaike information criterion for multiple event mixture cure models," European Journal of Operational Research, Elsevier, vol. 241(2), pages 449-457.
    15. Huan Yu & Jun Yang & Yu Zhao, 2018. "Reliability of nonrepairable phased-mission systems with common bus performance sharing," Journal of Risk and Reliability, vol. 232(6), pages 647-660, December.
    16. Muhammet Burak Kılıç & Yusuf Şahin & Melih Burak Koca, 2021. "Genetic algorithm approach with an adaptive search space based on EM algorithm in two-component mixture Weibull parameter estimation," Computational Statistics, Springer, vol. 36(2), pages 1219-1242, June.
    17. Jelle R Dalenberg & Luca Nanetti & Remco J Renken & René A de Wijk & Gert J ter Horst, 2014. "Dealing with Consumer Differences in Liking during Repeated Exposure to Food; Typical Dynamics in Rating Behavior," PLOS ONE, Public Library of Science, vol. 9(3), pages 1-11, March.
    18. Marek Śmieja & Magdalena Wiercioch, 2017. "Constrained clustering with a complex cluster structure," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 11(3), pages 493-518, September.
    19. Luca Scrucca & Adrian Raftery, 2015. "Improved initialisation of model-based clustering using Gaussian hierarchical partitions," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 447-460, December.
    20. Volodymyr Melnykov, 2013. "Finite mixture modelling in mass spectrometry analysis," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 62(4), pages 573-592, August.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:gam:jmathe:v:8:y:2020:i:7:p:1090-:d:379907. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to register. This allows you to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help by using the correction form on IDEAS.

    If you know of missing items citing this one, you can help us create those links by adding the relevant references in the same way as above, for each referring item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact the MDPI Indexing Manager. General contact details of provider: https://www.mdpi.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.