Printed from https://ideas.repec.org/a/eee/jmvana/v99y2008i3p490-509.html

Grouped Dirichlet distribution: A new tool for incomplete categorical data analysis

Authors listed:
  • Ng, Kai Wang
  • Tang, Man-Lai
  • Tan, Ming
  • Tian, Guo-Liang

Abstract

Motivated by the likelihood functions of several types of incomplete categorical data, this article introduces a new family of distributions, the grouped Dirichlet distributions (GDD), which includes the classical Dirichlet distribution (DD) as a special case. First, we develop distribution theory for the GDD in its own right. Second, we use this expanded family as a new tool for the statistical analysis of incomplete categorical data. Starting with a GDD with two partitions, we derive its stochastic representation, which provides a simple procedure for simulation. Other properties, such as mixed moments, the mode, and marginal and conditional distributions, are also derived. The general GDD with more than two partitions is treated in a parallel manner. Three data sets, from a case-control study, a leprosy survey, and a neurological study, are used to illustrate how the GDD can serve as a new tool for analyzing incomplete categorical data. Compared with the commonly used approach based on the DD, our GDD-based approach has at least two advantages in both frequentist and conjugate Bayesian inference: (a) in some cases, both the maximum likelihood and Bayes estimates have closed-form expressions under the new approach but not under the commonly used one; and (b) even when no closed-form solution is available, the EM and data augmentation algorithms converge much faster under the new approach than under the commonly used one.
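The abstract mentions that the two-partition GDD has a stochastic representation giving a simple simulation procedure. The sketch below is an illustration only, not the authors' code, and it assumes a density form that this page does not state: on the simplex x_1 + ... + x_n = 1, p(x) ∝ ∏ x_i^(a_i - 1) · (x_1 + ... + x_s)^b1 · (x_{s+1} + ... + x_n)^b2. Under that assumption, a change of variables shows that the first-group total R is Beta(a_1 + ... + a_s + b1, a_{s+1} + ... + a_n + b2), and that the within-group shares are independent ordinary Dirichlet vectors. The function names `sample_dirichlet` and `sample_gdd2` are hypothetical.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet(alpha) vector by normalising independent Gamma(alpha_i, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [v / total for v in g]


def sample_gdd2(a: Sequence[float], b1: float, b2: float, s: int,
                rng: random.Random) -> List[float]:
    """One draw from a two-partition grouped Dirichlet distribution.

    ASSUMED density (not taken from this page) on the simplex sum(x) = 1:
        p(x) ∝ prod_i x_i^(a_i - 1) * (x_1+...+x_s)^b1 * (x_{s+1}+...+x_n)^b2
    Requires a_i > 0, sum(a[:s]) + b1 > 0 and sum(a[s:]) + b2 > 0.
    """
    if not 1 <= s < len(a):
        raise ValueError("s must split a into two non-empty groups")
    a1, a2 = a[:s], a[s:]
    # R = x_1 + ... + x_s, the mass of the first group, is Beta distributed
    r = rng.betavariate(sum(a1) + b1, sum(a2) + b2)
    # Shares within each group are ordinary Dirichlet vectors, independent of R
    y1 = sample_dirichlet(a1, rng)
    y2 = sample_dirichlet(a2, rng)
    # Rescale the shares so the first group sums to R and the second to 1 - R
    return [r * y for y in y1] + [(1.0 - r) * y for y in y2]
```

For example, with a = (1, 2, 3, 4), s = 2, b1 = 2 and b2 = 1, the average of x_1 + x_2 over many draws should approach (3 + 2)/(3 + 2 + 7 + 1) = 5/13 ≈ 0.385, which is the mean of the Beta(5, 8) law assumed for R. Setting b1 = b2 = 0 reduces the sampler to an ordinary Dirichlet draw, matching the abstract's remark that the DD is a special case.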

Suggested Citation

  • Ng, Kai Wang & Tang, Man-Lai & Tan, Ming & Tian, Guo-Liang, 2008. "Grouped Dirichlet distribution: A new tool for incomplete categorical data analysis," Journal of Multivariate Analysis, Elsevier, vol. 99(3), pages 490-509, March.
  • Handle: RePEc:eee:jmvana:v:99:y:2008:i:3:p:490-509

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0047-259X(07)00012-7
    Download Restriction: Full text for ScienceDirect subscribers only


    References listed on IDEAS

    1. Liu, Chuanhai, 1999. "Efficient ML Estimation of the Multivariate Normal Distribution from Incomplete Data," Journal of Multivariate Analysis, Elsevier, vol. 69(2), pages 206-217, May.
    2. Zhi Geng & Kang Wan & Feng Tao, 2000. "Mixed Graphical Models with Missing Data and the Partial Imputation EM Algorithm," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 27(3), pages 433-444, September.
    3. S. C. Choi & D. M. Stablein, 1982. "Practical Tests for Comparing Two Proportions with Incomplete Data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 31(3), pages 256-262, November.
    4. Geng, Zhi & Li, Kaican, 2003. "Factorization of posteriors and partial imputation algorithm for graphical models with missing data," Statistics & Probability Letters, Elsevier, vol. 64(4), pages 369-379, October.
    5. Gupta, Rameshwar D. & Richards, Donald St.P., 1987. "Multivariate Liouville distributions," Journal of Multivariate Analysis, Elsevier, vol. 23(2), pages 233-256, December.
    6. Gupta, Rameshwar D. & Richards, Donald St. P., 1992. "Multivariate Liouville distributions, III," Journal of Multivariate Analysis, Elsevier, vol. 43(1), pages 29-57, October.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Li, Huiqiong & Tian, Guoliang & Tang, Niansheng & Cao, Hongyuan, 2018. "Assessing non-inferiority for incomplete paired-data under non-ignorable missing mechanism," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 69-81.
    2. Jamotton, Charlotte & Hainaut, Donatien, 2024. "Latent Dirichlet Allocation for structured insurance data," LIDAM Discussion Papers ISBA 2024008, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    3. Tian, Guo-Liang & Tang, Man-Lai & Yuen, Kam Chuen & Ng, Kai Wang, 2010. "Further properties and new applications of the nested Dirichlet distribution," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 394-405, February.
    4. Nguyen, H.D. & Gouno, E., 2020. "Bayesian inference for Common cause failure rate based on causal inference with missing data," Reliability Engineering and System Safety, Elsevier, vol. 197(C).
    5. Qiu, Shi-Fang & Poon, Wai-Yin & Tang, Man-Lai, 2016. "Confidence intervals for an ordinal effect size measure based on partially validated series," Computational Statistics & Data Analysis, Elsevier, vol. 103(C), pages 170-192.
    6. Ongaro, A. & Migliorati, S., 2013. "A generalization of the Dirichlet distribution," Journal of Multivariate Analysis, Elsevier, vol. 114(C), pages 412-426.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tang, Man-Lai & Ng, Kai Wang & Tian, Guo-Liang & Tan, Ming, 2007. "On improved EM algorithm and confidence interval construction for incomplete r×c tables," Computational Statistics & Data Analysis, Elsevier, vol. 51(6), pages 2919-2933, March.
    2. Ongaro, A. & Migliorati, S., 2013. "A generalization of the Dirichlet distribution," Journal of Multivariate Analysis, Elsevier, vol. 114(C), pages 412-426.
    3. Edward Hoyle & Levent Ali Menguturk, 2020. "Generalised Liouville Processes and their Properties," Papers 2003.11312, arXiv.org, revised May 2020.
    4. Tian, Guo-Liang & Tang, Man-Lai & Yuen, Kam Chuen & Ng, Kai Wang, 2010. "Further properties and new applications of the nested Dirichlet distribution," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 394-405, February.
    5. Geng, Zhi & Wang, Chi & Zhao, Qiang, 2005. "Decomposition of search for v-structures in DAGs," Journal of Multivariate Analysis, Elsevier, vol. 96(2), pages 282-294, October.
    6. Bhattacharya, P. K. & Burman, Prabir, 1998. "Semiparametric Estimation in the Multivariate Liouville Model," Journal of Multivariate Analysis, Elsevier, vol. 65(1), pages 1-18, April.
    7. McNeil, Alexander J. & Neslehová, Johanna, 2010. "From Archimedean to Liouville copulas," Journal of Multivariate Analysis, Elsevier, vol. 101(8), pages 1772-1790, September.
    8. Jones, M.C. & Marchand, Éric, 2019. "Multivariate discrete distributions via sums and shares," Journal of Multivariate Analysis, Elsevier, vol. 171(C), pages 83-93.
    9. Xiaoqian Sun & Dongchu Sun, 2007. "Estimation of a Multivariate Normal Covariance Matrix with Staircase Pattern Data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 59(2), pages 211-233, June.
    10. Denuit, Michel & Robert, Christian Y., 2020. "Conditional tail expectation decomposition and conditional mean risk sharing for dependent and conditionally independent risks," LIDAM Discussion Papers ISBA 2020018, Université catholique de Louvain, Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA).
    11. Bhattacharya, Bhaskar, 2006. "Maximum entropy characterizations of the multivariate Liouville distributions," Journal of Multivariate Analysis, Elsevier, vol. 97(6), pages 1272-1283, July.
    12. Geng, Zhi & He, Yang-Bo & Wang, Xue-Li & Zhao, Qiang, 2003. "Bayesian method for learning graphical models with incompletely categorical data," Computational Statistics & Data Analysis, Elsevier, vol. 44(1-2), pages 175-192, October.
    13. Volkmar Henschel, 2002. "Statistical inference in simplicially contoured sample distributions," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 56(3), pages 215-228, December.
    14. Malini Iyengar & Dipak Dey, 2002. "A semiparametric model for compositional data analysis in presence of covariates on the simplex," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 11(2), pages 303-315, December.
    15. Wan-Lun Wang & Min Liu & Tsung-I Lin, 2017. "Robust skew-t factor analysis models for handling missing data," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 26(4), pages 649-672, November.
    16. Li, Huiqiong & Tian, Guoliang & Tang, Niansheng & Cao, Hongyuan, 2018. "Assessing non-inferiority for incomplete paired-data under non-ignorable missing mechanism," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 69-81.
    17. Gupta, Rameshwar D. & Richards, Donald St. P., 2002. "Moment Properties of the Multivariate Dirichlet Distributions," Journal of Multivariate Analysis, Elsevier, vol. 82(1), pages 240-262, July.
    18. Mohammed, Nawaf & Furman, Edward & Su, Jianxi, 2021. "Can a regulatory risk measure induce profit-maximizing risk capital allocations? The case of conditional tail expectation," Insurance: Mathematics and Economics, Elsevier, vol. 101(PB), pages 425-436.
    19. Fang, B. Q., 2003. "The skew elliptical distributions and their quadratic forms," Journal of Multivariate Analysis, Elsevier, vol. 87(2), pages 298-314, November.
    20. Belzile, Léo R. & Nešlehová, Johanna G., 2017. "Extremal attractors of Liouville copulas," Journal of Multivariate Analysis, Elsevier, vol. 160(C), pages 68-92.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:eee:jmvana:v:99:y:2008:i:3:p:490-509.


    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Catherine Liu. General contact details of provider: http://www.elsevier.com/wps/find/journaldescription.cws_home/622892/description#description

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.