Printed from https://ideas.repec.org/a/eee/ecosta/v22y2022icp67-97.html

Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation

Author

Listed:
  • Kasa, Siva Rajesh
  • Rajan, Vaibhav

Abstract

Copulas provide a modular parameterization of multivariate distributions that decouples the modeling of marginals from the dependencies between them. The Gaussian Mixture Copula Model (GMCM) is a highly flexible copula that can model many kinds of multi-modal dependencies, as well as asymmetric and tail dependencies. GMCMs have been used effectively for clustering non-Gaussian data and in Reproducibility Analysis, a meta-analysis method designed to verify the reliability and consistency of multiple high-throughput genomic experiments. Parameter estimation for GMCM is challenging due to its intractable likelihood. The best previous methods maximize a proxy likelihood through a Pseudo Expectation Maximization (PEM) algorithm, and they provide no guarantee of convergence, let alone of convergence to the correct parameters. Using Automatic Differentiation (AD), a method called AD-GMCM is developed that can maximize the exact GMCM likelihood. Simulation studies and experiments on real data show that AD-GMCM finds more accurate parameter estimates than PEM and yields better performance in clustering and reproducibility analysis. The advantages of an AD-based approach in addressing problems related to the monotonic increase of the likelihood and to parameter identifiability in GMCM are discussed. The two well-known cases of degeneracy of maximum likelihood in Gaussian mixture models (GMM), which can lead to spurious clustering solutions, are analyzed for GMCM as well. The analysis reveals that, unlike GMM, GMCM is unaffected by one of the two cases.
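To make the abstract's central quantity concrete, the sketch below evaluates a GMCM log-likelihood for a bivariate case. It is not the authors' AD-GMCM implementation: it relies only on the standard copula identity (joint mixture density divided by the product of the marginal mixture densities, evaluated at the inverse marginal CDFs), it inverts the mixture CDF by plain bisection, and it does not perform any automatic differentiation or optimization. The function names, the parameter layout, the example values, and the search bounds of -50 to 50 are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

# params: list of (weight, (mean_1, mean_2), (var_1, var_2, cov_12)).
# This layout is a hypothetical convention chosen for this sketch.


def marginal_cdf(x, j, params):
    """CDF of the j-th marginal of the mixture (a 1-D Gaussian mixture)."""
    return sum(w * NormalDist(mu[j], math.sqrt(cov[j])).cdf(x)
               for w, mu, cov in params)


def marginal_pdf(x, j, params):
    """Density of the j-th marginal of the mixture."""
    return sum(w * NormalDist(mu[j], math.sqrt(cov[j])).pdf(x)
               for w, mu, cov in params)


def marginal_inv_cdf(u, j, params, lo=-50.0, hi=50.0, iters=100):
    """Invert the mixture CDF by bisection, since it has no closed form.

    Assumes the quantile lies inside [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if marginal_cdf(mid, j, params) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def joint_pdf(x, params):
    """Density of the bivariate Gaussian mixture at the point x."""
    total = 0.0
    for w, (m1, m2), (v1, v2, c12) in params:
        det = v1 * v2 - c12 * c12
        d1, d2 = x[0] - m1, x[1] - m2
        quad = (v2 * d1 * d1 - 2.0 * c12 * d1 * d2 + v1 * d2 * d2) / det
        total += w * math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
    return total


def gmcm_log_likelihood(us, params):
    """Sum of log copula densities over points u in the open unit square."""
    ll = 0.0
    for u in us:
        # Map each coordinate back to the latent mixture scale.
        x = [marginal_inv_cdf(u[j], j, params) for j in range(2)]
        ll += math.log(joint_pdf(x, params))
        ll -= sum(math.log(marginal_pdf(x[j], j, params)) for j in range(2))
    return ll


# Illustrative two-component parameters and three made-up observations.
params = [(0.6, (0.0, 0.0), (1.0, 1.0, 0.5)),
          (0.4, (3.0, 3.0), (1.0, 2.0, -0.3))]
us = [(0.2, 0.3), (0.9, 0.8), (0.5, 0.5)]
print(gmcm_log_likelihood(us, params))
```

In an AD-based approach of the kind the abstract describes, a function like gmcm_log_likelihood would be written in an AD framework so that its gradient with respect to the mixture parameters is obtained automatically and passed to a gradient-based optimizer; how the authors handle the inverse CDF and parameter constraints is not stated on this page.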

Suggested Citation

  • Kasa, Siva Rajesh & Rajan, Vaibhav, 2022. "Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation," Econometrics and Statistics, Elsevier, vol. 22(C), pages 67-97.
  • Handle: RePEc:eee:ecosta:v:22:y:2022:i:c:p:67-97
    DOI: 10.1016/j.ecosta.2021.08.010

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S2452306221001040
    Download Restriction: Full text for ScienceDirect subscribers only. Contains open access articles

    File URL: https://libkey.io/10.1016/j.ecosta.2021.08.010?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    References listed on IDEAS

    1. Bilgrau, Anders Ellern & Eriksen, Poul Svante & Rasmussen, Jakob Gulddahl & Johnsen, Hans Erik & Dybkaer, Karen & Boegsted, Martin, 2016. "GMCM: Unsupervised Clustering and Meta-Analysis Using Gaussian Mixture Copula Models," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 70(i02).
    2. Salvatore Ingrassia, 2004. "A likelihood-based constrained algorithm for multivariate normal mixture models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 13(2), pages 151-166, September.
    3. Genest, Christian & Nešlehová, Johanna, 2007. "A Primer on Copulas for Count Data," ASTIN Bulletin, Cambridge University Press, vol. 37(2), pages 475-515, November.
    4. Martin Bladt & Alexander J. McNeil, 2020. "Time series copula models using d-vines and v-transforms," Papers 2006.11088, arXiv.org, revised Jul 2021.
    5. Pravin Trivedi & David Zimmer, 2017. "A Note on Identification of Bivariate Copulas for Discrete Count Data," Econometrics, MDPI, vol. 5(1), pages 1-11, February.
    6. Czado, Claudia & Ivanov, Eugen & Okhrin, Yarema, 2019. "Modelling temporal dependence of realized variances with vines," Econometrics and Statistics, Elsevier, vol. 12(C), pages 198-216.
    7. Chen, Jiahua & Tan, Xianming, 2009. "Inference for multivariate normal mixtures," Journal of Multivariate Analysis, Elsevier, vol. 100(7), pages 1367-1383, August.
    8. Krupskii, Pavel & Joe, Harry, 2020. "Flexible copula models with dynamic dependence and application to financial data," Econometrics and Statistics, Elsevier, vol. 16(C), pages 148-167.
    9. Efron, Bradley, 2004. "Large-Scale Simultaneous Hypothesis Testing: The Choice of a Null Hypothesis," Journal of the American Statistical Association, American Statistical Association, vol. 99, pages 96-104, January.
    10. Skaug, Hans J. & Fournier, David A., 2006. "Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models," Computational Statistics & Data Analysis, Elsevier, vol. 51(2), pages 699-709, November.
    11. García-Escudero, Luis Angel & Gordaliza, Alfonso & Greselin, Francesca & Ingrassia, Salvatore & Mayo-Iscar, Agustín, 2016. "The joint role of trimming and constraints in robust estimation for mixtures of Gaussian factor analyzers," Computational Statistics & Data Analysis, Elsevier, vol. 99(C), pages 131-147.
    12. Ingrassia, Salvatore & Rocci, Roberto, 2007. "Constrained monotone EM algorithms for finite mixture of multivariate Gaussians," Computational Statistics & Data Analysis, Elsevier, vol. 51(11), pages 5339-5351, July.
    13. Ingrassia, Salvatore & Rocci, Roberto, 2011. "Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1715-1725, April.

    Citations


    Cited by:

    1. Colubi, Ana & Ramos-Guajardo, Ana Belén, 2023. "Fuzzy sets and (fuzzy) random sets in Econometrics and Statistics," Econometrics and Statistics, Elsevier, vol. 26(C), pages 84-98.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Roberto Rocci & Stefano Antonio Gattone & Roberto Di Mari, 2018. "A data driven equivariant approach to constrained Gaussian mixture modeling," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 235-260, June.
    2. Luis Angel García-Escudero & Alfonso Gordaliza & Francesca Greselin & Salvatore Ingrassia & Agustín Mayo-Iscar, 2018. "Eigenvalues and constraints in mixture modeling: geometric and computational issues," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 12(2), pages 203-233, June.
    3. Andrews, Jeffrey L., 2018. "Addressing overfitting and underfitting in Gaussian model-based clustering," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 160-171.
    4. Ingrassia, Salvatore & Rocci, Roberto, 2011. "Degeneracy of the EM algorithm for the MLE of multivariate Gaussian mixtures and dynamic constraints," Computational Statistics & Data Analysis, Elsevier, vol. 55(4), pages 1715-1725, April.
    5. Pietro Coretto & Christian Hennig, 2016. "Robust Improper Maximum Likelihood: Tuning, Computation, and a Comparison With Other Methods for Robust Gaussian Clustering," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 111(516), pages 1648-1659, October.
    6. Smith, Michael Stanley, 2023. "Implicit Copulas: An Overview," Econometrics and Statistics, Elsevier, vol. 28(C), pages 81-104.
    7. Hien Nguyen & Geoffrey McLachlan, 2015. "Maximum likelihood estimation of Gaussian mixture models without matrix operations," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 9(4), pages 371-394, December.
    8. Michael Stanley Smith, 2021. "Implicit Copulas: An Overview," Papers 2109.04718, arXiv.org.
    9. Gery Geenens, 2024. "(Re-)Reading Sklar (1959)—A Personal View on Sklar’s Theorem," Mathematics, MDPI, vol. 12(3), pages 1-7, January.
    10. Roberto Mari & Roberto Rocci & Stefano Antonio Gattone, 2020. "Scale-constrained approaches for maximum likelihood estimation and model selection of clusterwise linear regression models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 29(1), pages 49-78, March.
    11. Salvatore Ingrassia & Simona Minotti & Giorgio Vittadini, 2012. "Local Statistical Modeling via a Cluster-Weighted Approach with Elliptical Distributions," Journal of Classification, Springer;The Classification Society, vol. 29(3), pages 363-401, October.
    12. Seo, Byungtae & Kim, Daeyoung, 2012. "Root selection in normal mixture models," Computational Statistics & Data Analysis, Elsevier, vol. 56(8), pages 2454-2470.
    13. Fantazzini, Dean, 2020. "Discussing copulas with Sergey Aivazian: a memoir," MPRA Paper 102317, University Library of Munich, Germany.
    14. Darolles, Serge & Fol, Gaëlle Le & Lu, Yang & Sun, Ran, 2019. "Bivariate integer-autoregressive process with an application to mutual fund flows," Journal of Multivariate Analysis, Elsevier, vol. 173(C), pages 181-203.
    15. Mauro Laudicella & Paolo Li Donni, 2022. "The dynamic interdependence in the demand of primary and emergency secondary care: A hidden Markov approach," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(3), pages 521-536, April.
    16. Chi, Eric C. & Lange, Kenneth, 2014. "Stable estimation of a covariance matrix guided by nuclear norm penalties," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 117-128.
    17. Seo, Byungtae & Lindsay, Bruce G., 2010. "A computational strategy for doubly smoothed MLE exemplified in the normal mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 54(8), pages 1930-1941, August.
    18. Lloyd-Jones, Luke R. & Nguyen, Hien D. & McLachlan, Geoffrey J., 2018. "A globally convergent algorithm for lasso-penalized mixture of linear regression models," Computational Statistics & Data Analysis, Elsevier, vol. 119(C), pages 19-38.
    19. Kimberly F. Sellers & Tong Li & Yixuan Wu & Narayanaswamy Balakrishnan, 2021. "A Flexible Multivariate Distribution for Correlated Count Data," Stats, MDPI, vol. 4(2), pages 1-19, April.
    20. Andrea Cappozzo & Francesca Greselin & Thomas Brendan Murphy, 2020. "A robust approach to model-based classification based on trimming and constraints," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 327-354, June.
