ParticleMDI: particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification

Authors

  • Nathan Cunningham (University of Warwick, Coventry)
  • Jim E. Griffin (University College London)
  • David L. Wild (University of Warwick, Coventry)

Abstract

We present a novel nonparametric Bayesian approach for performing cluster analysis in a context where observational units have data arising from multiple sources. Our approach uses a particle Gibbs sampler for inference in which cluster allocations are jointly updated using a conditional particle filter within a Gibbs sampler, improving the mixing of the MCMC chain. We develop several approaches to improving the computational performance of our algorithm. These methods can achieve greater than an order-of-magnitude improvement in performance at no cost to accuracy and can be applied more broadly to Bayesian inference for mixture models with a single dataset. We apply our algorithm to the discovery of risk cohorts amongst 243 patients presenting with kidney renal clear cell carcinoma, using samples from the Cancer Genome Atlas, for which there are gene expression, copy number variation, DNA methylation, protein expression and microRNA data. We identify 4 distinct consensus subtypes and show they are prognostic for survival rate ( $$p

Suggested Citation

  • Nathan Cunningham & Jim E. Griffin & David L. Wild, 2020. "ParticleMDI: particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(2), pages 463-484, June.
  • Handle: RePEc:spr:advdac:v:14:y:2020:i:2:d:10.1007_s11634-020-00401-y
    DOI: 10.1007/s11634-020-00401-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-020-00401-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations


    Cited by:

    1. Lingsong Meng & Dorina Avram & George Tseng & Zhiguang Huo, 2022. "Outcome‐guided sparse K‐means for disease subtype discovery via integrating phenotypic data with high‐dimensional transcriptomic data," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(2), pages 352-375, March.
    2. Veronica Distefano & Maria Mannone & Irene Poli, 2023. "Exploring Heterogeneity with Category and Cluster Analyses for Mixed Data," Stats, MDPI, vol. 6(3), pages 1-16, July.

