
A Bayesian multivariate mixture model for high throughput spatial transcriptomics

Authors
  • Carter Allen
  • Yuzhou Chang
  • Brian Neelon
  • Won Chang
  • Hang J. Kim
  • Zihai Li
  • Qin Ma
  • Dongjun Chung

Abstract

High throughput spatial transcriptomics (HST) is a rapidly emerging class of experimental technologies that allow for profiling gene expression in tissue samples at or near single‐cell resolution while retaining the spatial location of each sequencing unit within the tissue sample. Through analyzing HST data, we seek to identify sub‐populations of cells within a tissue sample that may inform biological phenomena. Existing computational methods either ignore the spatial heterogeneity in gene expression profiles, fail to account for important statistical features such as skewness, or are heuristic‐based network clustering methods that lack the inferential benefits of statistical modeling. To address this gap, we develop SPRUCE: a Bayesian spatial multivariate finite mixture model based on multivariate skew‐normal distributions, which is capable of identifying distinct cellular sub‐populations in HST data. We further implement a novel combination of Pólya–Gamma data augmentation and spatial random effects to infer spatially correlated mixture component membership probabilities without relying on approximate inference techniques. Via a simulation study, we demonstrate the detrimental inferential effects of ignoring skewness or spatial correlation in HST data. Using publicly available human brain HST data, SPRUCE outperforms existing methods in recovering expertly annotated brain layers. Finally, our application of SPRUCE to human breast cancer HST data indicates that SPRUCE can distinguish distinct cell populations within the tumor microenvironment. An R package spruce for fitting the proposed models is available through The Comprehensive R Archive Network.
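
To make the "mixture of multivariate skew-normal distributions" in the abstract concrete, the following is a minimal Python sketch that simulates data from such a mixture. It assumes the common stochastic representation y = loc + skew * |t| + e, with t ~ N(0, 1) and e ~ N(0, cov). The function name `sample_skew_normal_mixture` and all parameter names are hypothetical and chosen for illustration. This is not the API of the spruce R package, and the sketch omits the spatial random effects and the Pólya–Gamma augmentation that the abstract describes: component labels are drawn independently with fixed weights.

```python
import numpy as np

def sample_skew_normal_mixture(n, weights, locs, skews, covs, seed=0):
    """Draw n points from a finite mixture of multivariate skew-normals.

    Assumed representation (illustrative, not taken from the paper's code):
    y = loc + skew * |t| + e, with t ~ N(0, 1) and e ~ N(0, cov),
    using one (loc, skew, cov) triple per mixture component.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    k = len(weights)
    dim = len(locs[0])
    # Non-spatial simplification: labels are i.i.d. with fixed weights.
    labels = rng.choice(k, size=n, p=weights)
    # Half-normal latent variable that induces the skewness.
    t = np.abs(rng.standard_normal(n))
    y = np.empty((n, dim))
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        noise = rng.multivariate_normal(np.zeros(dim), covs[j], size=idx.size)
        y[idx] = np.asarray(locs[j]) + np.outer(t[idx], skews[j]) + noise
    return y, labels

# Two components in two dimensions, skewed in different directions.
y, labels = sample_skew_normal_mixture(
    n=2000,
    weights=[0.5, 0.5],
    locs=[[0.0, 0.0], [5.0, 5.0]],
    skews=[[3.0, 0.0], [0.0, -3.0]],
    covs=[np.eye(2), np.eye(2)],
)
```

Setting every entry of `skews` to zero reduces the sketch to an ordinary Gaussian mixture, which is one way to see what ignoring skewness would mean in a simulation.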

Suggested Citation

  • Carter Allen & Yuzhou Chang & Brian Neelon & Won Chang & Hang J. Kim & Zihai Li & Qin Ma & Dongjun Chung, 2023. "A Bayesian multivariate mixture model for high throughput spatial transcriptomics," Biometrics, The International Biometric Society, vol. 79(3), pages 1775-1787, September.
  • Handle: RePEc:bla:biomet:v:79:y:2023:i:3:p:1775-1787
    DOI: 10.1111/biom.13727

    Download full text from publisher

    File URL: https://doi.org/10.1111/biom.13727
    Download Restriction: no


    References listed on IDEAS

    1. Lawrence Hubert & Phipps Arabie, 1985. "Comparing partitions," Journal of Classification, Springer;The Classification Society, vol. 2(1), pages 193-218, December.
    2. Brian Neelon & Alan E. Gelfand & Marie Lynn Miranda, 2014. "A multivariate spatial mixture model for areal data: examining regional differences in standardized test scores," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 63(5), pages 737-761, November.
    3. Nicholas G. Polson & James G. Scott & Jesse Windle, 2013. "Bayesian Inference for Logistic Models Using Pólya–Gamma Latent Variables," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1339-1349, December.
    4. Madhav Mantri & Gaetano J. Scuderi & Roozbeh Abedini-Nassab & Michael F. Z. Wang & David McKellar & Hao Shi & Benjamin Grodner & Jonathan T. Butcher & Iwijn De Vlaminck, 2021. "Spatiotemporal single-cell RNA sequencing of developing chicken hearts identifies interplay between cellular differentiation and morphogenesis," Nature Communications, Nature, vol. 12(1), pages 1-13, December.
    5. Carter Allen & Sara E. Benjamin‐Neelon & Brian Neelon, 2021. "A Bayesian multivariate mixture model for skewed longitudinal data with intermittent missing observations: An application to infant motor development," Biometrics, The International Biometric Society, vol. 77(2), pages 675-688, June.
    6. David J. Spiegelhalter & Nicola G. Best & Bradley P. Carlin & Angelika Van Der Linde, 2002. "Bayesian measures of model complexity and fit," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 64(4), pages 583-639, October.
    7. Eddelbuettel, Dirk & Francois, Romain, 2011. "Rcpp: Seamless R and C++ Integration," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 40(i08).

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Buddhavarapu, Prasad & Scott, James G. & Prozzi, Jorge A., 2016. "Modeling unobserved heterogeneity using finite mixture random parameters for spatially correlated discrete count data," Transportation Research Part B: Methodological, Elsevier, vol. 91(C), pages 492-510.
    2. Buddhavarapu, Prasad & Bansal, Prateek & Prozzi, Jorge A., 2021. "A new spatial count data model with time-varying parameters," Transportation Research Part B: Methodological, Elsevier, vol. 150(C), pages 566-586.
    3. Simon Mak & Derek Bingham & Yi Lu, 2016. "A regional compound Poisson process for hurricane and tropical storm damage," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 65(5), pages 677-703, November.
    4. James Joseph Balamuta & Steven Andrew Culpepper, 2022. "Exploratory Restricted Latent Class Models with Monotonicity Requirements under PÒLYA–GAMMA Data Augmentation," Psychometrika, Springer;The Psychometric Society, vol. 87(3), pages 903-945, September.
    5. Martina Sundqvist & Julien Chiquet & Guillem Rigaill, 2023. "Adjusting the adjusted Rand Index," Computational Statistics, Springer, vol. 38(1), pages 327-347, March.
    6. Yuan Fang & Dimitris Karlis & Sanjeena Subedi, 2022. "Infinite Mixtures of Multivariate Normal-Inverse Gaussian Distributions for Clustering of Skewed Data," Journal of Classification, Springer;The Classification Society, vol. 39(3), pages 510-552, November.
    7. Sylvia Frühwirth-Schnatter & Gertraud Malsiner-Walli, 2019. "From here to infinity: sparse finite versus Dirichlet process mixtures in model-based clustering," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(1), pages 33-64, March.
    8. Evelina Gabasova & John Reid & Lorenz Wernisch, 2017. "Clusternomics: Integrative context-dependent clustering for heterogeneous datasets," PLOS Computational Biology, Public Library of Science, vol. 13(10), pages 1-29, October.
    9. Catalina A. Vallejos & Mark F. J. Steel, 2017. "Bayesian survival modelling of university outcomes," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 180(2), pages 613-631, February.
    10. Matthew D. Koslovsky, 2023. "A Bayesian zero‐inflated Dirichlet‐multinomial regression model for multivariate compositional count data," Biometrics, The International Biometric Society, vol. 79(4), pages 3239-3251, December.
    11. Anoek Castelein & Dennis Fok & Richard Paap, 2019. "Dynamics in clickthrough and conversion probabilities of paid search advertisements," Tinbergen Institute Discussion Papers 19-056/III, Tinbergen Institute.
    12. Lulu Shang & Xiang Zhou, 2022. "Spatially aware dimension reduction for spatial transcriptomics," Nature Communications, Nature, vol. 13(1), pages 1-22, December.
    13. Xia, Ye-Mao & Tang, Nian-Sheng, 2019. "Bayesian analysis for mixture of latent variable hidden Markov models with multivariate longitudinal data," Computational Statistics & Data Analysis, Elsevier, vol. 132(C), pages 190-211.
    14. Reem Aljarallah & Samer A Kharroubi, 2021. "Use of Bayesian Markov Chain Monte Carlo Methods to Model Kuwait Medical Genetic Center Data: An Application to Down Syndrome and Mental Retardation," Mathematics, MDPI, vol. 9(3), pages 1-11, January.
    15. Marco Berrettini & Giuliano Galimberti & Saverio Ranciati, 2023. "Semiparametric finite mixture of regression models with Bayesian P-splines," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 17(3), pages 745-775, September.
    16. Shonosuke Sugasawa & Kosuke Morikawa & Keisuke Takahata, 2022. "Bayesian semiparametric modeling of response mechanism for nonignorable missing data," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 101-117, March.
    17. George Gerogiannis & Mark Tranmer & Duncan Lee & Thomas Valente, 2022. "A Bayesian spatio‐network model for multiple adolescent adverse health behaviours," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(2), pages 271-287, March.
    18. Quentin F. Gronau & Eric-Jan Wagenmakers & Daniel W. Heck & Dora Matzke, 2019. "A Simple Method for Comparing Complex Models: Bayesian Model Comparison for Hierarchical Multinomial Processing Tree Models Using Warp-III Bridge Sampling," Psychometrika, Springer;The Psychometric Society, vol. 84(1), pages 261-284, March.
    19. Victor Muthama Musau & Carlo Gaetan & Paolo Girardi, 2022. "Clustering of bivariate satellite time series: A quantile approach," Environmetrics, John Wiley & Sons, Ltd., vol. 33(7), November.
    20. Sanjeena Subedi & Paul D. McNicholas, 2021. "A Variational Approximations-DIC Rubric for Parameter Estimation and Mixture Model Selection Within a Family Setting," Journal of Classification, Springer;The Classification Society, vol. 38(1), pages 89-108, April.

