Printed from https://ideas.repec.org/a/eee/csdana/v203y2025ics0167947324001786.html

Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves

Authors

  • Nolan, Tui H.
  • Richardson, Sylvia
  • Ruffieux, Hélène

Abstract

The analysis of multivariate functional curves has the potential to yield important scientific discoveries in domains such as healthcare, medicine, economics and social sciences. However, it is common for real-world settings to present longitudinal data that are both irregularly and sparsely observed, which introduces important challenges for the current functional data methodology. A Bayesian hierarchical framework for multivariate functional principal component analysis is proposed, which accommodates the intricacies of such irregular observation settings by flexibly pooling information across subjects and correlated curves. The model represents common latent dynamics via shared functional principal component scores, thereby effectively borrowing strength across curves while circumventing the computationally challenging task of estimating covariance matrices. These scores also provide a parsimonious representation of the major modes of joint variation of the curves and constitute interpretable scalar summaries that can be employed in follow-up analyses. Estimation is conducted using variational inference, ensuring that accurate posterior approximation and robust uncertainty quantification are achieved. The algorithm also introduces a novel variational message passing fragment for multivariate functional principal component Gaussian likelihood that enables modularity and reuse across models. Detailed simulations assess the effectiveness of the approach in sharing information from sparse and irregularly sampled multivariate curves. The methodology is also exploited to estimate the molecular disease courses of individual patients with SARS-CoV-2 infection and characterise patient heterogeneity in recovery outcomes; this study reveals key coordinated dynamics across the immune, inflammatory and metabolic systems, which are associated with long-COVID symptoms up to one year post disease onset. The approach is implemented in the R package bayesFPCA.
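The idea of borrowing strength through shared scores can be illustrated with a small sketch. It is not the bayesFPCA interface and not the authors' variational algorithm. It assumes a model of the form y_p(t) = sum_l zeta_l * psi_{l,p}(t) + noise, with a zero mean function, known variable-specific basis functions standing in for the eigenfunctions, a known noise standard deviation and a standard normal prior on the scores. All function names, basis functions and parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative settings (assumptions, not values from the article).
P, L, sigma = 3, 2, 0.3  # number of variables, components, noise s.d.


def eigenfunctions(t, p):
    """Hypothetical basis functions psi_{l,p}(t) for variable p on [0, 1]."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([
        np.sqrt(2) * np.sin(np.pi * (t + 0.1 * p)),
        np.sqrt(2) * np.cos(2 * np.pi * t) * (1 + 0.2 * p),
    ])


def simulate_subject(scores, n_obs):
    """Sparse, irregular data: each variable gets its own random time points."""
    times, values = [], []
    for p in range(P):
        t = np.sort(rng.uniform(0, 1, size=n_obs[p]))
        y = eigenfunctions(t, p) @ scores + sigma * rng.normal(size=t.size)
        times.append(t)
        values.append(y)
    return times, values


def score_posterior(times, values, variables):
    """Gaussian posterior of the shared scores given the listed variables.

    Conditions on known basis functions and noise level, with a N(0, I)
    prior on the scores, so the posterior is available in closed form.
    """
    precision = np.eye(L)
    linear = np.zeros(L)
    for p in variables:
        Psi = eigenfunctions(times[p], p)
        precision += Psi.T @ Psi / sigma**2
        linear += Psi.T @ values[p] / sigma**2
    cov = np.linalg.inv(precision)
    return cov @ linear, cov


true_scores = rng.normal(size=L)
times, values = simulate_subject(true_scores, n_obs=[3, 4, 2])

# Scores estimated from one variable alone versus from all variables jointly.
mean_single, cov_single = score_posterior(times, values, variables=[0])
mean_pooled, cov_pooled = score_posterior(times, values, variables=range(P))

print("posterior variances, variable 0 only:", np.diag(cov_single))
print("posterior variances, all variables:  ", np.diag(cov_pooled))
```

Because every variable adds a non-negative term to the posterior precision of the shared scores, the pooled posterior variances can never exceed those based on a single variable. This is the sense in which a handful of observations per curve can still inform a subject's scores in this simplified setting.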

Suggested Citation

  • Nolan, Tui H. & Richardson, Sylvia & Ruffieux, Hélène, 2025. "Efficient Bayesian functional principal component analysis of irregularly-observed multivariate curves," Computational Statistics & Data Analysis, Elsevier, vol. 203(C).
  • Handle: RePEc:eee:csdana:v:203:y:2025:i:c:s0167947324001786
    DOI: 10.1016/j.csda.2024.108094

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947324001786
    Download Restriction: Full text for ScienceDirect subscribers only.

    File URL: https://libkey.io/10.1016/j.csda.2024.108094?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Li, Yehua & Qiu, Yumou & Xu, Yuhang, 2022. "From multivariate to functional data analysis: Fundamentals, recent developments, and emerging areas," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    2. Shirun Shen & Huiya Zhou & Kejun He & Lan Zhou, 2024. "Principal Component Analysis of Two-dimensional Functional Data with Serial Correlation," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 29(3), pages 601-620, September.
    3. Jeff Goldsmith & Vadim Zipunnikov & Jennifer Schrack, 2015. "Generalized multilevel function-on-scalar regression and principal component analysis," Biometrics, The International Biometric Society, vol. 71(2), pages 344-353, June.
    4. Gertheiss, Jan & Goldsmith, Jeff & Staicu, Ana-Maria, 2017. "A note on modeling sparse exponential-family functional response curves," Computational Statistics & Data Analysis, Elsevier, vol. 105(C), pages 46-52.
    5. Qi Qian & Danh V. Nguyen & Esra Kürüm & Connie M. Rhee & Sudipto Banerjee & Yihao Li & Damla Şentürk, 2024. "Multivariate Varying Coefficient Spatiotemporal Model," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 16(3), pages 761-786, December.
    6. Daniel R. Kowal & Antonio Canale, 2021. "Semiparametric Functional Factor Models with Bayesian Rank Selection," Papers 2108.02151, arXiv.org, revised May 2022.
    7. Mark J. Meyer & Haobo Cheng & Katherine Hobbs Knutson, 2023. "Bayesian Analysis of Multivariate Matched Proportions with Sparse Response," Statistics in Biosciences, Springer;International Chinese Statistical Association, vol. 15(2), pages 490-509, July.
    8. Dlugosz, Stephan & Mammen, Enno & Wilke, Ralf A., 2017. "Generalized partially linear regression with misclassified data and an application to labour market transitions," Computational Statistics & Data Analysis, Elsevier, vol. 110(C), pages 145-159.
    9. Akdeniz Duran, Esra & Härdle, Wolfgang Karl & Osipenko, Maria, 2012. "Difference based ridge and Liu type estimators in semiparametric regression models," Journal of Multivariate Analysis, Elsevier, vol. 105(1), pages 164-175.
    10. Morteza Amini & Mahdi Roozbeh & Nur Anisah Mohamed, 2024. "Separation of the Linear and Nonlinear Covariates in the Sparse Semi-Parametric Regression Model in the Presence of Outliers," Mathematics, MDPI, vol. 12(2), pages 1-17, January.
    11. Afonso, António & Alves, José & Beck, Krzysztof & Jackson, Karen, 2024. "Financial, institutional, and macroeconomic determinants of cross-country portfolio equity flows: The case of developed countries," Economic Modelling, Elsevier, vol. 141(C).
    12. Gressani, Oswaldo & Lambert, Philippe, 2021. "Laplace approximations for fast Bayesian inference in generalized additive models based on P-splines," Computational Statistics & Data Analysis, Elsevier, vol. 154(C).
    13. Daewon Yang & Taeryon Choi & Eric Lavigne & Yeonseung Chung, 2022. "Non‐parametric Bayesian covariate‐dependent multivariate functional clustering: An application to time‐series data for multiple air pollutants," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), pages 1521-1542, November.
    14. Zanin, Luca, 2023. "A flexible estimation of sectoral portfolio exposure to climate transition risks in the European stock market," Journal of Behavioral and Experimental Finance, Elsevier, vol. 39(C).
    15. Gao, Lisa & Shi, Peng, 2022. "Leveraging high-resolution weather information to predict hail damage claims: A spatial point process for replicated point patterns," Insurance: Mathematics and Economics, Elsevier, vol. 107(C), pages 161-179.
    16. Yu Liu & Chin-Shang Li, 2023. "A linear spline Cox cure model with its applications," Computational Statistics, Springer, vol. 38(2), pages 935-954, June.
    17. Elizabeth Goult & Laura Andrea Barrero Guevara & Michael Briga & Matthieu Domenech de Cellès, 2024. "Estimating the optimal age for infant measles vaccination," Nature Communications, Nature, vol. 15(1), pages 1-14, December.
    18. Caldeira, João F. & Santos, André A.P. & Torrent, Hudson S., 2023. "Semiparametric portfolios: Improving portfolio performance by exploiting non-linearities in firm characteristics," Economic Modelling, Elsevier, vol. 122(C).
    19. Julia Wrobel & Vadim Zipunnikov & Jennifer Schrack & Jeff Goldsmith, 2019. "Registration for exponential family functional data," Biometrics, The International Biometric Society, vol. 75(1), pages 48-57, March.
    20. Benjamin Owusu & Bettina Bökemeier & Alfred Greiner, 2023. "Assessing nonlinearities and heterogeneity in debt sustainability analysis: a panel spline approach," Empirical Economics, Springer, vol. 64(3), pages 1315-1346, March.
