
Multivariate functional data modeling with time-varying clustering

Authors

  • Philip A. White (Brigham Young University)
  • Alan E. Gelfand (Duke University)

Abstract

We consider the setting of multivariate functional data collected over time at each of a set of sites. Our objective is to implement model-based clustering of the functions across the sites where we allow such clustering to vary over time. Anticipating dependence between the functions within a site as well as across sites, we model the collection of functions using a multivariate Gaussian process. With many sites and several functions at each site, we use dimension reduction to provide a computationally manageable stochastic process specification. To jointly cluster the functions, we use the Dirichlet process which enables shared labeling of the functions across the sites. Specifically, we cluster functions based on their response to exogenous variables. Though the functions arise over continuous time, clustering in continuous time is extremely computationally demanding and not of practical interest. Therefore, we employ partitioning of the timescale to capture time-varying clustering. Our illustrative setting is bivariate, monitoring ozone and PM$_{10}$ levels over time for one year at a set of monitoring sites. The data we work with is from 24 monitoring sites in Mexico City for 2017 which record hourly ozone and PM$_{10}$ levels. Hence, we have 48 functions to work with across 8760 hours. We provide a Gaussian process model for each function using continuous-time meteorological variables as regressors along with adjustment for daily periodicity. We interpret the similarity of functions in terms of their shape, captured through site-specific coefficients, and use these coefficients to develop the clustering.
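The data dimensions and the idea of time-varying cluster labels described in the abstract can be illustrated with a small sketch. It is not the authors' model: it only checks the arithmetic (24 sites × 2 pollutants, 365 × 24 hours), splits the year into contiguous blocks, and draws site labels for each block from a Chinese restaurant process, which is the partition prior induced by a Dirichlet process. The choice of 12 blocks, the concentration `alpha = 1.0`, and the helper names `partition_hours` and `crp_labels` are all assumptions made for illustration. No posterior inference is performed.

```python
import random

N_SITES = 24                 # monitoring sites (from the abstract)
N_POLLUTANTS = 2             # ozone and PM10 (from the abstract)
HOURS_PER_YEAR = 365 * 24    # hourly records for 2017

def partition_hours(n_hours, n_blocks):
    """Split hours 0..n_hours-1 into contiguous, near-equal (start, stop) blocks."""
    base, extra = divmod(n_hours, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        stop = start + base + (1 if b < extra else 0)
        blocks.append((start, stop))
        start = stop
    return blocks

def crp_labels(n_items, alpha, rng):
    """Draw cluster labels from a Chinese restaurant process prior.

    Item i joins existing cluster k with weight counts[k],
    or opens a new cluster with weight alpha.
    """
    labels, counts = [], []
    for _ in range(n_items):
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)     # new cluster
        else:
            counts[k] += 1       # existing cluster
        labels.append(k)
    return labels

rng = random.Random(0)                          # fixed seed for reproducibility
n_functions = N_SITES * N_POLLUTANTS            # 48 functions in total
blocks = partition_hours(HOURS_PER_YEAR, 12)    # 12 blocks is an assumption
# One labeling of the sites per time block: the clustering may differ across blocks.
labels_by_block = [crp_labels(N_SITES, alpha=1.0, rng=rng) for _ in blocks]
```

Each entry of `labels_by_block` assigns one label per site, shared by that site's ozone and PM$_{10}$ functions, so comparing entries across blocks shows how a partition of the sites could change over the year under this prior.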

Suggested Citation

  • Philip A. White & Alan E. Gelfand, 2021. "Multivariate functional data modeling with time-varying clustering," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(3), pages 586-602, September.
  • Handle: RePEc:spr:testjl:v:30:y:2021:i:3:d:10.1007_s11749-020-00733-z
    DOI: 10.1007/s11749-020-00733-z




    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christoph Hellmayr & Alan E. Gelfand, 2021. "A Partition Dirichlet Process Model for Functional Data Analysis," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 83(1), pages 30-65, May.
    2. Jorge Castillo-Mateo & Miguel Lafuente & Jesús Asín & Ana C. Cebrián & Alan E. Gelfand & Jesús Abaurrea, 2022. "Spatial Modeling of Day-Within-Year Temperature Time Series: An Examination of Daily Maximum Temperatures in Aragón, Spain," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 27(3), pages 487-505, September.
    3. Conti, Gabriella & Frühwirth-Schnatter, Sylvia & Heckman, James J. & Piatek, Rémi, 2014. "Bayesian exploratory factor analysis," Journal of Econometrics, Elsevier, vol. 183(1), pages 31-57.
    4. Sierra Pugh & Matthew J. Heaton & Jeff Svedin & Neil Hansen, 2019. "Spatiotemporal Lagged Models for Variable Rate Irrigation in Agriculture," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 24(4), pages 634-650, December.
    5. Kim, Hea-Jung & Choi, Taeryon & Jo, Seongil, 2016. "Bayesian factor analysis with uncertain functional constraints about factor loadings," Journal of Multivariate Analysis, Elsevier, vol. 144(C), pages 110-128.
    6. Crespo Cuaresma, Jesús & Huber, Florian & Onorante, Luca, 2020. "Fragility and the effect of international uncertainty shocks," Journal of International Money and Finance, Elsevier, vol. 108(C).
    7. Veronica J. Berrocal & Alan E. Gelfand & David M. Holland, 2012. "Space-Time Data fusion Under Error in Computer Model Output: An Application to Modeling Air Quality," Biometrics, The International Biometric Society, vol. 68(3), pages 837-848, September.
    8. Jaewoo Park & Sangwan Lee, 2022. "A projection‐based Laplace approximation for spatial latent variable models," Environmetrics, John Wiley & Sons, Ltd., vol. 33(1), February.
    9. Fang, Kuangnan & Chen, Yuanxing & Ma, Shuangge & Zhang, Qingzhao, 2022. "Biclustering analysis of functionals via penalized fusion," Journal of Multivariate Analysis, Elsevier, vol. 189(C).
    10. Kaufmann, Sylvia & Schumacher, Christian, 2019. "Bayesian estimation of sparse dynamic factor models with order-independent and ex-post mode identification," Journal of Econometrics, Elsevier, vol. 210(1), pages 116-134.
    11. S. J. Koopman & G. Mesters, 2017. "Empirical Bayes Methods for Dynamic Factor Models," The Review of Economics and Statistics, MIT Press, vol. 99(3), pages 486-498, July.
    12. Rhoden, Imke & Weller, Daniel & Voit, Ann-Katrin, 2021. "Spatio-temporal dynamics of European innovation: An exploratory approach via multivariate functional data cluster analysis," Ruhr Economic Papers 926, RWI - Leibniz-Institut für Wirtschaftsforschung, Ruhr-University Bochum, TU Dortmund University, University of Duisburg-Essen.
    13. Sabyasachi Mukhopadhyay & Joseph O. Ogutu & Gundula Bartzke & Holly T. Dublin & Hans-Peter Piepho, 2019. "Modelling Spatio-Temporal Variation in Sparse Rainfall Data Using a Hierarchical Bayesian Regression Model," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 24(2), pages 369-393, June.
    14. Leung, Dennis & Drton, Mathias, 2016. "Order-invariant prior specification in Bayesian factor analysis," Statistics & Probability Letters, Elsevier, vol. 111(C), pages 60-66.
    15. Jakob A. Dambon & Stefan S. Fahrländer & Saira Karlen & Manuel Lehner & Jaron Schlesinger & Fabio Sigrist & Anna Zimmermann, 2022. "Examining the vintage effect in hedonic pricing using spatially varying coefficients models: a case study of single-family houses in the Canton of Zurich," Swiss Journal of Economics and Statistics, Springer;Swiss Society of Economics and Statistics, vol. 158(1), pages 1-14, December.
    16. Amovin-Assagba, Martial & Gannaz, Irène & Jacques, Julien, 2022. "Outlier detection in multivariate functional data through a contaminated mixture model," Computational Statistics & Data Analysis, Elsevier, vol. 174(C).
    17. repec:bfi:wpaper:2014-014 is not listed on IDEAS
    18. Matthias Katzfuss & Joseph Guinness & Wenlong Gong & Daniel Zilber, 2020. "Vecchia Approximations of Gaussian-Process Predictions," Journal of Agricultural, Biological and Environmental Statistics, Springer;The International Biometric Society;American Statistical Association, vol. 25(3), pages 383-414, September.
    19. Kastner, Gregor, 2019. "Sparse Bayesian time-varying covariance estimation in many dimensions," Journal of Econometrics, Elsevier, vol. 210(1), pages 98-115.
    20. Galatia Cleanthous & Emilio Porcu & Philip White, 2021. "Regularity and approximation of Gaussian random fields evolving temporally over compact two-point homogeneous spaces," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 30(4), pages 836-860, December.
    21. Bai, Jushan & Ando, Tomohiro, 2013. "Multifactor asset pricing with a large number of observable risk factors and unobservable common and group-specific factors," MPRA Paper 52785, University Library of Munich, Germany, revised Dec 2013.
