
A fragmented-periodogram approach for clustering big data time series

Author

Listed:
  • Jorge Caiado

    (Cemapre and The University of Lisbon)

  • Nuno Crato

    (Cemapre and The University of Lisbon; Joint Research Centre)

  • Pilar Poncela

    (Joint Research Centre; Universidad Autónoma de Madrid)

Abstract

We propose and study a new frequency-domain procedure for characterizing and comparing large sets of long time series. Instead of using all the information available in the data, which would be computationally very expensive, we propose regularization rules that select and summarize the information most relevant for clustering. Essentially, we suggest using a fragmented periodogram computed around the driving cyclical components of interest and comparing the resulting estimates. The procedure is computationally simple, yet able to condense the relevant information in the time series. A simulation exercise shows that the smoothed fragmented periodogram generally works better than the non-smoothed one, and no worse than the complete periodogram, for medium to large sample sizes. We illustrate the procedure with a study of the evolution of several stock market indices, and further show the effect of recent financial crises on the behaviour of these indices.
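
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the moving-average smoother, the band half-width, and the Euclidean distance between fragments are all assumptions made for illustration, and the series being compared are assumed to have equal length.

```python
import numpy as np

def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*j/n, j = 1..n//2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    dft = np.fft.rfft(x - x.mean())
    freqs = 2 * np.pi * np.arange(dft.size) / n
    pgram = np.abs(dft) ** 2 / (2 * np.pi * n)
    return freqs[1:], pgram[1:]  # drop frequency zero

def fragmented_periodogram(x, centers, half_width, smooth=1):
    """Keep only periodogram ordinates near the chosen frequencies.

    centers    : frequencies (radians) of the cycles of interest (assumed known)
    half_width : half-width of the band kept around each center (assumption)
    smooth     : length of an optional moving-average smoother (assumption)
    """
    freqs, pgram = periodogram(x)
    if smooth > 1:
        kernel = np.ones(smooth) / smooth
        pgram = np.convolve(pgram, kernel, mode="same")
    mask = np.zeros(freqs.size, dtype=bool)
    for c in centers:
        mask |= np.abs(freqs - c) <= half_width
    return pgram[mask]

def fragment_distance(x, y, centers, half_width, smooth=1):
    """Euclidean distance between two fragmented periodograms.

    Both series must have the same length so the fragments line up.
    """
    fx = fragmented_periodogram(x, centers, half_width, smooth)
    fy = fragmented_periodogram(y, centers, half_width, smooth)
    return float(np.linalg.norm(fx - fy))

# Usage: pairwise distances for a small set of simulated series.
rng = np.random.default_rng(0)
t = np.arange(512)
series = [
    np.sin(2 * np.pi * t / 16) + 0.1 * rng.standard_normal(t.size),
    np.sin(2 * np.pi * t / 16 + 1.0) + 0.1 * rng.standard_normal(t.size),
    np.sin(2 * np.pi * t / 64) + 0.1 * rng.standard_normal(t.size),
]
centers = [2 * np.pi / 16, 2 * np.pi / 64]
dist = np.array([[fragment_distance(a, b, centers, 0.05, smooth=3)
                  for b in series] for a in series])
```

The resulting distance matrix could then be passed to any standard clustering routine. Only the ordinates inside the chosen bands are compared, which is what keeps the cost low for long series.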

Suggested Citation

  • Jorge Caiado & Nuno Crato & Pilar Poncela, 2020. "A fragmented-periodogram approach for clustering big data time series," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 14(1), pages 117-146, March.
  • Handle: RePEc:spr:advdac:v:14:y:2020:i:1:d:10.1007_s11634-019-00365-8
    DOI: 10.1007/s11634-019-00365-8

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-019-00365-8
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Forni, Mario & Hallin, Marc & Lippi, Marco & Reichlin, Lucrezia, 2005. "The Generalized Dynamic Factor Model: One-Sided Estimation and Forecasting," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 830-840, September.
    2. Doz, Catherine & Giannone, Domenico & Reichlin, Lucrezia, 2011. "A two-step estimator for large approximate dynamic factor models based on Kalman filtering," Journal of Econometrics, Elsevier, vol. 164(1), pages 188-205, September.
    3. Tim Bollerslev & Benjamin Hood & John Huss & Lasse Heje Pedersen, 2018. "Risk Everywhere: Modeling and Managing Volatility," The Review of Financial Studies, Society for Financial Studies, vol. 31(7), pages 2729-2773.
    4. Jorge Caiado & Nuno Crato, 2010. "Identifying common dynamic features in stock returns," Quantitative Finance, Taylor & Francis Journals, vol. 10(7), pages 797-807.
    5. Catherine Doz & Domenico Giannone & Lucrezia Reichlin, 2012. "A Quasi–Maximum Likelihood Approach for Large, Approximate Dynamic Factor Models," The Review of Economics and Statistics, MIT Press, vol. 94(4), pages 1014-1024, November.
    6. Boivin, Jean & Ng, Serena, 2006. "Are more data always better for factor analysis?," Journal of Econometrics, Elsevier, vol. 132(1), pages 169-194, May.
    7. Caiado, Jorge & Crato, Nuno & Pena, Daniel, 2006. "A periodogram-based metric for time series classification," Computational Statistics & Data Analysis, Elsevier, vol. 50(10), pages 2668-2684, June.
    8. Jushan Bai & Serena Ng, 2004. "A PANIC Attack on Unit Roots and Cointegration," Econometrica, Econometric Society, vol. 72(4), pages 1127-1177, July.
    9. repec:hal:journl:peer-00844811 is not listed on IDEAS
    10. Albert C Yang & Shih-Jen Tsai & Chen-Jee Hong & Cynthia Wang & Tai-Jui Chen & Ying-Jay Liou & Chung-Kang Peng, 2011. "Clustering Heart Rate Dynamics Is Associated with β-Adrenergic Receptor Polymorphisms: Analysis by Information-Based Similarity Index," PLOS ONE, Public Library of Science, vol. 6(5), pages 1-8, May.
    11. Clements, Michael P. & Hendry, David F. (ed.), 2011. "The Oxford Handbook of Economic Forecasting," OUP Catalogue, Oxford University Press, number 9780195398649.
    12. Stock, James H. & Watson, Mark, 2011. "Dynamic Factor Models," Scholarly Articles 28469541, Harvard University Department of Economics.
    13. Clifford Lam & Qiwei Yao & Neil Bathia, 2011. "Estimation of latent factors for high-dimensional time series," Biometrika, Biometrika Trust, vol. 98(4), pages 901-918.
    14. Lam, Clifford & Yao, Qiwei & Bathia, Neil, 2011. "Estimation of latent factors for high-dimensional time series," LSE Research Online Documents on Economics 31549, London School of Economics and Political Science, LSE Library.
    15. Domenico Piccolo, 1990. "A Distance Measure For Classifying Arima Models," Journal of Time Series Analysis, Wiley Blackwell, vol. 11(2), pages 153-164, March.
    16. Mario Forni & Marc Hallin & Marco Lippi & Lucrezia Reichlin, 2000. "The Generalized Dynamic-Factor Model: Identification And Estimation," The Review of Economics and Statistics, MIT Press, vol. 82(4), pages 540-554, November.
    17. Peter J. Diggle & Nicholas I. Fisher, 1991. "Nonparametric Comparison of Cumulative Periodograms," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 40(3), pages 423-434, November.
    18. Otranto, Edoardo, 2010. "Identifying financial time series with similar dynamic conditional correlation," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 1-15, January.
    19. Caiado, Jorge & Crato, Nuno & Peña, Daniel, 2009. "Comparison of time series with unequal length in the frequency domain," MPRA Paper 15310, University Library of Munich, Germany.
    20. João A. Bastos & Jorge Caiado, 2014. "Clustering financial time series with variance ratio statistics," Quantitative Finance, Taylor & Francis Journals, vol. 14(12), pages 2121-2133, December.
    21. Bai, Jushan & Ng, Serena, 2008. "Large Dimensional Factor Analysis," Foundations and Trends(R) in Econometrics, now publishers, vol. 3(2), pages 89-163, June.
    22. Stock J.H. & Watson M.W., 2002. "Forecasting Using Principal Components From a Large Number of Predictors," Journal of the American Statistical Association, American Statistical Association, vol. 97, pages 1167-1179, December.
    23. D. S. Coates & P. J. Diggle, 1986. "Tests For Comparing Two Estimated Spectral Densities," Journal of Time Series Analysis, Wiley Blackwell, vol. 7(1), pages 7-20, January.

    Citations

    Cited by:

    1. Raffaele Mattera & Philipp Otto, 2023. "Network log-ARCH models for forecasting stock market volatility," Papers 2303.11064, arXiv.org.
    2. Lúcio, Francisco & Caiado, Jorge, 2022. "COVID-19 and Stock Market Volatility: A Clustering Approach for S&P 500 Industry Indices," Finance Research Letters, Elsevier, vol. 49(C).
    3. João A. Bastos & Jorge Caiado, 2021. "On the classification of financial data with domain agnostic features," Working Papers REM 2021/0185, ISEG - Lisbon School of Economics and Management, REM, Universidade de Lisboa.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Stock, J.H. & Watson, M.W., 2016. "Dynamic Factor Models, Factor-Augmented Vector Autoregressions, and Structural Vector Autoregressions in Macroeconomics," Handbook of Macroeconomics, in: J. B. Taylor & Harald Uhlig (ed.), Handbook of Macroeconomics, edition 1, volume 2, chapter 0, pages 415-525, Elsevier.
    2. Poncela, Pilar & Ruiz, Esther & Miranda, Karen, 2021. "Factor extraction using Kalman filter and smoothing: This is not just another survey," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1399-1425.
    3. Pilar Poncela & Esther Ruiz, 2016. "Small- Versus Big-Data Factor Extraction in Dynamic Factor Models: An Empirical Assessment," Advances in Econometrics, in: Dynamic Factor Models, volume 35, pages 401-434, Emerald Group Publishing Limited.
    4. Francisco Corona & Pilar Poncela & Esther Ruiz, 2020. "Estimating Non-stationary Common Factors: Implications for Risk Sharing," Computational Economics, Springer;Society for Computational Economics, vol. 55(1), pages 37-60, January.
    5. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers 2019-4, University of Hawaii Economic Research Organization, University of Hawaii at Manoa.
    6. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," PSE Working Papers halshs-02262202, HAL.
    7. Catherine Doz & Peter Fuleky, 2019. "Dynamic Factor Models," Working Papers halshs-02262202, HAL.
    8. Chiara Casoli & Riccardo (Jack) Lucchetti, 2022. "Permanent-Transitory decomposition of cointegrated time series via dynamic factor models, with an application to commodity prices [Commodity-price comovement and global economic activity]," The Econometrics Journal, Royal Economic Society, vol. 25(2), pages 494-514.
    9. Lucchetti, Riccardo & Venetis, Ioannis A., 2020. "A replication of "A quasi-maximum likelihood approach for large, approximate dynamic factor models" (Review of Economics and Statistics, 2012)," Economics - The Open-Access, Open-Assessment E-Journal (2007-2020), Kiel Institute for the World Economy (IfW Kiel), vol. 14, pages 1-14.
    10. Hallin, Marc & Lippi, Marco, 2013. "Factor models in high-dimensional time series—A time-domain approach," Stochastic Processes and their Applications, Elsevier, vol. 123(7), pages 2678-2695.
    11. Jörg Breitung & In Choi, 2013. "Factor models," Chapters, in: Nigar Hashimzade & Michael A. Thornton (ed.), Handbook of Research Methods and Applications in Empirical Macroeconomics, chapter 11, pages 249-265, Edward Elgar Publishing.
      • In Choi & Jorg Breitung, 2011. "Factor models," Working Papers 1121, Nam Duck-Woo Economic Research Institute, Sogang University (Former Research Institute for Market Economy), revised Dec 2011.
    12. Helmut Lütkepohl, 2014. "Structural Vector Autoregressive Analysis in a Data Rich Environment: A Survey," Discussion Papers of DIW Berlin 1351, DIW Berlin, German Institute for Economic Research.
    13. Jianqing Fan & Yuan Liao & Martina Mincheva, 2013. "Large covariance estimation by thresholding principal orthogonal complements," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 75(4), pages 603-680, September.
    14. Kihwan Kim & Norman Swanson, 2013. "Diffusion Index Model Specification and Estimation Using Mixed Frequency Datasets," Departmental Working Papers 201315, Rutgers University, Department of Economics.
    15. Claudio Morana, 2010. "Heteroskedastic Factor Vector Autoregressive Estimation of Persistent and Non Persistent Processes Subject to Structural Breaks," ICER Working Papers - Applied Mathematics Series 36-2010, ICER - International Centre for Economic Research.
    16. Matteo Barigozzi & Marc Hallin, 2023. "Dynamic Factor Models: a Genealogy," Papers 2310.17278, arXiv.org, revised Jan 2024.
    17. Rua, António, 2017. "A wavelet-based multivariate multiscale approach for forecasting," International Journal of Forecasting, Elsevier, vol. 33(3), pages 581-590.
    18. Ergemen, Yunus Emre & Rodríguez-Caballero, C. Vladimir, 2023. "Estimation of a dynamic multi-level factor model with possible long-range dependence," International Journal of Forecasting, Elsevier, vol. 39(1), pages 405-430.
    19. Banerjee, Anindya & Marcellino, Massimiliano & Masten, Igor, 2014. "Forecasting with factor-augmented error correction models," International Journal of Forecasting, Elsevier, vol. 30(3), pages 589-612.
    20. Tóth, Peter, 2014. "Malý dynamický faktorový model na krátkodobé prognózovanie slovenského HDP [A Small Dynamic Factor Model for the Short-Term Forecasting of Slovak GDP]," MPRA Paper 63713, University Library of Munich, Germany.
