IDEAS home Printed from https://ideas.repec.org/a/spr/advdac/v13y2019i3d10.1007_s11634-018-0335-0.html
   My bibliography  Save this article

Greedy Gaussian segmentation of multivariate time series

Author

Listed:
  • David Hallac

    (Stanford University)

  • Peter Nystrup

    (Technical University of Denmark)

  • Stephen Boyd

    (Stanford University)

Abstract

We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem can be solved using dynamic programming, with complexity that grows with the square of the time series length. We propose a heuristic method that approximately solves the problem in linear time with respect to this length, and always yields a locally optimal choice, in the sense that no change of any one breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose appropriate values of the two hyperparameters in the method. Finally, we illustrate our GGS approach on financial time series and Wikipedia text data.

Suggested Citation

  • David Hallac & Peter Nystrup & Stephen Boyd, 2019. "Greedy Gaussian segmentation of multivariate time series," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 727-751, September.
  • Handle: RePEc:spr:advdac:v:13:y:2019:i:3:d:10.1007_s11634-018-0335-0
    DOI: 10.1007/s11634-018-0335-0
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-018-0335-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-018-0335-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Allou Samé & Faicel Chamroukhi & Gérard Govaert & Patrice Aknin, 2011. "Model-based clustering and segmentation of time series with changes in regime," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(4), pages 301-321, December.
    2. Bauwens, Luc & Rombouts, Jeroen V.K., 2012. "On marginal likelihood computation in change-point models," Computational Statistics & Data Analysis, Elsevier, vol. 56(11), pages 3415-3429.
    3. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    4. Andrew Ang & Allan Timmermann, 2012. "Regime Changes and Financial Markets," Annual Review of Financial Economics, Annual Reviews, vol. 4(1), pages 313-337, October.
    5. Peter Nystrup & Henrik Madsen & Erik Lindström, 2017. "Long Memory of Financial Time Series and Hidden Markov Models with Time‐Varying Parameters," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 36(8), pages 989-1002, December.
    6. De Gooijer, Jan G., 2006. "Detecting change-points in multidimensional stochastic processes," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 1892-1903, December.
    7. Mark Fiecas & Jürgen Franke & Rainer von Sachs & Joseph Tadjuidje Kamgaing, 2017. "Shrinkage Estimation for Multivariate Hidden Markov Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 424-435, January.
    8. Tobias Rydén & Timo Teräsvirta & Stefan Åsbrink, 1998. "Stylized facts of daily return series and the hidden Markov model," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 13(3), pages 217-244.
    9. Peter Nystrup & Bo William Hansen & Henrik Madsen & Erik Lindström, 2016. "Detecting change points in VIX and S&P 500: A new approach to dynamic asset allocation," Journal of Asset Management, Palgrave Macmillan, vol. 17(5), pages 361-374, September.
    10. Venter, J. H. & Steel, S. J., 1996. "Finding multiple abrupt change points," Computational Statistics & Data Analysis, Elsevier, vol. 22(5), pages 481-504, September.
    11. Picard, F. & Lebarbier, E. & Budinskà, E. & Robin, S., 2011. "Joint segmentation of multivariate Gaussian processes using mixed linear models," Computational Statistics & Data Analysis, Elsevier, vol. 55(2), pages 1160-1170, February.
    12. Daniela M. Witten & Robert Tibshirani, 2009. "Covariance‐regularized regression and classification for high dimensional problems," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 615-636, June.
    13. Galeano, Pedro & Wied, Dominik, 2014. "Multiple break detection in the correlation structure of random variables," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 262-282.
    14. Jun Li, 2015. "Nonparametric multivariate statistical process control charts: a hypothesis testing-based approach," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 27(3), pages 384-400, September.
    15. Jianhua Z. Huang & Naiping Liu & Mohsen Pourahmadi & Linxu Liu, 2006. "Covariance matrix selection and estimation via penalised normal likelihood," Biometrika, Biometrika Trust, vol. 93(1), pages 85-98, March.
    16. Robert Tibshirani & Michael Saunders & Saharon Rosset & Ji Zhu & Keith Knight, 2005. "Sparsity and smoothness via the fused lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(1), pages 91-108, February.
    17. repec:ebl:ecbull:v:7:y:2004:i:3:p:1-10 is not listed on IDEAS
    18. Cheon, Sooyoung & Kim, Jaehee, 2010. "Multiple change-point detection of multivariate mean vectors with the Bayesian approach," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 406-415, February.
    19. Lee, Chung-Bow, 1998. "Bayesian analysis of a change-point in exponential families with applications," Computational Statistics & Data Analysis, Elsevier, vol. 27(2), pages 195-208, April.
    20. Booth, N.B. & Smith, A.F.M., 1982. "A Bayesian approach to retrospective identification of change-points," Journal of Econometrics, Elsevier, vol. 19(1), pages 7-22, May.
    21. Daniel J. Fenn & Mason A. Porter & Stacy Williams & Mark McDonald & Neil F. Johnson & Nick S. Jones, 2010. "Temporal Evolution of Financial Market Correlations," Papers 1011.3225, arXiv.org, revised May 2011.
    22. M. Hossein Partovi & Michael Caputo, 2004. "Principal Portfolios: Recasting the Efficient Frontier," Economics Bulletin, AccessEcon, vol. 7(3), pages 1-10.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Shao, Zhen & Zheng, Qingru & Yang, Shanlin & Gao, Fei & Cheng, Manli & Zhang, Qiang & Liu, Chen, 2020. "Modeling and forecasting the electricity clearing price: A novel BELM based pattern classification framework and a comparative analytic study on multi-layer BELM and LSTM," Energy Economics, Elsevier, vol. 86(C).
    2. David Degras, 2021. "Sparse group fused lasso for model segmentation: a hybrid approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(3), pages 625-671, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peter Nystrup & Stephen Boyd & Erik Lindström & Henrik Madsen, 2019. "Multi-period portfolio selection with drawdown control," Annals of Operations Research, Springer, vol. 282(1), pages 245-271, November.
    2. Marcelo Lewin & Carlos Heitor Campani, 2023. "Constrained portfolio strategies in a regime-switching economy," Financial Markets and Portfolio Management, Springer;Swiss Society for Financial Market Research, vol. 37(1), pages 27-59, March.
    3. Pesaran, M. Hashem & Yamagata, Takashi, 2012. "Testing CAPM with a Large Number of Assets," IZA Discussion Papers 6469, Institute of Labor Economics (IZA).
    4. Yizhan Shu & Chenyu Yu & John M. Mulvey, 2024. "Regime-Aware Asset Allocation: a Statistical Jump Model Approach," Papers 2402.05272, arXiv.org.
    5. Firoozye, Nikan & Tan, Vincent & Zohren, Stefan, 2023. "Canonical portfolios: Optimal asset and signal combination," Journal of Banking & Finance, Elsevier, vol. 154(C).
    6. Bastidon, Cécile & Parent, Antoine & Jensen, Pablo & Abry, Patrice & Borgnat, Pierre, 2020. "Graph-based era segmentation of international financial integration," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    7. Xi Luo, 2011. "Recovering Model Structures from Large Low Rank and Sparse Covariance Matrix Estimation," Papers 1111.1133, arXiv.org, revised Mar 2013.
    8. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    9. Fan, Jianqing & Fan, Yingying & Lv, Jinchi, 2008. "High dimensional covariance matrix estimation using a factor model," Journal of Econometrics, Elsevier, vol. 147(1), pages 186-197, November.
    10. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    11. Maruotti, Antonello & Petrella, Lea & Sposito, Luca, 2021. "Hidden semi-Markov-switching quantile regression for time series," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    12. Ma, Chenchen & Tu, Yundong, 2023. "Shrinkage estimation of multiple threshold factor models," Journal of Econometrics, Elsevier, vol. 235(2), pages 1876-1892.
    13. Bastidon, Cécile & Parent, Antoine & Jensen, Pablo & Abry, Patrice & Borgnat, Pierre, 2020. "Graph-based era segmentation of international financial integration," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    14. Matthias Weber & Martin Schumacher & Harald Binder, 2014. "Regularized Regression Incorporating Network Information: Simultaneous Estimation of Covariate Coefficients and Connection Signs," Tinbergen Institute Discussion Papers 14-089/I, Tinbergen Institute.
    15. Chen, Cathy W.S. & Chan, Jennifer S.K. & So, Mike K.P. & Lee, Kevin K.M., 2011. "Classification in segmented regression problems," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2276-2287, July.
    16. Xiaoping Zhou & Dmitry Malioutov & Frank J. Fabozzi & Svetlozar T. Rachev, 2014. "Smooth monotone covariance for elliptical distributions and applications in finance," Quantitative Finance, Taylor & Francis Journals, vol. 14(9), pages 1555-1571, September.
    17. Ollier, Edouard & Samson, Adeline & Delavenne, Xavier & Viallon, Vivian, 2016. "A SAEM algorithm for fused lasso penalized NonLinear Mixed Effect Models: Application to group comparison in pharmacokinetics," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 207-221.
    18. Kang, Xiaoning & Wang, Mingqiu, 2021. "Ensemble sparse estimation of covariance structure for exploring genetic disease data," Computational Statistics & Data Analysis, Elsevier, vol. 159(C).
    19. Elizabeth Fons & Paula Dawson & Jeffrey Yau & Xiao-jun Zeng & John Keane, 2019. "A novel dynamic asset allocation system using Feature Saliency Hidden Markov models for smart beta investing," Papers 1902.10849, arXiv.org.
    20. Chi, Eric C. & Lange, Kenneth, 2014. "Stable estimation of a covariance matrix guided by nuclear norm penalties," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 117-128.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:advdac:v:13:y:2019:i:3:d:10.1007_s11634-018-0335-0. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.