IDEAS home Printed from https://ideas.repec.org/a/spr/advdac/v13y2019i3d10.1007_s11634-018-0335-0.html
   My bibliography  Save this article

Greedy Gaussian segmentation of multivariate time series

Author

Listed:
  • David Hallac

    (Stanford University)

  • Peter Nystrup

    (Technical University of Denmark)

  • Stephen Boyd

    (Stanford University)

Abstract

We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem can be solved using dynamic programming, with complexity that grows with the square of the time series length. We propose a heuristic method that approximately solves the problem in linear time with respect to this length, and always yields a locally optimal choice, in the sense that no change of any one breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose appropriate values of the two hyperparameters in the method. Finally, we illustrate our GGS approach on financial time series and Wikipedia text data.

Suggested Citation

  • David Hallac & Peter Nystrup & Stephen Boyd, 2019. "Greedy Gaussian segmentation of multivariate time series," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 13(3), pages 727-751, September.
  • Handle: RePEc:spr:advdac:v:13:y:2019:i:3:d:10.1007_s11634-018-0335-0
    DOI: 10.1007/s11634-018-0335-0
    as

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11634-018-0335-0
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    File URL: https://libkey.io/10.1007/s11634-018-0335-0?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    As the access to this document is restricted, you may want to search for a different version of it.

    References listed on IDEAS

    as
    1. Bauwens, Luc & Rombouts, Jeroen V.K., 2012. "On marginal likelihood computation in change-point models," Computational Statistics & Data Analysis, Elsevier, vol. 56(11), pages 3415-3429.
    2. Ledoit, Olivier & Wolf, Michael, 2004. "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, Elsevier, vol. 88(2), pages 365-411, February.
    3. Mark Fiecas & Jürgen Franke & Rainer von Sachs & Joseph Tadjuidje Kamgaing, 2017. "Shrinkage Estimation for Multivariate Hidden Markov Models," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 424-435, January.
    4. Venter, J. H. & Steel, S. J., 1996. "Finding multiple abrupt change points," Computational Statistics & Data Analysis, Elsevier, vol. 22(5), pages 481-504, September.
    5. Picard, F. & Lebarbier, E. & Budinskà, E. & Robin, S., 2011. "Joint segmentation of multivariate Gaussian processes using mixed linear models," Computational Statistics & Data Analysis, Elsevier, vol. 55(2), pages 1160-1170, February.
    6. Daniela M. Witten & Robert Tibshirani, 2009. "Covariance‐regularized regression and classification for high dimensional problems," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 71(3), pages 615-636, June.
    7. Andrew Ang & Allan Timmermann, 2012. "Regime Changes and Financial Markets," Annual Review of Financial Economics, Annual Reviews, vol. 4(1), pages 313-337, October.
    8. Daniel J. Fenn & Mason A. Porter & Stacy Williams & Mark McDonald & Neil F. Johnson & Nick S. Jones, 2010. "Temporal Evolution of Financial Market Correlations," Papers 1011.3225, arXiv.org, revised May 2011.
    9. Allou Samé & Faicel Chamroukhi & Gérard Govaert & Patrice Aknin, 2011. "Model-based clustering and segmentation of time series with changes in regime," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 5(4), pages 301-321, December.
    10. Peter Nystrup & Henrik Madsen & Erik Lindström, 2017. "Long Memory of Financial Time Series and Hidden Markov Models with Time‐Varying Parameters," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 36(8), pages 989-1002, December.
    11. De Gooijer, Jan G., 2006. "Detecting change-points in multidimensional stochastic processes," Computational Statistics & Data Analysis, Elsevier, vol. 51(3), pages 1892-1903, December.
    12. Tobias Rydén & Timo Teräsvirta & Stefan Åsbrink, 1998. "Stylized facts of daily return series and the hidden Markov model," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 13(3), pages 217-244.
    13. Peter Nystrup & Bo William Hansen & Henrik Madsen & Erik Lindström, 2016. "Detecting change points in VIX and S&P 500: A new approach to dynamic asset allocation," Journal of Asset Management, Palgrave Macmillan, vol. 17(5), pages 361-374, September.
    14. Galeano, Pedro & Wied, Dominik, 2014. "Multiple break detection in the correlation structure of random variables," Computational Statistics & Data Analysis, Elsevier, vol. 76(C), pages 262-282.
    15. Jun Li, 2015. "Nonparametric multivariate statistical process control charts: a hypothesis testing-based approach," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 27(3), pages 384-400, September.
    16. Jianhua Z. Huang & Naiping Liu & Mohsen Pourahmadi & Linxu Liu, 2006. "Covariance matrix selection and estimation via penalised normal likelihood," Biometrika, Biometrika Trust, vol. 93(1), pages 85-98, March.
    17. Robert Tibshirani & Michael Saunders & Saharon Rosset & Ji Zhu & Keith Knight, 2005. "Sparsity and smoothness via the fused lasso," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(1), pages 91-108, February.
    18. repec:ebl:ecbull:v:7:y:2004:i:3:p:1-10 is not listed on IDEAS
    19. Cheon, Sooyoung & Kim, Jaehee, 2010. "Multiple change-point detection of multivariate mean vectors with the Bayesian approach," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 406-415, February.
    20. Lee, Chung-Bow, 1998. "Bayesian analysis of a change-point in exponential families with applications," Computational Statistics & Data Analysis, Elsevier, vol. 27(2), pages 195-208, April.
    21. Booth, N.B. & Smith, A.F.M., 1982. "A Bayesian approach to retrospective identification of change-points," Journal of Econometrics, Elsevier, vol. 19(1), pages 7-22, May.
    22. M. Hossein Partovi & Michael Caputo, 2004. "Principal Portfolios: Recasting the Efficient Frontier," Economics Bulletin, AccessEcon, vol. 7(3), pages 1-10.
    Full references (including those not matched with items on IDEAS)

    Citations

    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.
    as


    Cited by:

    1. Shao, Zhen & Zheng, Qingru & Yang, Shanlin & Gao, Fei & Cheng, Manli & Zhang, Qiang & Liu, Chen, 2020. "Modeling and forecasting the electricity clearing price: A novel BELM based pattern classification framework and a comparative analytic study on multi-layer BELM and LSTM," Energy Economics, Elsevier, vol. 86(C).
    2. David Degras, 2021. "Sparse group fused lasso for model segmentation: a hybrid approach," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 15(3), pages 625-671, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Peter Nystrup & Stephen Boyd & Erik Lindström & Henrik Madsen, 2019. "Multi-period portfolio selection with drawdown control," Annals of Operations Research, Springer, vol. 282(1), pages 245-271, November.
    2. Marcelo Lewin & Carlos Heitor Campani, 2023. "Constrained portfolio strategies in a regime-switching economy," Financial Markets and Portfolio Management, Springer;Swiss Society for Financial Market Research, vol. 37(1), pages 27-59, March.
    3. Bastidon, Cécile & Parent, Antoine & Jensen, Pablo & Abry, Patrice & Borgnat, Pierre, 2020. "Graph-based era segmentation of international financial integration," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    4. Lam, Clifford, 2020. "High-dimensional covariance matrix estimation," LSE Research Online Documents on Economics 101667, London School of Economics and Political Science, LSE Library.
    5. Gautam Sabnis & Debdeep Pati & Anirban Bhattacharya, 2019. "Compressed Covariance Estimation with Automated Dimension Learning," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(2), pages 466-481, December.
    6. Bastidon, Cécile & Parent, Antoine & Jensen, Pablo & Abry, Patrice & Borgnat, Pierre, 2020. "Graph-based era segmentation of international financial integration," Physica A: Statistical Mechanics and its Applications, Elsevier, vol. 539(C).
    7. Chen, Cathy W.S. & Chan, Jennifer S.K. & So, Mike K.P. & Lee, Kevin K.M., 2011. "Classification in segmented regression problems," Computational Statistics & Data Analysis, Elsevier, vol. 55(7), pages 2276-2287, July.
    8. Xiaoping Zhou & Dmitry Malioutov & Frank J. Fabozzi & Svetlozar T. Rachev, 2014. "Smooth monotone covariance for elliptical distributions and applications in finance," Quantitative Finance, Taylor & Francis Journals, vol. 14(9), pages 1555-1571, September.
    9. Ollier, Edouard & Samson, Adeline & Delavenne, Xavier & Viallon, Vivian, 2016. "A SAEM algorithm for fused lasso penalized NonLinear Mixed Effect Models: Application to group comparison in pharmacokinetics," Computational Statistics & Data Analysis, Elsevier, vol. 95(C), pages 207-221.
    10. M Hashem Pesaran & Takashi Yamagata, 2012. "Testing CAPM with a Large Number of Assets," Discussion Papers 12/05, Department of Economics, University of York.
    11. Chi, Eric C. & Lange, Kenneth, 2014. "Stable estimation of a covariance matrix guided by nuclear norm penalties," Computational Statistics & Data Analysis, Elsevier, vol. 80(C), pages 117-128.
    12. Focardi, Sergio M. & Fabozzi, Frank J. & Mazza, Davide, 2019. "Modeling local trends with regime shifting models with time-varying probabilities," International Review of Financial Analysis, Elsevier, vol. 66(C).
    13. Natalia Bailey & George Kapetanios & M. Hashem Pesaran, 2019. "Exponent of Cross-sectional Dependence for Residuals," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 81(1), pages 46-102, September.
    14. Lim Hao Shen Keith, 2024. "Covariance Matrix Analysis for Optimal Portfolio Selection," Papers 2407.08748, arXiv.org.
    15. Bailey, Natalia & Pesaran, M. Hashem & Smith, L. Vanessa, 2019. "A multiple testing approach to the regularisation of large sample correlation matrices," Journal of Econometrics, Elsevier, vol. 208(2), pages 507-534.
    16. M Hashem Pesaran & Takashi Yamagata, 2024. "Testing for Alpha in Linear Factor Pricing Models with a Large Number of Securities," Journal of Financial Econometrics, Oxford University Press, vol. 22(2), pages 407-460.
    17. Lončarski, Igor & Vidovič, Luka, 2019. "Sorting out the financials: Making economic sense out of statistical factors," Finance Research Letters, Elsevier, vol. 31(C), pages 110-118.
    18. Margherita Giuzio & Sandra Paterlini, 2019. "Un-diversifying during crises: Is it a good idea?," Computational Management Science, Springer, vol. 16(3), pages 401-432, July.
    19. Pedro Galeano & Dominik Wied, 2017. "Dating multiple change points in the correlation matrix," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 26(2), pages 331-352, June.
    20. Xingqi Du & Subhashis Ghosal, 2018. "Bayesian Discriminant Analysis Using a High Dimensional Predictor," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 112-145, December.

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:spr:advdac:v:13:y:2019:i:3:d:10.1007_s11634-018-0335-0. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Sonal Shukla or Springer Nature Abstracting and Indexing (email available below). General contact details of provider: http://www.springer.com .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.