
A Run Length Transformation for Discriminating Between Auto Regressive Time Series

Author

Listed:
  • Anthony Bagnall
  • Gareth Janacek

Abstract

We describe a simple time series transformation to detect differences in series that can be accurately modelled as stationary autoregressive (AR) processes. The transformation involves forming the histogram of above and below the mean run lengths. The run length (RL) transformation has the benefits of being very fast, compact and updatable for new data in constant time. Furthermore, it can be generated directly from data that has already been highly compressed. We first establish the theoretical asymptotic relationship between run length distributions and AR models through consideration of the zero crossing probability and the distribution of runs. We benchmark our transformation against two alternatives: the truncated Autocorrelation function (ACF) transform and the AR transformation, which involves the standard method of fitting the partial autocorrelation coefficients with the Durbin-Levinson recursions and using the Akaike Information Criterion stopping procedure. Whilst optimal in the idealized scenario, representing the data in these ways is time consuming and the representation cannot be updated online for new data. We show that for classification problems the accuracy obtained through using the run length distribution tends towards that obtained from using the full fitted models. We then propose three alternative distance measures for run length distributions based on Gower’s general similarity coefficient, the likelihood ratio and dynamic time warping (DTW). Through simulated classification experiments we show that a nearest neighbour distance based on DTW converges to the optimal faster than classifiers based on Euclidean distance, Gower’s coefficient and the likelihood ratio. 
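As an illustration of the transformation described above, the following is a minimal sketch and not the authors' implementation. It assumes that runs above and below the mean are pooled into a single histogram (the abstract does not say whether they are kept separate) and that runs longer than a chosen cap are clipped into the last bin so that histograms from different series have equal length. The names `run_length_histogram` and `max_run` are hypothetical.

```python
from collections import Counter
from itertools import groupby
from statistics import fmean


def run_length_histogram(series, max_run=None):
    """Histogram of the lengths of consecutive runs above / not above the mean.

    Assumption: runs on both sides of the mean are pooled into one histogram.
    max_run, if given, must be a positive integer.
    """
    mean = fmean(series)
    # Flag each observation: True if strictly above the mean, False otherwise.
    above = [x > mean for x in series]
    # groupby collapses consecutive equal flags into runs; record each run's length.
    counts = Counter(len(list(group)) for _, group in groupby(above))
    longest = max_run if max_run is not None else max(counts)
    # hist[k - 1] holds the number of runs of length k;
    # runs longer than `longest` are clipped into the final bin.
    hist = [0] * longest
    for length, n in counts.items():
        hist[min(length, longest) - 1] += n
    return hist


if __name__ == "__main__":
    data = [1, 2, 5, 6, 7, 1, 0, 8, 9, 2]
    print(run_length_histogram(data))
    print(run_length_histogram(data, max_run=2))
```

In this toy series the mean is 4.1, so the above/below pattern is two below, three above, two below, two above, one below, giving run lengths 2, 3, 2, 2, 1. Fixing `max_run` across all series keeps the resulting vectors directly comparable.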
We experiment with a variety of classifiers and demonstrate that although the RL transform requires more data than the best performing classifier to achieve the same accuracy as AR or ACF, this factor is at worst non-increasing with the series length, m, whereas the relative time taken to fit AR and ACF increases with m. We conclude that if the data is stationary and can be suitably modelled by an AR series, and if time is an important factor in reaching a discriminatory decision, then the run length distribution transform is a simple and effective transformation to use.

Copyright Springer Science+Business Media New York 2014
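The abstract reports that a nearest neighbour classifier using a DTW-based distance between run length distributions performed best in the simulations. The sketch below shows the textbook dynamic time warping recursion applied to two histogram vectors, followed by a one-nearest-neighbour lookup. It is an assumption-laden illustration: the squared pointwise cost, the absence of a warping window, and the names `dtw_distance` and `nearest_neighbour` are choices made here, not details confirmed by the paper.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two numeric sequences.

    Assumptions: squared difference as the pointwise cost, no warping window.
    Runs in O(len(a) * len(b)) time.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def nearest_neighbour(query, labelled):
    """Return the label of the (histogram, label) pair closest to query under DTW."""
    return min(labelled, key=lambda item: dtw_distance(query, item[0]))[1]


if __name__ == "__main__":
    training = [([5, 1, 0], "A"), ([1, 3, 1], "B")]
    print(nearest_neighbour([1, 4, 1], training))
```

Because DTW allows neighbouring bins to be matched to each other, two histograms whose mass is shifted by one run length are treated as closer than they would be under a bin-by-bin Euclidean comparison.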

Suggested Citation

  • Anthony Bagnall & Gareth Janacek, 2014. "A Run Length Transformation for Discriminating Between Auto Regressive Time Series," Journal of Classification, Springer;The Classification Society, vol. 31(2), pages 154-178, July.
  • Handle: RePEc:spr:jclass:v:31:y:2014:i:2:p:154-178
    DOI: 10.1007/s00357-013-9135-6

    Citations

    Cited by:

    1. Patrick Toman & Nalini Ravishanker & Sanguthevar Rajasekaran & Nathan Lally, 2023. "Online Evidential Nearest Neighbour Classification for Internet of Things Time Series," International Statistical Review, International Statistical Institute, vol. 91(3), pages 395-426, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Beibei Zhang & Rong Chen, 2018. "Nonlinear Time Series Clustering Based on Kolmogorov-Smirnov 2D Statistic," Journal of Classification, Springer;The Classification Society, vol. 35(3), pages 394-421, October.
    2. E. Otranto, 2011. "Classification of Volatility in Presence of Changes in Model Parameters," Working Paper CRENoS 201113, Centre for North South Economic Research, University of Cagliari and Sassari, Sardinia.
    3. Liu, Shen & Maharaj, Elizabeth Ann, 2013. "A hypothesis test using bias-adjusted AR estimators for classifying time series in small samples," Computational Statistics & Data Analysis, Elsevier, vol. 60(C), pages 32-49.
    4. Otranto, Edoardo, 2008. "Clustering heteroskedastic time series by model-based procedures," Computational Statistics & Data Analysis, Elsevier, vol. 52(10), pages 4685-4698, June.
    5. Vilar, J.A. & Alonso, A.M. & Vilar, J.M., 2010. "Non-linear time series clustering based on non-parametric forecast densities," Computational Statistics & Data Analysis, Elsevier, vol. 54(11), pages 2850-2865, November.
    6. Pacifico, Antonio, 2020. "Bayesian Fuzzy Clustering with Robust Weighted Distance for Multiple ARIMA and Multivariate Time-Series," MPRA Paper 104379, University Library of Munich, Germany.
    7. Umberto Triacca, 2016. "Measuring the Distance between Sets of ARMA Models," Econometrics, MDPI, vol. 4(3), pages 1-11, July.
    8. Otranto, Edoardo, 2010. "Identifying financial time series with similar dynamic conditional correlation," Computational Statistics & Data Analysis, Elsevier, vol. 54(1), pages 1-15, January.
    9. Sonia Díaz & José Vilar, 2010. "Comparing Several Parametric and Nonparametric Approaches to Time Series Clustering: A Simulation Study," Journal of Classification, Springer;The Classification Society, vol. 27(3), pages 333-362, November.
    10. Di Iorio, Francesca & Triacca, Umberto, 2013. "Testing for Granger non-causality using the autoregressive metric," Economic Modelling, Elsevier, vol. 33(C), pages 120-125.
    11. Pierpaolo D’Urso & Livia Giovanni & Riccardo Massari & Dario Lallo, 2013. "Noise fuzzy clustering of time series by autoregressive metric," METRON, Springer;Sapienza Università di Roma, vol. 71(3), pages 217-243, November.
    12. Francesca Di Iorio & Umberto Triacca, 2014. "Testing for A Set of Linear Restrictions in VARMA Models Using Autoregressive Metric: An Application to Granger Causality Test," Econometrics, MDPI, vol. 2(4), pages 1-14, December.
    13. João A. Bastos & Jorge Caiado, 2014. "Clustering financial time series with variance ratio statistics," Quantitative Finance, Taylor & Francis Journals, vol. 14(12), pages 2121-2133, December.
    14. Liu, Shen & Maharaj, Elizabeth Ann & Inder, Brett, 2014. "Polarization of forecast densities: A new approach to time series classification," Computational Statistics & Data Analysis, Elsevier, vol. 70(C), pages 345-361.
    15. Bob Walrave, 2016. "Determining intervention thresholds that change output behavior patterns," System Dynamics Review, System Dynamics Society, vol. 32(3-4), pages 261-278, July.
    16. Paloma Taltavull de La Paz, 2021. "Predicting housing prices. A long term housing price path for Spanish regions," LARES lares-2021-4dra, Latin American Real Estate Society (LARES).
    17. Leijiao Ge & Tianshuo Du & Changlu Li & Yuanliang Li & Jun Yan & Muhammad Umer Rafiq, 2022. "Virtual Collection for Distributed Photovoltaic Data: Challenges, Methodologies, and Applications," Energies, MDPI, vol. 15(23), pages 1-24, November.
    18. Francesca Di Iorio & Umberto Triacca, 2022. "A comparison between VAR processes jointly modeling GDP and Unemployment rate in France and Germany," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 31(3), pages 617-635, September.
    19. Pietro Coretto & Michele La Rocca & Giuseppe Storti, 2020. "Improving Many Volatility Forecasts Using Cross-Sectional Volatility Clusters," JRFM, MDPI, vol. 13(4), pages 1-23, March.
    20. De Gregorio, Alessandro & Maria Iacus, Stefano, 2010. "Clustering of discretely observed diffusion processes," Computational Statistics & Data Analysis, Elsevier, vol. 54(2), pages 598-606, February.
