Research projects

Project CEMAPRE internal

Title: Classification and clustering of time series with data-driven fragmented statistics
Participants: Jorge Caiado (Principal Investigator), Nuno Crato
Summary: The classification and clustering of time series involve defining a relevant metric,
employing machine learning algorithms or specific clustering techniques, and interpreting the
results to gain insight into the commonalities, structures, and patterns underlying the time series
data.
Many feature-based methods have been developed to address the problem of clustering noisy raw time
series data. Methods based on features extracted in the time domain, in the frequency domain, and
from wavelet decompositions of the time series are discussed in the literature (Maharaj, D’Urso and
Caiado, 2019). These involve extracting autocorrelations, partial autocorrelations,
cross-correlations and periodogram ordinates from the time series data and using them to compute
distance metrics.
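As a rough illustration of such feature-based distances, the following minimal Python sketch
computes a Euclidean distance between the sample autocorrelations of two series and between their
variance-normalized periodograms. The function names, the default of 10 lags, and the choice of the
Euclidean norm are assumptions made here for illustration; they are not taken from the cited works.
The periodogram distance assumes both series have the same length.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def periodogram(x):
    """Periodogram ordinates at the Fourier frequencies j = 1, ..., floor(n/2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.rfft(x - x.mean())
    return np.abs(dft[1:n // 2 + 1]) ** 2 / n

def acf_distance(x, y, max_lag=10):
    """Euclidean distance between the first max_lag sample autocorrelations."""
    return float(np.linalg.norm(sample_acf(x, max_lag) - sample_acf(y, max_lag)))

def periodogram_distance(x, y):
    """Euclidean distance between variance-normalized periodograms (equal lengths)."""
    px = periodogram(x) / np.var(x)
    py = periodogram(y) / np.var(y)
    return float(np.linalg.norm(px - py))

# Illustrative data: white noise versus a random walk (simulated, not real data).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
y = np.cumsum(rng.standard_normal(200))
print(acf_distance(x, y), periodogram_distance(x, y))
```

Both distances use every lag or every Fourier frequency, so noisy ordinates contribute as much as
informative ones.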
Both the autocorrelation function (ACF) and the periodogram of a time series describe its linear
dependence structure, and hence they represent the dynamics of many real time series well. For
classification purposes, though, it is crucial to identify the relevant autocorrelation lags or the
key frequencies that contribute most to discriminating between different time series.
Along these lines, two successful approaches are the fragmented periodogram method proposed by
Caiado, Crato, and Poncela (2020) and the fragmented autocorrelation method proposed by Albino,
Caiado, and Crato (2024). The first uses the periodogram only around the main driving frequencies
of the time series; the second uses the ACF only around specific lags of interest for clustering.
While effective when the data-generating process is known, these methods may be less reliable when
the structure of the time series is unknown.
To overcome this limitation, we propose to develop metrics that compute the distance between time
series using only their significant periodogram ordinates or significant autocorrelations. This
entails defining a significance threshold that retains the relevant frequencies and
autocorrelations and filters out the noise.
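One possible reading of this idea for the autocorrelation case is sketched below in Python. It
assumes, purely for illustration, that a lag is retained when its sample autocorrelation exceeds
the approximate 95% white-noise band of 1.96/sqrt(n) in at least one of the two series, and that
the distance is Euclidean over the retained lags. The threshold, the union rule, and all function
names are hypothetical choices, not the metric the project will define.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def significant_lags(r, n, z=1.96):
    """Boolean mask of autocorrelations outside the +/- z/sqrt(n) white-noise band."""
    return np.abs(r) > z / np.sqrt(n)

def data_driven_acf_distance(x, y, max_lag=20, z=1.96):
    """Euclidean distance over lags significant in at least one series (assumed rule)."""
    rx = sample_acf(x, max_lag)
    ry = sample_acf(y, max_lag)
    keep = significant_lags(rx, len(x), z) | significant_lags(ry, len(y), z)
    if not keep.any():
        # No lag stands out from noise in either series: treat them as indistinguishable.
        return 0.0
    return float(np.linalg.norm(rx[keep] - ry[keep]))

# Illustrative simulated data: an AR(1) series with coefficient 0.8 versus white noise.
rng = np.random.default_rng(1)
eps = rng.standard_normal(500)
ar1 = np.zeros(500)
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + eps[t]
noise = rng.standard_normal(500)
print(data_driven_acf_distance(ar1, noise))
```

The same pattern would apply to periodogram ordinates, given a suitable significance threshold in
the frequency domain.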
To this end, we propose to extend the theory of both methods to incorporate data-driven
fragmentation, to conduct a simulation study with time series generated by linear models (ARMA,
ARIMA, and SARIMA), and to illustrate the concept using real economic and financial time series.

References:
Albino, A., Caiado, J. and Crato, N. (2024). “Big-data time series clustering using fragmented
autocorrelations”. Working paper.
Caiado, J., Crato, N. and Poncela, P. (2020). “A fragmented-periodogram approach for clustering big
data time series”. Advances in Data Analysis and Classification, Vol. 14, pp. 117–146.
Maharaj, E.A., D’Urso, P. and Caiado, J. (2019). Time Series Classification and Clustering. CRC
Press, Taylor & Francis Group, United States.