Research projects

Project CEMAPRE internal

Title: Classification and clustering of big data time series with spectral measures (continuation for year 2022)
Participants: Jorge Caiado, Nuno Crato (Principal Investigator)
Summary: The big data revolution is offering researchers and analysts new possibilities and new
challenges. This is particularly true of time series: in many domains we now have access both to
very long series and to many series related to a given topic of interest. This happens in areas as
diverse as astronomy, geophysics, medicine, social media, and finance.

The diversity and length of the data available to researchers lead to particular challenges when
comparing and clustering time series. For these tasks it is usually not feasible to use traditional
methods of analysing, selecting and estimating models, and comparing features, as these methods
require lengthy computations, such as the inversion of extremely large matrices.

We have previously proposed a spectral method for synthesizing and comparing time series
characteristics, which is nonparametric and focused on the cyclical features of the data. Instead
of using all the information available in the data, which is computationally very expensive, this
procedure uses regularization rules to select and summarize the most relevant information for
clustering purposes. The method does not require computing the full periodograms, but only the
periodogram components around the frequencies of interest. It then compares the periodogram
ordinates of the various time series and groups the series with common clustering methods.
We called it a fragmented-periodogram approach.
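
The idea described above can be illustrated with a minimal sketch. It is not the project's actual
implementation: the function names (fragmented_periodogram, cluster_series), the normalization by
the sample variance, the Euclidean distance, the average-linkage hierarchical clustering, and the
simulated data are all assumptions made for illustration. The sketch evaluates periodogram
ordinates only at a few chosen frequencies and clusters the series on those ordinates.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def fragmented_periodogram(x, freqs):
    """Periodogram ordinates of x at the given angular frequencies only."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the sample mean
    n = x.size
    t = np.arange(n)
    freqs = np.asarray(freqs, dtype=float)
    # Direct DFT restricted to the chosen frequencies: O(n * len(freqs)).
    dft = np.exp(-1j * np.outer(freqs, t)) @ x
    return np.abs(dft) ** 2 / n


def cluster_series(series, freqs, n_clusters):
    """Cluster series on variance-normalized periodogram fragments.

    Assumption: Euclidean distance and average linkage stand in for
    the "common clustering methods" mentioned in the text.
    """
    feats = np.array(
        [fragmented_periodogram(x, freqs) / np.var(x) for x in series]
    )
    tree = linkage(feats, method="average", metric="euclidean")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Simulated example (hypothetical data): three series with a period-12
# cycle and three with a period-4 cycle, each with a little noise.
rng = np.random.default_rng(0)
n = 240
t = np.arange(n)
series = []
for period in (12, 12, 12, 4, 4, 4):
    phase = rng.uniform(0, 2 * np.pi)
    series.append(
        np.sin(2 * np.pi * t / period + phase) + 0.1 * rng.standard_normal(n)
    )

# Fragments: three Fourier frequencies around each cycle of interest.
k = np.array([19, 20, 21, 59, 60, 61])
freqs = 2 * np.pi * k / n

labels = cluster_series(series, freqs, n_clusters=2)
print(labels)
```

Here only six ordinates per series are computed instead of the full periodogram, so the cost grows
with the number of selected frequencies rather than with the series length squared or a full
transform per series. Which frequencies to keep is the key choice, and in this sketch they are
simply fixed by hand.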

We have already published the first results of this research in an international journal. We have
also supervised an MSc thesis (Andreia Albino) in which a novel method was introduced: the
fragmented autocorrelation method for time series clustering. We have further compared the two
methods through a simulation exercise. In 2022, we want to extend these results to include the
automatic search for the fragmentation segments and the comparison of model forecasting criteria.