Project CEMAPRE internal
Title | Classification and clustering of big data time series with spectral measures (continuation for year 2020) |
Participants | Jorge Caiado, Nuno Crato (Principal Investigator) |
Summary | The big data revolution offers researchers and analysts new possibilities and new challenges. This is particularly true of time series: in many domains we now have access to very long series and to many series related to a given area of interest. This happens in areas as diverse as astronomy, geophysics, medicine, social media, and finance. The diversity and length of the data available to researchers lead to particular challenges when comparing and clustering time series. For these tasks it is usually not possible to use traditional methods of analysing data, estimating models, and comparing features, as these methods imply lengthy computations, such as the inversion of extremely large matrices. We have proposed a spectral method of synthesizing and comparing time series characteristics which is nonparametric and focused on the cyclical features of the data. Instead of using all the information available from the data, which is computationally very expensive, this procedure uses regularization rules to select and summarize the most relevant information for clustering purposes. The method does not require computing the full periodograms, but only the periodogram components around the frequencies of interest. It then compares the periodogram ordinates of the various time series and groups them with common clustering methods. We called it a fragmented-periodogram approach. In 2019, we achieved the project goals for the year and published the first results of our research in an international journal. In 2020, we want to further disseminate our results and extend them to situations in which the main spectral frequencies are unknown and need to be estimated. This introduces further challenges to the fragmented-periodogram computations, as we will need to introduce a two-step procedure. |
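As a rough illustration of the idea described in the summary, the following minimal Python sketch computes periodogram ordinates only at the Fourier frequencies near a set of known frequencies of interest and compares series through those ordinates. It is not the project's implementation. The function names, the `half_width` parameter, the normalization by the sample variance, the Euclidean distance, and the toy series are all assumptions made for this example.

```python
import math
import random


def fragmented_periodogram(x, freqs_of_interest, half_width=2):
    """Variance-normalized periodogram ordinates near the given frequencies.

    freqs_of_interest are in cycles per observation (between 0 and 0.5).
    Only the Fourier frequencies j/n whose index j lies within half_width
    of the nearest index to each frequency of interest are evaluated.
    (Hypothetical helper; the normalization is an assumption.)
    """
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n

    # Indices j of the Fourier frequencies j/n kept in the fragments.
    keep = set()
    for f in freqs_of_interest:
        j0 = round(f * n)
        for j in range(j0 - half_width, j0 + half_width + 1):
            if 1 <= j <= n // 2:
                keep.add(j)

    ordinates = []
    for j in sorted(keep):
        w = 2.0 * math.pi * j / n
        # Direct evaluation of one DFT coefficient; no full transform needed.
        c = sum(d * math.cos(w * t) for t, d in enumerate(dev))
        s = sum(d * math.sin(w * t) for t, d in enumerate(dev))
        ordinates.append((c * c + s * s) / (n * var))
    return ordinates


def distance(p, q):
    """Euclidean distance between two fragmented periodograms (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def noisy_cycle(n, period, phase, rng):
    """Toy series: one sinusoidal cycle plus Gaussian noise."""
    return [math.sin(2.0 * math.pi * t / period + phase) + rng.gauss(0.0, 0.5)
            for t in range(n)]


rng = random.Random(0)
n = 240
series = {
    "a": noisy_cycle(n, 12, 0.0, rng),  # cycle of period 12
    "b": noisy_cycle(n, 12, 1.0, rng),  # same cycle, different phase
    "c": noisy_cycle(n, 8, 0.0, rng),   # cycle of period 8
}
fragments = {k: fragmented_periodogram(v, [1 / 12, 1 / 8])
             for k, v in series.items()}
d_ab = distance(fragments["a"], fragments["b"])
d_ac = distance(fragments["a"], fragments["c"])
print(f"d(a, b) = {d_ab:.2f}, d(a, c) = {d_ac:.2f}")
```

In this toy setting, the two series sharing the period-12 cycle should be much closer to each other than to the period-8 series. The resulting pairwise distances could then be passed to any standard clustering routine. The sketch assumes the frequencies of interest are known in advance; the case of unknown frequencies is not covered here.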