I am trying to cluster dozens of time series sampled every 30 minutes, covering the period from mid-2016 to mid-2020. Most of them have very nice "patterns"; others have missing values for a given period (e.g. one whole year, several months) or are more "chaotic" (sudden variations).
Here I display some of the time-series I am handling:
Looking more closely (e.g. at a single week), some seasonal patterns become visible, as the graphs below show (2020/1/1 to 2020/1/8):
Ideally, I would like to build clusters in which the time series share similar "shapes in time" (e.g. peaks in the morning and evening, values close to zero on weekends or holidays) and also, if possible, similar yearly seasonality when enough data are available.
I tried the commonly used DTW distance with hierarchical clustering (Ward linkage), but because of the number of points per time series (even after resampling to 1 hour), it took too long, and I was quite disappointed with the results (admittedly, I applied it to data with very little preprocessing).
So the problem I am facing is:
- I would like to extract the "nicest" part of each time series, but if I do so, the extracts will be misaligned (they will not start at the same time point) and of different lengths.
Thus, I am quite confused about which preprocessing steps I should use.
I would be glad to get some advice on the preprocessing, distance measure, and clustering algorithm I should apply to cluster these time series.
Best Answer
Preprocessing that results in misalignment or different lengths is not necessarily a problem. Have you considered Time Series Clustering – a decade review (Information Systems 53, 2015), in which Aghabozorgi et al. review 38 algorithms for clustering whole time series? See the rightmost column of Table 4 (pages 27-28) for their notes describing the attributes of each approach.
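To illustrate why different lengths and start dates need not be a blocker, here is a minimal sketch of one possible feature-based route. It is an illustration only, not a method taken from the review. Each series is reduced to its average week (168 hourly values), so every series ends up as a vector of the same length regardless of when it starts or how long it runs. The function names `weekly_profile` and `cluster_profiles` are hypothetical, and the sketch assumes each series is a pandas Series with a DatetimeIndex and that a z-normalised weekly shape is what should drive the grouping. Yearly seasonality is not captured here.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage


def weekly_profile(series: pd.Series) -> np.ndarray:
    """Return the z-normalised mean value for each of the 168 hours of the week."""
    # Resample 30 min -> 1 h; hours without data stay NaN.
    hourly = series.resample("1h").mean()
    # Position of each timestamp inside the week: 0 = Monday 00:00, 167 = Sunday 23:00.
    hour_of_week = (hourly.index.dayofweek * 24 + hourly.index.hour).to_numpy()
    # Average all observations sharing the same hour of week (NaN values are skipped).
    profile = hourly.groupby(hour_of_week).mean().reindex(range(168))
    # Fill hours of the week that were never observed from their neighbours.
    profile = profile.interpolate(limit_direction="both")
    values = profile.to_numpy(dtype=float)
    centred = values - values.mean()
    std = values.std()
    # Normalise so that the shape, not the magnitude, determines the distance.
    return centred / std if std > 0 else centred


def cluster_profiles(series_by_name: dict, n_clusters: int) -> dict:
    """Group series by weekly shape using Ward linkage on Euclidean distances."""
    names = list(series_by_name)
    features = np.vstack([weekly_profile(series_by_name[name]) for name in names])
    tree = linkage(features, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels.tolist()))
```

Because the profiles are plain fixed-length vectors, Ward linkage is applied to Euclidean distances, which is the setting it is defined for, and the cost no longer depends on the length of the raw series. Whether an average week is a faithful summary of a "chaotic" series is something to check on the actual data.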