How are KShape cluster centers calculated? I thought they would be the arithmetic mean of the data points per time step, but it seems not #506

sjowtajm · 2024-01-23T16:53:14Z


Hey

After implementing KMeans clustering, I am now looking at KShape to compare the results of the two. I am using both through Anaconda Navigator and have tslearn 0.6.3 installed.

While I can see that the cluster centers from KMeans are the mean in each group, I don't understand the result from KShape.

The input data I have are daily time series with hourly resolution, i.e. 24 data points per time series.

Looking at the documentation for KShape, it says that the cluster centers are the centroids, which I interpret as each center being the average of the time series in its cluster?

But when I try a few time series as input, the centers look weird. See some figures below:

With 4 time series it seems to pick one of the time series in cluster 1 as the centroid

With 10 time series the centroid seems to be shifted left one step for all clusters

With 40 time series it seems to be shifted even more? The sole time series in cluster 4 is shifted three time steps too far to the left

These three examples are on z-normalized data, which seems to be what you must input. The documentation does not say so, but if I input non-normalized data the centroids become even more chaotic:

Is this a bug, or have I done something wrong? The data I input is a DataFrame called "cluster_data" with 24 columns, where each row is a separate time series. I then call:

    import pandas as pd
    from tslearn.clustering import KShape
    from tslearn.preprocessing import TimeSeriesScalerMeanVariance

    # z-normalize each series (mean = 0, std = 1 with the default settings)
    cluster_data = TimeSeriesScalerMeanVariance().fit_transform(cluster_data)
    cluster_data = pd.DataFrame(cluster_data.squeeze())  # (n_series, 24, 1) -> (n_series, 24)

    km = KShape(n_clusters=number_of_clusters, verbose=True, random_state=0)
    y_pred = km.fit_predict(cluster_data)
    cluster_centers = pd.DataFrame(km.cluster_centers_[:, :, 0])  # one row per cluster
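To put a number on the shifts described above, one option is to compare each KShape centroid with the plain per-time-step mean of its cluster and look for the lag that aligns the two best. The following is a minimal sketch using only NumPy. The helper names `per_cluster_mean` and `best_lag` are hypothetical (they are not part of tslearn), and the cross-correlation is used purely as a diagnostic, not as a statement about how KShape computes its centroids.

```python
import numpy as np


def per_cluster_mean(X, labels, n_clusters):
    """Arithmetic mean per time step for each cluster.

    X has shape (n_series, n_timesteps); labels has shape (n_series,).
    Assumes every cluster has at least one member.
    """
    return np.stack([X[labels == k].mean(axis=0) for k in range(n_clusters)])


def best_lag(reference, candidate):
    """Number of steps `candidate` must be shifted to the right to best
    match `reference` (a positive value means `candidate` sits to the left)."""
    ref = reference - reference.mean()
    cand = candidate - candidate.mean()
    # Linear (non-circular) cross-correlation over all possible lags.
    cc = np.correlate(ref, cand, mode="full")
    # Index 0 of `cc` corresponds to lag -(len(cand) - 1).
    return int(np.argmax(cc)) - (len(cand) - 1)


# Synthetic check: a 24-step bump and a copy moved one step to the left.
reference = np.zeros(24)
reference[10:14] = [1.0, 3.0, 3.0, 1.0]
shifted_left = np.roll(reference, -1)
print(best_lag(reference, shifted_left))
```

With the fitted model from the snippet above, something like `best_lag(means[k], km.cluster_centers_[k, :, 0])` for each cluster `k`, with `means = per_cluster_mean(cluster_data.to_numpy(), y_pred, number_of_clusters)`, should report how many steps each centroid is offset from the mean of its members.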
