After implementing KMeans clustering, I am now looking at KShape to compare the results of the two. I am using both through Anaconda Navigator and have tslearn 0.6.3 installed.
While I can see that the cluster centers from KMeans are the mean in each group, I don't understand the result from KShape.
The input data I have are daily time series with hourly resolution, i.e. 24 data points per time series.
Looking at the documentation for KShape, it says that the cluster centers are the centroids, which I interpret as center = average value of the time series in the cluster?
But when I try a few time series as input, the centers look weird. See some figures below:
With 4 time series it seems to pick one of the time series in cluster 1 as the centroid
With 10 time series the centroid seems to be shifted left one step for all clusters
With 40 time series it seems to be shifted even more? The sole time series in cluster 4 has a centroid that is shifted three time steps to the left
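To put a number on the shifts described above, here is a minimal sketch that estimates the lag between a series and a centroid by maximizing a plain cross-correlation. The names `cross_correlation` and `estimate_shift` are hypothetical helpers written for this post, not tslearn functions, and this is not meant to reproduce what KShape does internally.

```python
def cross_correlation(reference, candidate, shift):
    """Sum of reference[i] * candidate[i + shift] over the overlapping indices."""
    n = len(reference)
    total = 0.0
    for i in range(n):
        j = i + shift
        if 0 <= j < n:
            total += reference[i] * candidate[j]
    return total


def estimate_shift(reference, candidate):
    """Return the shift (in time steps) that best aligns candidate with reference.

    A negative result means candidate looks like reference moved to the left,
    a positive result means it looks moved to the right.
    """
    n = len(reference)
    if len(candidate) != n:
        raise ValueError("both series must have the same length")
    return max(range(-(n - 1), n),
               key=lambda s: cross_correlation(reference, candidate, s))


# Toy example (made-up values): a peak, and the same peak moved two steps left.
series = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
moved_left = series[2:] + [0, 0]
shift = estimate_shift(series, moved_left)
```

With my data this could be called as `estimate_shift(list(cluster_data.iloc[i]), list(cluster_centers.iloc[k]))` for a series `i` assigned to cluster `k`.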
These three examples are on z-normalized data, which seems to be what you must input. The documentation does not say so, but if I input non-normalized data the centroids become even more chaotic:
Is this a bug, or have I done something wrong? The data I input is a DataFrame called "cluster_data" with 24 columns and where each row is a separate time series, then I call:
import pandas as pd
from tslearn.clustering import KShape
from tslearn.preprocessing import TimeSeriesScalerMeanVariance
# z-normalize each series (mean = 0, std = 1, the scaler's default settings)
cluster_data = TimeSeriesScalerMeanVariance().fit_transform(cluster_data)
cluster_data = pd.DataFrame(cluster_data.squeeze())  # back to 2D: one row per series
km = KShape(n_clusters=number_of_clusters, verbose=True, random_state=0)
y_pred = km.fit_predict(cluster_data)
cluster_centers = pd.DataFrame(km.cluster_centers_[:,:,0])  # one row per cluster center
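To check the "center = average" interpretation directly, here is a small sketch that computes the plain element-wise mean of the series assigned to each cluster, so it can be compared against `cluster_centers`. The helper `cluster_means` is a hypothetical name introduced here, and the toy numbers are made up for illustration.

```python
def cluster_means(series, labels):
    """Element-wise mean of the series in each cluster.

    series: list of equal-length lists, one per time series
    labels: list of cluster labels, one per time series
    Returns a dict mapping each label to its mean series.
    """
    sums = {}
    counts = {}
    for values, label in zip(series, labels):
        if label not in sums:
            sums[label] = [0.0] * len(values)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], values)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


# Toy example (made-up values): two series in cluster 0, one in cluster 1.
toy_series = [[1, 2, 3], [3, 4, 5], [10, 10, 10]]
toy_labels = [0, 0, 1]
means = cluster_means(toy_series, toy_labels)
```

On my data this would be `cluster_means(cluster_data.values.tolist(), list(y_pred))`. If the KShape centers were plain averages, each row of `cluster_centers` should match the corresponding mean series.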