Fuzzy C-Means Clustering algorithm implementation (and visualization) in Processing 4
Example runs (screenshots): `n_dp = 500`, `n_cr = 3` | `n_dp = 1000`, `n_cr = 6` | `n_dp = 3000`, `n_cr = 7`
This sketch is an implementation of the FCM clustering algorithm in Processing 4.
Given a number of Data Points, their features, and the number of Centroids, it computes the features of the Centroids and the degree of membership of each Data Point to each Centroid.
The number of Data Points and Centroids is variable, and can be set through the `int n_dp` and `int n_cr` variables respectively.
The number of features each Data Point and Centroid has, however, is fixed to `2`, in order to provide a graphical visualization of the sets (each feature representing one of the two axes of a two-dimensional plane) throughout the execution of the algorithm.
The fuzziness parameter is also variable, and can be set through the `float m` variable (default value is `2`).
The degree of tolerance (relative to the difference between the Centroids' feature values in the previous and current iterations) can also be set to any desired value through the `float tolerance` variable (default value is `0.005f`).
At first, each Data Point is instantiated with two randomly generated features (stored in `ArrayList<PVector> data_points`), and a randomly generated degree of membership for each Centroid (stored in a `float[][] distr` of dimensions `n_dp * n_cr`).
The Centroids' features are initially set to `0` (and are stored in `ArrayList<PVector> centroids`).
A random color is also assigned to each Centroid (stored in `int[] c_colors`) in order to provide a clearer visual representation.
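The initialization described above might look roughly like the following. This is a minimal sketch in plain Java rather than the sketch's actual code: the class `FcmInit` and its method names are hypothetical, plain `float[][]` arrays stand in for `ArrayList<PVector>`, and normalising each row of memberships to sum to 1 is an assumption (the text only says the values are randomly generated).

```java
import java.util.Random;

// Hypothetical helper, not the sketch's own code: plain arrays stand in for PVector.
final class FcmInit {
    // Random 2-feature Data Points inside a width x height area.
    static float[][] randomPoints(int nDp, float width, float height, Random rng) {
        float[][] points = new float[nDp][2];
        for (float[] p : points) {
            p[0] = rng.nextFloat() * width;
            p[1] = rng.nextFloat() * height;
        }
        return points;
    }

    // Random degrees of membership: one row per Data Point, one column per Centroid.
    // Assumption: each row is normalised so that it sums to 1.
    static float[][] randomMemberships(int nDp, int nCr, Random rng) {
        float[][] distr = new float[nDp][nCr];
        for (float[] row : distr) {
            float sum = 0f;
            for (int j = 0; j < nCr; j++) {
                row[j] = rng.nextFloat() + 1e-6f; // keep strictly positive
                sum += row[j];
            }
            for (int j = 0; j < nCr; j++) {
                row[j] /= sum;
            }
        }
        return distr;
    }

    // Centroids start with all features set to 0.
    static float[][] zeroCentroids(int nCr) {
        return new float[nCr][2];
    }

    // One random opaque color per Centroid, packed as an ARGB int.
    static int[] randomColors(int nCr, Random rng) {
        int[] colors = new int[nCr];
        for (int j = 0; j < nCr; j++) {
            colors[j] = 0xFF000000 | rng.nextInt(0x1000000);
        }
        return colors;
    }
}
```

Passing a seeded `Random` makes a run reproducible, which can help when comparing different values of `m` or `tolerance` on the same data.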
Each iteration of the algorithm consists of two (plus one) steps:
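For each Centroid $c_j$, the features are recomputed as the membership-weighted mean of all Data Points (the standard FCM centroid update):

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$

with $x_i$ the features of Data Point $i$, $u_{ij}$ its degree of membership to Centroid $j$ (`distr[i][j]`), $N$ the number of Data Points (`n_dp`), and $m$ the fuzziness parameter.

For each Data Point $x_i$, the degree of membership to every Centroid is recomputed from its distances to the Centroids (the standard FCM membership update):

$$u_{ij} = \frac{1}{\sum_{k=1}^{C} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}$$

with $C$ the number of Centroids (`n_cr`) and $\lVert \cdot \rVert$ the Euclidean distance.

These two operations are iterated until the following condition is met:

Let $c_j^{(t)}$ be the features of Centroid $j$ after iteration $t$. The iteration stops once the difference between $c_j^{(t)}$ and $c_j^{(t-1)}$ is below the tolerance for every Centroid, where the tolerance is the value of the `tolerance` variable.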
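The two update steps and the stopping test can be sketched in plain Java as follows. It uses the standard FCM update formulas and is not taken from the sketch itself: the class `FcmStep`, its method names, and the `maxIterations` safety cap are hypothetical, `float[][]` arrays replace `PVector`, and measuring convergence as the largest Euclidean shift of any Centroid is one possible reading of the tolerance check, not necessarily the one the sketch uses.

```java
// Hypothetical helper, not the sketch's own code: plain arrays stand in for PVector.
final class FcmStep {
    // Step 1: each Centroid becomes the mean of all Data Points, weighted by u^m.
    static float[][] updateCentroids(float[][] points, float[][] distr, float m) {
        int nCr = distr[0].length;
        int dims = points[0].length;
        float[][] centroids = new float[nCr][dims];
        for (int j = 0; j < nCr; j++) {
            double weightSum = 0.0;
            double[] acc = new double[dims];
            for (int i = 0; i < points.length; i++) {
                double w = Math.pow(distr[i][j], m);
                weightSum += w;
                for (int d = 0; d < dims; d++) {
                    acc[d] += w * points[i][d];
                }
            }
            if (weightSum > 0.0) { // a Centroid with no membership at all stays at 0
                for (int d = 0; d < dims; d++) {
                    centroids[j][d] = (float) (acc[d] / weightSum);
                }
            }
        }
        return centroids;
    }

    // Step 2: recompute every degree of membership from the distances to the Centroids.
    static void updateMemberships(float[][] points, float[][] centroids, float[][] distr, float m) {
        double exponent = 2.0 / (m - 1.0);
        for (int i = 0; i < points.length; i++) {
            double[] dist = new double[centroids.length];
            int coincident = -1;
            for (int j = 0; j < centroids.length; j++) {
                dist[j] = distance(points[i], centroids[j]);
                if (dist[j] == 0.0) {
                    coincident = j;
                }
            }
            for (int j = 0; j < centroids.length; j++) {
                if (coincident >= 0) {
                    // The point sits exactly on a Centroid: give it full membership there.
                    distr[i][j] = (j == coincident) ? 1f : 0f;
                    continue;
                }
                double sum = 0.0;
                for (int k = 0; k < centroids.length; k++) {
                    sum += Math.pow(dist[j] / dist[k], exponent);
                }
                distr[i][j] = (float) (1.0 / sum);
            }
        }
    }

    // Euclidean distance between two feature vectors.
    static double distance(float[] a, float[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return Math.sqrt(s);
    }

    // Largest movement of any Centroid between two iterations (assumed convergence measure).
    static double maxShift(float[][] previous, float[][] current) {
        double max = 0.0;
        for (int j = 0; j < current.length; j++) {
            max = Math.max(max, distance(previous[j], current[j]));
        }
        return max;
    }

    // Alternates the two steps until every Centroid moves by less than `tolerance`.
    static float[][] run(float[][] points, float[][] distr, float m, float tolerance, int maxIterations) {
        float[][] centroids = new float[distr[0].length][points[0].length]; // all zeros
        for (int it = 0; it < maxIterations; it++) {
            float[][] next = updateCentroids(points, distr, m);
            updateMemberships(points, next, distr, m);
            double shift = maxShift(centroids, next);
            centroids = next;
            if (shift < tolerance) {
                break;
            }
        }
        return centroids;
    }
}
```

Note that `m` must be greater than 1 for the exponent `2 / (m - 1)` to be defined. In the Processing sketch the loop body would naturally live in `draw()`, so that each frame shows one iteration.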
Lastly, the "third step" alluded to before concerns the visualization of the sets.
Each Centroid is visualized as a circle with a radius of 15 pixels, centered at its features, colored in its respective color.
Each Data Point is visualized as a circle with a radius of 10 pixels, centered at its features, whose color is calculated by blending all the Centroids' colors according to the Data Point's degrees of membership.
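One way to blend the Centroids' colors is a per-channel weighted average, with the degrees of membership as weights. The snippet below is a sketch under that assumption, not the sketch's actual blending code: the class `FcmColor` and the method `blend` are hypothetical names, and colors are treated as packed ARGB ints, which is how Processing stores its `color` type.

```java
// Hypothetical helper, not the sketch's own code; colors are packed ARGB ints.
final class FcmColor {
    // Weighted average of the Centroids' colors, channel by channel.
    static int blend(int[] cColors, float[] memberships) {
        float r = 0f, g = 0f, b = 0f, total = 0f;
        for (int j = 0; j < cColors.length; j++) {
            float u = memberships[j];
            r += u * ((cColors[j] >> 16) & 0xFF);
            g += u * ((cColors[j] >> 8) & 0xFF);
            b += u * (cColors[j] & 0xFF);
            total += u;
        }
        if (total <= 0f) {
            return 0xFF000000; // no membership at all: fall back to opaque black
        }
        int ri = Math.round(r / total);
        int gi = Math.round(g / total);
        int bi = Math.round(b / total);
        return 0xFF000000 | (ri << 16) | (gi << 8) | bi;
    }
}
```

A Data Point that belongs almost entirely to one Centroid takes nearly that Centroid's color, while a point halfway between two clusters gets an even mix, which makes the fuzzy boundaries visible.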