Probabilistic D-Clustering is a new iterative method for probabilistic clustering of data. Given clusters, their centers and the distances of data points from these centers, the probability of cluster membership at any point is assumed inversely proportional to the distance from (the center of) the cluster. This assumption is the working principle of Probabilistic D-Clustering.

At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as convex combinations of these points, with weights determined by the above principle.

Progress is monitored by the joint distance function, a measure of distance from all cluster centers, that evolves during the iterations, and captures the data in its low contours.

This method is simple, fast (requiring a small number of cheap iterations) and insensitive to outliers.

Algorithm

Suppose we want to obtain $K = 2$ clusters from $N = 6$ points

\[\{x_1 = 1, x_2 = 2, x_3 = 4, x_4 = 10, x_5 = 12, x_6 = 13\}\]

and the initial guesses for the cluster centers are

\[\{c_1 = 5, c_2 = 6\}\]

Then in each iteration, we will go through

  1. Step $d$
  2. Step $p$
  3. Step $m$

to update the cluster centers until they no longer change.

Step $d$

\[d_{k}(x) = \|x - c_k\|\]
|                                  | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ |
|----------------------------------|-------|-------|-------|-------|-------|-------|
| $d_1 = \lvert x - c_1 \rvert$    | 4     | 3     | 1     | 5     | 7     | 8     |
| $d_2 = \lvert x - c_2 \rvert$    | 5     | 4     | 2     | 4     | 6     | 7     |
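A minimal sketch of this step in NumPy (the array names `x`, `c`, and `d` are mine, not from the original):

```python
import numpy as np

x = np.array([1, 2, 4, 10, 12, 13], dtype=float)  # the six data points
c = np.array([5, 6], dtype=float)                 # initial cluster centers

# d[k, j] = |x_j - c_k|: distance of every point from every center
d = np.abs(x[None, :] - c[:, None])
print(d)
# [[4. 3. 1. 5. 7. 8.]
#  [5. 4. 2. 4. 6. 7.]]
```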

Step $p$

The membership probability of $x$ in cluster $C_k$ is then

\[p_{k}(x) = \frac{\prod_{j \neq k} d_{j}(x)}{\sum_{i=1}^K \prod_{j \neq i} d_{j}(x)}\]

Dividing the numerator and denominator by $\prod_{j} d_{j}(x)$ shows that $p_k(x)$ is proportional to $1/d_k(x)$, which is exactly the working principle stated above. For example, if $K = 2$, then

\[p_1(x) = \frac{d_2(x)}{d_1(x) + d_2(x)}\]
|                               | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ |
|-------------------------------|-------|-------|-------|-------|-------|-------|
| $p_1 = \frac{d_2}{d_1 + d_2}$ | $\frac{5}{9}$ | $\frac{4}{7}$ | $\frac{2}{3}$ | $\frac{4}{9}$ | $\frac{6}{13}$ | $\frac{7}{15}$ |
| $p_2 = \frac{d_1}{d_1 + d_2}$ | $\frac{4}{9}$ | $\frac{3}{7}$ | $\frac{1}{3}$ | $\frac{5}{9}$ | $\frac{7}{13}$ | $\frac{8}{15}$ |
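Continuing the sketch above, the probabilities can be computed by normalizing the reciprocal distances, which is equivalent to the product formula (assuming no point sits exactly on a center):

```python
# p[k, j] = p_k(x_j): reciprocal distances normalized over clusters,
# so each column sums to 1 (fails if some distance is 0; see the loop below)
p = (1.0 / d) / (1.0 / d).sum(axis=0)
print(p[0].round(4))
# [0.5556 0.5714 0.6667 0.4444 0.4615 0.4667]  i.e. 5/9, 4/7, 2/3, 4/9, 6/13, 7/15
```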

On the other hand, if $K = 3$, then

\[p_1(x) = \frac{d_2(x)d_3(x)}{d_1(x)d_2(x) + d_2(x)d_3(x) + d_1(x)d_3(x)}\]

Step $m$

\[m_{k}(x) = \frac{p_{k}(x)^2}{d_{k}(x)}\]

|                     | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ |
|---------------------|-------|-------|-------|-------|-------|-------|
| $m_1 = p_1^2 / d_1$ | $(\frac{5}{9})^2 \times \frac{1}{4} = 0.077$ | $0.109$ | $0.444$ | $0.040$ | $0.030$ | $(\frac{7}{15})^2 \times \frac{1}{8} = 0.027$ |
| $m_2 = p_2^2 / d_2$ | $(\frac{4}{9})^2 \times \frac{1}{5} = 0.040$ | $0.046$ | $0.056$ | $0.077$ | $0.048$ | $(\frac{8}{15})^2 \times \frac{1}{7} = 0.041$ |
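In the same sketch, the weights follow directly:

```python
# m[k, j] = p_k(x_j)^2 / d_k(x_j): per-point, per-cluster weights
m = p**2 / d
print(m.round(3))
# [[0.077 0.109 0.444 0.04  0.03  0.027]
#  [0.04  0.046 0.056 0.077 0.048 0.041]]
```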

Update Cluster Centers

\[c_k^{\text{new}} = \frac{\sum_{j=1}^N m_{k}(x_j)\, x_j}{\sum_{j=1}^N m_{k}(x_j)}\]

So in this iteration, the new cluster centers are

\[c_1^{\text{new}} = \frac{\sum_{j=1}^6 m_{1}(x_j)\, x_j}{\sum_{j=1}^6 m_{1}(x_j)} = \frac{0.077 \times 1 + 0.109 \times 2 + \dots + 0.027 \times 13}{0.077 + 0.109 + \dots + 0.027} = 4.38\]

\[c_2^{\text{new}} = \frac{\sum_{j=1}^6 m_{2}(x_j)\, x_j}{\sum_{j=1}^6 m_{2}(x_j)} = \frac{0.040 \times 1 + 0.046 \times 2 + \dots + 0.041 \times 13}{0.040 + 0.046 + \dots + 0.041} = 7.272\]
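And the center update is a weighted average along the point axis:

```python
# new centers: m-weighted average of the data points, one per cluster
c_new = (m * x[None, :]).sum(axis=1) / m.sum(axis=1)
print(c_new.round(3))  # [4.38  7.272]
```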

Repeating Step $d$, Step $p$, and Step $m$ updates the cluster centers as shown in the table below.

| Iteration | Cluster centers |
|-----------|-----------------|
| 0 | $(c_1 = 5, c_2 = 6)$ |
| 1 | $(c_1 = 4.38, c_2 = 7.272)$ |
| 2 | $(c_1 = 3.864, c_2 = 10.022)$ |
| 3 | $(c_1 = 3.840, c_2 = 10.025)$ |
| 4 | $(c_1 = 3.811, c_2 = 10.028)$ |
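The whole iteration can be put together as below; this is a sketch under the same assumptions (one-dimensional data, Euclidean distance), with a small `eps` of my own guarding against division by zero when a point coincides with a center:

```python
import numpy as np

def pdc(x, c, n_iter=4, eps=1e-12):
    """Illustrative 1-D Probabilistic D-Clustering: repeat steps d, p, m."""
    x, c = np.asarray(x, dtype=float), np.asarray(c, dtype=float)
    for _ in range(n_iter):
        d = np.abs(x[None, :] - c[:, None]) + eps          # step d: distances
        p = (1.0 / d) / (1.0 / d).sum(axis=0)              # step p: probabilities
        m = p**2 / d                                       # step m: weights
        c = (m * x[None, :]).sum(axis=1) / m.sum(axis=1)   # update centers
    return c

print(pdc([1, 2, 4, 10, 12, 13], [5, 6]))  # ≈ [3.81 10.03], as in the table
```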

Comparison with K-Means and Fuzzy C-Means (FCM)

In fuzzy C-means (FCM), the memberships of a cluster $C_k$ sum to 100%. This is not the case for $p_k$ in Probabilistic D-Clustering: $p_k(x)$ does not sum to 100% over all data points $x$.

To evaluate the performance of Probabilistic D-Clustering (PDC), we first compare the average correct classification rate (%).

| Correct rate (%) | k-means | FCM   | PDC |
|------------------|---------|-------|-----|
| Iris Data        | 88      | 89    | 93  |
| Ruspini Data     | 96      | 100   | 97  |
| Wine Data        | 96      | 95.89 | 90  |

Next, we also compare the computation time on each dataset.

| CPU seconds  | k-means | FCM | PDC |
|--------------|---------|-----|-----|
| Iris Data    | 3.0     | 2.5 | 3.8 |
| Ruspini Data | 2.3     | 2.5 | 4.2 |
| Wine Data    | 3.2     | 3.7 | 6.5 |

Discussion

Probabilistic D-Clustering can be viewed as minimizing an objective function: the sum, over all data points and clusters, of the distances weighted by the squared probabilities,

\[\min \sum_{j=1}^{N} \sum_{k=1}^{K} d_{k}(x_j)\, p_{k}(x_j)^2\]

For fixed centers, the minimizing probabilities are exactly those of Step $p$, and at each point the minimal value is the joint distance function mentioned in the introduction (for $K = 2$, it equals $d_1(x) d_2(x) / (d_1(x) + d_2(x))$).

This $d$-model (rather than a $d^2$-model) is suggested by the analogy between clustering and location problems, where sums of distances (not squared distances) are minimized.
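To make the connection concrete, here is a sketch (the function name `pdc_objective` is mine) that evaluates this objective at given centers, using the Step $p$ probabilities; it decreases as the centers move toward their final positions:

```python
import numpy as np

def pdc_objective(x, c):
    """sum_j sum_k d_k(x_j) * p_k(x_j)^2, with p chosen as in step p."""
    x, c = np.asarray(x, dtype=float), np.asarray(c, dtype=float)
    d = np.abs(x[None, :] - c[:, None])
    p = (1.0 / d) / (1.0 / d).sum(axis=0)
    return (d * p**2).sum()

x = [1, 2, 4, 10, 12, 13]
print(pdc_objective(x, [5, 6]))         # ≈ 13.79 at the initial centers
print(pdc_objective(x, [4.38, 7.272]))  # ≈ 12.37 after one iteration
```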

The advantages of using Probabilistic D-Clustering are

  • it is simple, fast (requiring a small number of cheap iterations)
  • it is robust (insensitive to outliers)
  • it gives a high percentage of correct classifications.
