Explain K-medoid Clustering Algorithm with example.

- December 16, 2021

K-Medoids is also a partitioning technique of clustering that clusters the data set of n objects into k clusters with k known a priori. It starts from an initial set of Medoids and iteratively replaces one of the Medoids by one of the non-Medoids if it improves the total distance of the resulting clustering. A popular realization of K-Medoids clustering is PAM (Partitioning Around Medoids) algorithm.

Advantages of K-Medoids

It is simple to understand and easy to implement. K-Medoid Algorithm is fast and converges in a fixed number of steps.
PAM is less sensitive to outliers than other partitioning algorithms.

Disadvantages of K-Medoids

Pam works efficiently for small data sets but does not scale well for large data sets. It may obtain different results for different runs on the same dataset because the first k medoids are chosen randomly.
It is not suitable for clustering arbitrary-shaped clusters because it relies on minimizing the distances between the non-medoid data points and the medoid.

K-medoid Clustering Algorithm

A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal. i.e. it is the most centrally located point in the cluster. In contrast to the K-means algorithm, the K-medoid algorithm chooses data points as centers and works with an arbitrary matrix of distances instead of l2.K-medoid is a classical partitioning technique of clustering that clusters the data set of n objects into k clusters known a priori. It is more robust to noise and outliers compared to because it may minimize a sum of pair-wise dissimilarities instead of a sum of squared Euclidean distances.

Algorithms

The most common realization of k-medoid clustering is the Partitioning around Medoid (PAM) algorithm and is as follows:

1. Initialize: randomly select (without replacement) k of the n data points as the medoid

2. Associate each data point to the closest medoid.

3. While the cost of the configuration decreases:

- For each medoid m, for each non-medoid data point o:

• Swap m and o, re-compute the cost (sum of distances of points to their medoid)

• If the total cost of the configuration increased in the previous step, undo the swap.

Example

Cluster the following data set of ten objects into two clusters i.e. k = 2. Data Points are {(1,3), (4,2), (6,2), (3,5), (4,1)}

Solution:-

Let p1=(1,3) p2=(4,5) p3=(6,3) p4=(3,4) p5=(2,1)

Initial step

Let m1=p1=(1,3) and m2=p4=(3,4) are two initial medoid

Iteration 1

Calculate distance between clusters centers and each data point

d(m1, p2) = 3 + 2 + 5 d(m2, p2) = 1+ 0 + 1

d(m1, p3) = 5 + 0 + 5 d(m2, p3) = 3 + 1 + 4

d(m1, p5) = 1 + 2 + 3 d(m2, p5) = 1 + 3 + 4

Thus after first iteration

Cluster 1= {p1, p5}

Cluster 2={p2,p3,p4}

Total Cost= {d(m1, p5)+d(m2, p2)+d(m2, p3)}=10

Iteration 2

Lets swap m1 and p2

Now m1=p2= (4,5) and m2=p4=(3,4) are two medoid

Let p1=(1,3) p2=(4,5) p3=(6,3) p4=(3,4) p5=(2,1)

Calculate distance between new cluster centers and each data points

d(m1, p1) = 3 + 2 + 5 d(m2, p1) = 2 +1 + 3

d(m1, p3) = 2 + 1 + 3 d(m2, p3) = 3 + 1 + 4

d(m1, p5) = 2 + 4 + 6 d(m2, p5) = 1 + 3 + 4

Thus after second iteration

Cluster 1= {p2,p3}

Cluster 2={p1,p4,p5}

Total Cost= {d(m1, p3)+d(m2, p1)+d(m2, p5)}=3+3+4=10

Same as previous, thus no undo swap

Repeat this process till the cost of configuration do not decrease

Search This Blog

Notes for BSc CSIT

Explain K-medoid Clustering Algorithm with example.

Comments

Post a Comment

Popular posts from this blog

What are different steps used in JDBC? Write down a small program showing all steps.

Discuss classification or taxonomy of virtualization at different levels.

Pure Versus Partial EC