What is the concept of mini-batch k-means? How does DBSCAN work?
MINI-BATCH K-MEANS
The mini-batch k-means clustering algorithm is a modified version of the k-means algorithm. It uses mini-batches to reduce computation time on large datasets while still attempting to optimize the clustering result. To achieve this, mini-batch k-means takes mini-batches as input, which are randomly drawn subsets of the whole dataset.
Mini-batch k-means is considered faster than k-means and is normally used for large datasets. Its main idea is to use small random batches of data of a fixed size, so that each batch can be held in memory. In each iteration, a new random sample is drawn from the dataset and used to update the clusters, and this is repeated until convergence. Each mini-batch updates the clusters using a convex combination of the current prototype (centroid) values and the new data points, applying a learning rate that decreases with the number of iterations. This learning rate is the inverse of the number of data points assigned to a cluster so far. As the number of iterations increases, the effect of new data is reduced, so convergence can be detected when the clusters do not change over several consecutive iterations.
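The update described above can be written compactly. The symbols are chosen here for illustration and do not come from the original text: x is a data point in the mini-batch, c is the centroid it is assigned to, n_c is the number of points assigned to c so far, and η is the per-centroid learning rate.

```latex
n_c \leftarrow n_c + 1, \qquad
\eta = \frac{1}{n_c}, \qquad
c \leftarrow (1 - \eta)\, c + \eta\, x
```

With this choice of η, each centroid is the running mean of all points that have been assigned to it, which is why later points move it less and less.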
The algorithm takes a small, randomly chosen batch of the dataset in each iteration. Each data point in the batch is assigned to a cluster based on the previous locations of the cluster centroids. The algorithm then updates the centroid locations using the new points from the batch. The update is a gradient descent update, which is significantly faster than a normal k-means update because it only touches the points in the batch rather than the whole dataset.
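The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names, the parameters `batch_size`, `n_iterations` and `seed`, and the choice to initialize centroids from random data points are all assumptions made for this example. For simplicity it runs a fixed number of iterations instead of checking for convergence.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_center(point, centers):
    """Index of the centroid closest to the given point."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(data, k, batch_size=10, n_iterations=100, seed=0):
    """Sketch of mini-batch k-means (hypothetical helper, fixed iteration count)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points picked at random.
    centers = [list(p) for p in rng.sample(data, k)]
    counts = [0] * k  # number of points assigned to each centroid so far

    for _ in range(n_iterations):
        # Draw a fresh random mini-batch from the dataset.
        batch = rng.sample(data, min(batch_size, len(data)))

        # Assign every batch point using the centroid locations
        # from before this batch's updates.
        assignments = [nearest_center(p, centers) for p in batch]

        # Move each assigned centroid toward its point with a
        # learning rate equal to 1 / (points assigned so far).
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]

    return centers
```

Calling `mini_batch_kmeans(points, k=2)` on a list of coordinate tuples returns two centroids as lists of floats. In practice a library implementation such as scikit-learn's `MiniBatchKMeans` would normally be used instead of hand-written code.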
WORKING OF DBSCAN
DENSITY-BASED CLUSTERING (DBSCAN)
- Partitioning methods and hierarchical methods are suitable for finding spherical-shaped clusters. Moreover, they are severely affected by the presence of noise and outliers in the data. Unfortunately, real-life data contain clusters of arbitrary shapes such as oval, linear, s-shaped, etc., as well as a lot of noise. The solution to this problem is to use density-based clustering methods.
- The basic idea behind density-based methods is to model clusters as dense regions in the data space, separated by sparse regions. The major features of density-based methods include discovering clusters of arbitrary shape (e.g. oval, s-shaped, etc.), handling noise, and needing density parameters as a termination condition. One of the most popular density-based algorithms is DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
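The text above does not spell out the individual steps of DBSCAN, so the following is only a minimal sketch of the algorithm as it is commonly described. It assumes two density parameters, named `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a point to be a core point). The function names and the `NOISE` label are hypothetical choices for this example.

```python
import math

NOISE = -1  # label given to points that belong to no cluster


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Sketch of DBSCAN; returns one cluster label per point, or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Sparse neighborhood: tentatively noise (may become a border point).
            labels[i] = NOISE
            continue

        # points[i] is a core point: start a new cluster and grow it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # points[j] is also a core point, so its neighbors join too.
                queue.extend(j_neighbors)

    return labels
```

Because a cluster grows from core point to core point, it can follow a chain or curve of dense points rather than being forced into a spherical shape, and isolated points are left labeled as noise. This sketch compares every pair of points, so it is only practical for small inputs; scikit-learn's `DBSCAN` is a more usual choice for real data.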