Unsupervised Learning Algorithms

Although most applications of ML are based on supervised learning, most of the data available in the real world is unlabeled. The most important unsupervised learning algorithms fall into three main categories:

Clustering

Clustering is the task of identifying similar instances and assigning them to groups called clusters. Just like in classification, each instance gets assigned to a particular group; unlike classification, however, the task is unsupervised. Clustering is used in a wide variety of applications, such as customer segmentation, anomaly detection, and, as discussed below, dimensionality reduction.

Finding the optimal number of clusters

Inertia is not a good performance metric for choosing the value of ‘k’, since it keeps getting lower as ‘k’ increases. A better technique for finding the number of clusters is the silhouette coefficient. For a given instance, the silhouette coefficient is given by (b − a) / max(a, b), where a is the mean distance to the other instances in the same cluster (the mean intra-cluster distance) and b is the mean distance to the instances of the nearest other cluster (the mean nearest-cluster distance).

The silhouette coefficient can vary between -1 and +1. A value close to +1 means the instance is well inside its own cluster and far from the other clusters. A value close to 0 means the instance is located close to a cluster boundary. A value close to -1 means it has likely been assigned to the wrong cluster. In sklearn you can compute the mean silhouette coefficient over all instances with silhouette_score(). Silhouette diagrams, which plot every instance’s silhouette coefficient sorted by cluster, are a richer visual representation of the same idea and are useful for comparing candidate values of ‘k’.
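As a minimal sketch of this comparison (the synthetic make_blobs data and the range of ‘k’ values are illustrative assumptions, not from the text):

```python
# Compare the mean silhouette score for several candidate values of k.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    # Higher mean silhouette score suggests a better choice of k.
    print(k, silhouette_score(X, labels))
```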

Limits of K-means

Although this algorithm is fast and scalable, it is far from perfect: it must be run multiple times (with different centroid initializations) to avoid converging to a suboptimal solution, and the process of finding a good value of ‘k’ is also quite tedious. (Scikit-Learn’s KMeans handles the repeated runs automatically via its n_init parameter.) K-means also does not tend to behave well when the clusters have varying sizes, different densities, or non-spherical shapes.


Clustering can be used as an efficient approach to dimensionality reduction, in particular as a pre-processing step before a supervised learning algorithm.
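A minimal sketch of this idea (the digits dataset, n_clusters=50, and the logistic regression classifier are illustrative assumptions): once fitted, KMeans can transform each instance into its distances to the cluster centroids, which become the new features.

```python
# Use K-Means cluster distances as features before a classifier.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# KMeans.transform() maps each instance to its distances to the 50 centroids,
# reducing the 64 pixel features to 50 cluster-distance features.
pipeline = make_pipeline(
    KMeans(n_clusters=50, n_init=10, random_state=42),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))
```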


DBSCAN

This algorithm defines clusters as contiguous regions of high density. It works as follows:

  1. Find the points in the ε (eps) neighborhood of every point, and identify the core points with more than minPts neighbors.
  2. Find the connected components of core points on the neighbor graph, ignoring all non-core points.
  3. Assign each non-core point to a nearby cluster if one of its ε (eps) neighbors is a core point; otherwise, assign it to noise.

DBSCAN does not have a predict() method, only a fit_predict() method.
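A minimal sketch of how this plays out in practice (the make_moons data, eps value, and the KNeighborsClassifier workaround for predicting new instances are illustrative assumptions, not part of DBSCAN itself):

```python
# Cluster with DBSCAN, then use a separate classifier to label new points,
# since DBSCAN itself cannot predict clusters for unseen instances.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)

dbscan = DBSCAN(eps=0.2, min_samples=5)
labels = dbscan.fit_predict(X)  # label -1 marks noise instances

# Train a classifier on the core samples to assign clusters to new points.
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(dbscan.components_, dbscan.labels_[dbscan.core_sample_indices_])
print(knn.predict([[-0.5, 0.0], [0.0, 0.5]]))
```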

Other Clustering Algorithms

Agglomerative Clustering

A hierarchy of clusters is built from the bottom up: at each iteration, the algorithm merges the nearest pair of clusters.
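A minimal usage sketch (the synthetic data and n_clusters value are illustrative assumptions):

```python
# Bottom-up hierarchical clustering, stopping once 3 clusters remain.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=42)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)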

Birch

Built for very large datasets, it can be faster than K-means as long as the number of features is not too large (fewer than about 20).
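A minimal usage sketch (the dataset size and n_clusters value are illustrative assumptions):

```python
# Birch builds a compact tree summary of the data, then clusters it.
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=10_000, centers=5, random_state=42)
labels = Birch(n_clusters=5).fit_predict(X)
```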

Mean Shift

The algorithm starts by placing a circle centered on each instance. For each circle, it computes the mean of all the instances located within it, then shifts the circle so it is centered on that mean. It keeps iterating until the circles stop moving; in effect, each circle shifts in the direction of higher density. Finally, all instances whose circles have settled in the same place form a cluster. This algorithm is not suited to large datasets, and it tends to chop clusters into pieces when they have internal density variations.
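A minimal usage sketch (the synthetic data and the quantile used to estimate the circle radius are illustrative assumptions):

```python
# Mean shift: the bandwidth is the radius of the shifting circles.
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)
bandwidth = estimate_bandwidth(X, quantile=0.2)
labels = MeanShift(bandwidth=bandwidth).fit_predict(X)
```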

Spectral Clustering

This algorithm takes a similarity matrix between the instances and creates a low-dimensional embedding from it. It then uses another clustering algorithm in this low-dimensional space.
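A minimal usage sketch (the make_moons data and the gamma value for the RBF similarity are illustrative assumptions):

```python
# Spectral clustering can separate non-convex clusters like two moons.
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=500, noise=0.05, random_state=42)
sc = SpectralClustering(n_clusters=2, gamma=100, random_state=42)
labels = sc.fit_predict(X)
```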