Unsupervised Learning, Let’s Learn

Ajay n Jain
3 min read · Nov 1, 2019

My favorite movies are The Dark Knight, Joker, Bridge of Spies, and Forrest Gump.

I asked my friends to group these movies. One of them grouped them into Comic Book Movies and Non Comic Book Movies, and another grouped them into Tom Hanks Movies and Non Tom Hanks Movies.
Interesting! Both of them are right.

Let’s look at a technique a machine can use to group these movies.

Know what Machine Learning is? If yes, go ahead. If not, check out my other story.

Unsupervised Learning

Unsupervised Learning is a Machine Learning technique that deals with unlabeled data, i.e. data that does not come with a correct answer or a predefined output.
In this technique, the machine is given some data and uses algorithms to make sense of it and organize it into groups of similar items.
ML practitioners can then analyze these groups and give them labels, so that the groups can later be used in Supervised Learning.

Two common types of Unsupervised Learning are:
1. Clustering
2. Association Rules

Clustering

Clustering is a type of Unsupervised Learning. It organizes the data into groups based on similarity. Data points that are close to each other are placed in the same group, and data points that are far away from each other are placed in separate groups. These groups are called Clusters, hence the name Clustering.
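To make "close to each other" a bit more concrete, here is a minimal sketch that measures closeness as straight-line (Euclidean) distance. The four points and their coordinates are made up for illustration, and real algorithms may use other distance measures.

```python
from math import dist

# Hypothetical 2-D data points; the coordinates are invented for illustration.
points = {"a": (1.0, 1.0), "b": (1.5, 2.0), "c": (8.0, 8.0), "d": (9.0, 8.5)}

def nearest_neighbor(name):
    """Return the other point with the smallest Euclidean distance to `name`."""
    others = [n for n in points if n != name]
    return min(others, key=lambda n: dist(points[name], points[n]))

for name in points:
    print(name, "is closest to", nearest_neighbor(name))
```

Here a and b sit next to each other, and so do c and d, so a distance-based clustering algorithm would be expected to put them in two separate groups.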
Consider an example where we have to organize the following dataset.

Clustering Sample Dataset

This sample dataset has 4 rows. When this data is passed to a machine, it can organize the rows in the following ways:

  1. It can consider Actor as the most significant feature and organize the data into two clusters: Tom Hanks movies and non Tom Hanks movies.
  2. It can consider Genre as the most significant feature and organize the data into two clusters: Comic-Book movies and Non-Comic-Book movies.

The actual grouping depends on the Clustering algorithm, and it is up to the ML practitioner to decide which clusters are useful and which are not.
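As a rough sketch of how a machine might arrive at such groups, the snippet below runs a tiny hand-written K-Means with two clusters on the four movies. The numeric encoding (has_tom_hanks, is_comic_book) is an assumption made for this example, not the original dataset, and the function name two_means is hypothetical. A real project would more likely use a library implementation.

```python
from math import dist

# Assumed encoding of the four movies: (has_tom_hanks, is_comic_book).
movies = {
    "The Dark Knight": (0.0, 1.0),
    "Joker": (0.0, 1.0),
    "Bridge of Spies": (1.0, 0.0),
    "Forrest Gump": (1.0, 0.0),
}

def two_means(points, max_iter=10):
    """Minimal K-Means with k=2 and a deterministic start (illustrative only)."""
    names = list(points)
    # Start with the first point and the point farthest from it as centroids.
    first = points[names[0]]
    farthest = max(names, key=lambda n: dist(points[n], first))
    centroids = [first, points[farthest]]
    labels = {}
    for _ in range(max_iter):
        # Assignment step: each movie joins its nearest centroid.
        new_labels = {
            n: min((0, 1), key=lambda c: dist(points[n], centroids[c]))
            for n in names
        }
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in (0, 1):
            members = [points[n] for n in names if labels[n] == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

clusters = {}
for name, label in two_means(movies).items():
    clusters.setdefault(label, []).append(name)
print(clusters)
```

The machine only returns cluster 0 and cluster 1. With these four movies the Actor reading and the Genre reading describe the same split, so whether a cluster is called "Tom Hanks movies" or "Non-Comic-Book movies" is a decision left to the practitioner.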

Some Clustering algorithms: K-Means, DBSCAN, EM Clustering.
Examples of Clustering: Customer Segmentation, Enhancing Search Engines, DNA Analysis.

Update: Practical implementation using Python: Click Here

Association Rules

Association Rules are a type of Unsupervised Learning. They detect patterns in data by finding items that occur frequently in a dataset, and they find which items occur together in order to establish a relationship between them.
Let’s take the example of Market Basket Analysis, which is one of the most popular examples of Association Rules.
This analysis is very useful to people who own a retail shop and want to find out which items people are buying and how those items relate to each other.

Let’s say a machine is given sales data and asked to find relations between different data points. It finds out that people frequently buy bread and butter together, or a PlayStation and games together. Retailers can then offer discounts or combo offers on these items so that more people buy them without giving it a second thought.
Retailers can also place items that are frequently bought together next to each other to provide a better shopping experience.
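Here is a small sketch of the counting behind that idea. The transactions are invented for illustration, and the code uses plain pair counting rather than a full algorithm such as Apriori. It computes two commonly used measures: support (the share of baskets that contain both items) and confidence (of the baskets that contain the first item, the share that also contain the second).

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales data: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"playstation", "games"},
    {"bread", "butter", "games"},
    {"playstation", "games", "milk"},
]

# Count how often each item and each (alphabetically sorted) pair appears.
item_counts = Counter(item for basket in transactions for item in basket)
pair_counts = Counter(
    pair for basket in transactions for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]

# Keep only pairs that appear in at least 40% of baskets (an arbitrary threshold).
frequent_pairs = sorted(
    pair for pair, count in pair_counts.items() if count / len(transactions) >= 0.4
)
print(frequent_pairs)
print(support("bread", "butter"), confidence("playstation", "games"))
```

In this made-up data, bread and butter and PlayStation and games are the only pairs that pass the threshold, which is the kind of signal a retailer could use for combo offers.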

Some Association Rules algorithms: Apriori, Eclat, FP-Growth.
Examples of Association Rules: Market Basket Analysis, Medical Diagnosis.

Summary

Unsupervised Learning: Deals with Unlabeled data.
Types: Clustering and Association Rules.
Clustering: Organizes data into different groups.
Association Rules: Finds out a relationship between different data points.
