What Is Unique About Clustering? | What is Clustering?
Cluster Analysis

1. What is Clustering?
2. Basic Clustering Algorithms
3. How to choose the best model?

What Is Unique About Clustering?

Clustering is a type of unsupervised machine learning: the model is trained on unlabeled data, with no predefined target variable or correct output. The goal is to identify hidden patterns or structures in the data without any prior knowledge of what the output should be.
This changes the approach to learning. In supervised learning, we minimize the difference between the predicted value and the actual value (the label). In unsupervised learning there is no label, so we must first decide which function to minimize for the specific problem (it can be cross-entropy when working with images, different kinds of mathematical norms when working with numerical data, density when using statistical methods, etc.).
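Put simply, we need to choose the criterion by which objects are considered close to each other for clustering. Most algorithms use the ordinary Euclidean distance for this, which for two points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$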
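
As a minimal sketch, the Euclidean distance between two points can be computed in plain Python as shown here. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the course material.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of coordinates")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative points (assumed for this example).
p = (1.0, 2.0)
q = (4.0, 6.0)
print(euclidean_distance(p, q))  # 5.0
```

On Python 3.8 and later, the standard library's `math.dist(p, q)` gives the same result.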

Two related measures are also commonly used: the intra-cluster distance (the distance between a data item and the centroid of its own cluster) and the inter-cluster distance (the distance between data items in distinct clusters). The smaller the intra-cluster distance and the greater the inter-cluster distance, the better the algorithm has handled the clustering task.
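
The two measures can be sketched as follows, under the assumption that each is averaged over all items (or pairs of items); other aggregations, such as the maximum or minimum, are also possible. The helper names and the toy clusters are hypothetical.

```python
import math
from itertools import product

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def mean_intra_cluster_distance(cluster):
    """Average distance from each item to the centroid of its own cluster."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)

def mean_inter_cluster_distance(cluster_a, cluster_b):
    """Average distance over all pairs of items taken from two distinct clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Toy clusters (assumed for this example): two tight groups far apart.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(10.0, 0.0), (10.0, 2.0)]

intra = mean_intra_cluster_distance(cluster_a)
inter = mean_inter_cluster_distance(cluster_a, cluster_b)
print(intra, inter)
```

Here the intra-cluster distance is small compared with the inter-cluster distance, which is the situation the text describes as a good clustering.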

Now, let's discuss the advantages and disadvantages of clustering.

Pros:

  • clustering helps to solve machine learning problems without requiring us to label data, which can be time-consuming;
  • clustering algorithms can help us to enhance data quality by detecting outliers, reducing data dimensions, and engineering features;
  • clustering can help us identify valuable patterns and insights in our data;
  • clustering algorithms can work with data that doesn't follow a consistent pattern over time.
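
To illustrate the point about outliers, here is a rough sketch that flags items lying unusually far from their cluster centroid. The rule used (distance greater than `factor` times the median distance) and the name `flag_outliers` are assumptions chosen for this example, not a method prescribed by the course.

```python
import math
from statistics import median

def flag_outliers(cluster, factor=3.0):
    """Return items whose distance to the centroid exceeds factor * median distance."""
    n = len(cluster)
    center = tuple(sum(coords) / n for coords in zip(*cluster))
    distances = [math.dist(p, center) for p in cluster]
    # Hypothetical threshold: a multiple of the typical (median) distance.
    threshold = factor * median(distances)
    return [p for p, d in zip(cluster, distances) if d > threshold]

# Toy data (assumed): four nearby points and one far away.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (8.0, 8.0)]
print(flag_outliers(points))  # [(8.0, 8.0)]
```

The threshold is a judgment call: a smaller `factor` flags more points, and the outlier itself pulls the centroid toward it, so real data usually needs a more robust rule.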

Cons:

  • clustering can be expensive because it may require human experts to interpret the patterns and connect them to domain knowledge;
  • there's no guarantee that clustering will provide useful results since we don't have labeled data to validate the outcomes;
  • the accuracy of clustering results can vary depending on the method used.

Section 1. Chapter 2