Lab: Applying and Comparing Different Clustering Algorithms, Interpreting Their Results - 5.7 | Module 5: Unsupervised Learning & Dimensionality Reduction (Week 9) | Machine Learning

5.7 - Lab: Applying and Comparing Different Clustering Algorithms, Interpreting Their Results

Interactive Audio Lesson

Listen to a student-teacher conversation explaining the topic in a relatable way.

Introduction to Clustering Algorithms

Teacher

Today, we are diving into clustering algorithms! Can anyone tell me why clustering is important in data analysis?

Student 1

It's useful for discovering hidden patterns in data without predefined labels.

Teacher

Exactly! Clustering helps to find natural groupings within data. Remember, clustering algorithms can be broadly categorized into K-Means, Hierarchical Clustering, and Density-Based Clustering like DBSCAN.

Student 2

What's the main difference between these techniques?

Teacher

Great question! K-Means requires you to specify the number of clusters, while hierarchical clustering builds a tree-like structure of clusters, and DBSCAN can find clusters of arbitrary shapes without requiring a predefined number. We'll explore each in detail today. Think of the acronym KHD: K-Means, Hierarchical, and DBSCAN!

Student 3

How do we know what K to choose for K-Means?

Teacher

We can use techniques like the Elbow Method and Silhouette Analysis. We'll get into those shortly!

Teacher

In summary, today, you'll apply clustering algorithms, analyze their results, and interpret clusters from real datasets. Let's get started!

K-Means Clustering

Teacher

Let's now focus on K-Means. Can anyone outline the K-Means algorithm process?

Student 4

You first choose K, then place initial centroids, assign points to the nearest centroid, and update the centroids until they stabilize.

Teacher

Perfect! Remember, we need to find the optimal K value. The Elbow Method visualizes the trade-off between K and WCSS. Who can explain this method?

Student 1

We plot WCSS against K values and look for an 'elbow' point where adding more clusters doesn’t improve WCSS much!

Teacher

Exactly! Sometimes, however, the elbow isn't clear. That's where Silhouette Analysis helps too. It gives a numeric measure of how well each point is clustered. Let’s practice implementing K-Means using both methods!

Teacher

Remember the memory aid: 'K is for K-Means, C is for Centroids, and S is for Silhouette!'

Hierarchical Clustering and Dendrograms

Teacher

Now, let’s dive into Hierarchical Clustering. What do you know about this algorithm?

Student 2

It builds clusters in a hierarchy, allowing us to decide the number of clusters later by cutting a dendrogram.

Teacher

Correct! This method does not need K specified upfront. Can someone explain how a dendrogram is structured?

Student 3

The X-axis represents individual data points or clusters, and the Y-axis shows the distance at which clusters are merged.

Teacher

Right! Remember: 'Short merges mean strong connections and long merges mean weak links'. After that, how do we extract clusters from it?

Student 4

By cutting the dendrogram at a certain height!

Teacher

Exactly! Let’s visualize some dendrograms and practice reading them.

DBSCAN: Density-Based Clustering

Teacher

Finally, let's discuss DBSCAN, a density-based algorithm. What are the key parameters it uses?

Student 1

Eps and MinPts! Eps determines the neighborhood radius, and MinPts defines how many points are needed to form a dense region.

Teacher

Great! Remember, DBSCAN can find arbitrary shapes. Why is identifying noise points useful?

Student 2

It helps to separate outliers that might skew data interpretation!

Teacher

Exactly! Its strength is the freedom it gives you over cluster shapes. Let’s implement DBSCAN and tune its parameters using the K-distance graph!

Comparative Analysis of Clustering Algorithms

Teacher

We’ve implemented various clustering algorithms! How can we compare their effectiveness?

Student 3

We can compare based on cluster shape consistency, noise identification, and computational efficiency.

Teacher

Great! Let’s tabulate our findings. What’s a critical limitation of unsupervised clustering?

Student 4

There’s no ground truth to evaluate results against!

Teacher

Exactly! In summary, each clustering algorithm has its strengths and weaknesses based on data characteristics. Always consider the context you’re working in!

Introduction & Overview

Read a summary of the section's main ideas. Choose from Quick Overview, Standard, or Detailed.

Quick Overview

This lab focuses on the practical application and comparison of clustering algorithms, including K-Means, Hierarchical Clustering, and DBSCAN, through hands-on data analysis.

Standard

In this lab, students will gain valuable experience in implementing, analyzing, and interpreting the results of different clustering algorithms. They will prepare datasets, select optimal parameters, and explore the strengths and weaknesses of each algorithm in deriving meaningful insights from data.

Detailed

Lab: Applying and Comparing Different Clustering Algorithms, Interpreting Their Results

In this lab session, students will engage deeply with clustering algorithms, enhancing their understanding of unsupervised learning through practical experience. The objectives are structured into five primary tasks:

  1. Data Preparation for Clustering: Students will load and explore suitable datasets, handle missing values, encode features, and perform feature scaling to ensure the correctness of distance-based algorithms. Attention will be paid to the significance of each preprocessing step in achieving unbiased and meaningful clustering results.
  2. K-Means Clustering with Optimal K Selection: Using the Elbow method and Silhouette analysis, students will determine the optimal number of clusters (K) for K-Means. They will visualize results and interpret the characteristics of the clusters formed, discussing potential implications derived from these insights.
  3. Hierarchical Clustering with Dendrogram Interpretation: Students will compute distance matrices and apply various linkage methods to perform hierarchical clustering. They will generate and interpret dendrograms, drawing connections to the data's structure.
  4. DBSCAN Implementation: The lab will introduce students to DBSCAN, focusing on the identification of noise points and the impact of parameter tuning. Students will visualize results and discuss DBSCAN’s advantages for detecting clusters of arbitrary shapes, compared to K-Means and hierarchical methods.
  5. Comprehensive Performance Comparison: Students will summarize findings in a tabulated form, discussing strengths and weaknesses of the algorithms and practical implications of the clusters produced, while acknowledging the inherent limitations of unsupervised clustering methods. They will reflect on how clustering aids in uncovering hidden structures within unlabeled data.

Audio Book

Dive deep into the subject with an immersive audiobook experience.

Lab Objectives Overview

By the successful conclusion of this lab, you will be able to proficiently:

  1. Prepare Data for Clustering with Precision:
    • Load and Thoroughly Explore a Dataset: Begin by loading a suitable dataset for clustering. Ideal datasets are those where you might anticipate inherent groupings but lack explicit labels. Examples include:
      • Customer Segmentation Data: (e.g., spending habits, demographics, website activity) to identify distinct customer groups.
      • Gene Expression Data: To group genes with similar expression patterns.
      • Image Pixel Data: (e.g., for color quantization or object segmentation).
      • Geospatial Data: (e.g., identifying hot spots of criminal activity or areas of high population density).
      • Synthetically Generated Data: (e.g., using sklearn.datasets.make_blobs or make_moons to create clusters of known shapes for algorithm testing).
    • Perform initial exploratory data analysis (EDA): inspect data types, identify numerical and categorical features, check for outliers, and visualize feature distributions.
    • Handle Missing Values Systematically: Implement appropriate and justifiable strategies to address any missing data points within your chosen dataset. Clearly articulate your rationale for selecting methods like mean imputation, median imputation, mode imputation, or strategic row/column deletion.
    • Encode Categorical Features (If Necessary and Thoughtfully): Convert any non-numeric, categorical features into a numerical representation. Employ techniques such as One-Hot Encoding (for nominal/unordered categories, understanding its impact on dimensionality) or Label Encoding (for ordinal/ordered categories). Crucially, consider if your chosen clustering algorithms can handle categorical features directly (e.g., CatBoost for supervised tasks, but for clustering, manual encoding is often required) and discuss the implications of high-dimensional one-hot encoded features on distance metrics.
    • Feature Scaling (Absolutely Critical for Distance-Based Algorithms): This is a non-negotiable and crucial preprocessing step for most distance-based clustering algorithms (K-Means, Hierarchical Clustering). Apply feature scaling (e.g., using StandardScaler to achieve zero mean and unit variance, or MinMaxScaler for a specific range, from Scikit-learn) to all your numerical features. Provide a detailed explanation of why this step is essential: features with larger numerical ranges can disproportionately influence distance calculations, leading to biased clustering results where the algorithm prioritizes features with larger scales, regardless of their actual importance.

Detailed Explanation

This chunk provides a comprehensive overview of the objectives for the clustering lab, focusing on data preparation. Students are expected to load and explore datasets that may exhibit natural groupings but do not have any pre-defined labels. It emphasizes the importance of understanding the dataset through exploratory data analysis (EDA), which includes inspecting data types, identifying numerical or categorical features, checking for outliers, and visualizing distributions. Moreover, handling missing values wisely is crucial; students should justify their imputation strategies. Encoding categorical features to a numerical format is also highlighted, making sure the clustering algorithm can process them effectively. Finally, the significance of feature scaling is stressed, particularly for distance-based clustering algorithms like K-Means and Hierarchical Clustering, as features with larger ranges can skew results.
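
To make these preparation steps concrete, here is a minimal sketch, not the lab's prescribed solution: it assumes a synthetic dataset from sklearn.datasets.make_blobs, and the column names (annual_spend, visits_per_month, region) are hypothetical placeholders chosen only for illustration.

```python
# Minimal data-preparation sketch: synthetic data, imputation, encoding, scaling.
# The DataFrame and its column names are hypothetical, for illustration only.
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Generate a small synthetic dataset with known groupings for testing.
X, _ = make_blobs(n_samples=300, centers=4, n_features=2, random_state=42)
df = pd.DataFrame(X, columns=["annual_spend", "visits_per_month"])

# Simulate a few missing values, then impute them with the column median.
df.loc[df.sample(frac=0.02, random_state=0).index, "annual_spend"] = np.nan
df["annual_spend"] = df["annual_spend"].fillna(df["annual_spend"].median())

# One-hot encode a hypothetical categorical feature (nominal, unordered).
df["region"] = np.random.default_rng(0).choice(["north", "south"], size=len(df))
df = pd.get_dummies(df, columns=["region"])

# Scale all features to zero mean / unit variance before distance-based clustering.
X_scaled = StandardScaler().fit_transform(df)
print(X_scaled.shape)
```

The variable X_scaled is reused in the later sketches; with a real dataset you would replace the synthetic generation with loading and exploring your own data.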

Examples & Analogies

Think of preparing for a camping trip where you need to pack different kinds of supplies. Loading your supplies (dataset) is akin to gathering all the essentials you need. You must categorize them into food, gear, and clothing (numerical and categorical features) to ensure you have the right items. If you realize you forgot some things (missing values), you need to find replacements that fit your needs (imputation). Once everything is in place, making sure each category is well-represented and balanced (feature scaling) is like ensuring that you have not over-packed one type of supply, which would make your backpack harder to carry (distorted clustering results).

Implementing K-Means Clustering

  1. Implement K-Means Clustering with Optimal K Selection:
    • Initial K-Means Run and Baseline: Begin by applying KMeans from Scikit-learn (sklearn.cluster.KMeans) with an arbitrarily chosen, reasonable number of clusters (K, e.g., K=3 or K=4) to get an initial feel for the output. This serves as a starting point.
    • Determine Optimal K via Elbow Method (Visual Heuristic):
      • Systematically run K-Means clustering for a range of K values (e.g., from 1 to 15, or a range appropriate for your data).
      • For each K, meticulously record the WCSS (Within-Cluster Sum of Squares), also known as inertia_ in Scikit-learn.
      • Generate a line plot with K on the X-axis and WCSS on the Y-axis.
      • Visually identify the "elbow point" – the point where the rate of decrease in WCSS significantly slows down, indicating diminishing returns for adding more clusters.
      • Discuss the inherent subjectivity and potential ambiguity involved in interpreting the "elbow" in real-world datasets.
    • Determine Optimal K via Silhouette Analysis (Quantitative Measure):
      • For the same range of K values used for the Elbow method, calculate the average Silhouette Score for each clustering result (sklearn.metrics.silhouette_score).
      • Plot these average scores against K.
      • Choose the K that yields the highest average Silhouette Score, as this indicates the best-defined and most well-separated clusters.
      • Discuss how this method provides a more quantitative and less subjective measure of clustering quality compared to the Elbow method, often leading to a more robust selection of K.

Detailed Explanation

This chunk focuses on implementing K-Means clustering. Initially, students will run K-Means with a guessed number of clusters (K), like 3 or 4, just to obtain a baseline understanding of the clustering output. After this starting point, two methods are introduced to identify the optimal number of clusters. The Elbow Method entails running K-Means for a series of K values (for instance, from 1 to 15), tracking the within-cluster sum of squares (WCSS) for each run. The plot of K against WCSS should reveal an 'elbow' point, where adding more clusters yields diminishing returns. Attention is drawn to the subjective nature of this method. Conversely, Silhouette Analysis quantitatively measures how well-separated the clusters are by calculating the average silhouette score for each K and selecting the K with the highest score. Using the two methods together typically leads to a more reliable choice of K than relying on either one alone.
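
As a rough illustration of the Elbow and Silhouette procedures described above, the sketch below assumes the scaled matrix X_scaled from the preparation step; the range of K values is arbitrary and should be adapted to your data.

```python
# Sketch: choosing K via the Elbow method (WCSS / inertia_) and Silhouette analysis.
# Assumes X_scaled is the scaled feature matrix from the preparation step.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

k_values = list(range(2, 11))        # silhouette needs at least 2 clusters
wcss, sil = [], []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    wcss.append(km.inertia_)                          # within-cluster sum of squares
    sil.append(silhouette_score(X_scaled, km.labels_))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(k_values, wcss, marker="o"); ax1.set_xlabel("K"); ax1.set_ylabel("WCSS")
ax2.plot(k_values, sil, marker="o"); ax2.set_xlabel("K"); ax2.set_ylabel("Avg. silhouette")
plt.tight_layout()
plt.show()

best_k = k_values[sil.index(max(sil))]                # K with the highest silhouette score
km_best = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit(X_scaled)
print("Suggested K:", best_k)
```

Inspect both plots rather than trusting a single number: a clear elbow and a silhouette peak at the same K is a strong signal; when they disagree, interpret the clusters for each candidate K.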

Examples & Analogies

Imagine organizing a group of friends for a party. Initially, you might group them arbitrarily into a few teams to play games (initial K). After playing, you notice some teams didn't get along well, leading to a chaotic game experience (K-Means output). To improve future games, you evaluate how the teams performed, first by simply checking where there were arguments and where everyone had fun (Elbow Method) – at some point adding more teams doesn't help anymore. Then, you compare team cohesion (Silhouette Analysis), calculating how well each person bonded with their teammates to ensure a great party atmosphere in future gatherings.

Exploring Hierarchical Clustering

  1. Implement Hierarchical Clustering with Dendrogram Interpretation:
    • Compute Distance Matrix: Start by computing the pairwise distance matrix between all your data points. This is a prerequisite for hierarchical clustering (e.g., using scipy.spatial.distance.pdist or sklearn.metrics.pairwise.euclidean_distances).
    • Perform Linkage with Different Methods: Apply different linkage methods (e.g., 'single', 'complete', 'average', and 'ward' using scipy.cluster.hierarchy.linkage). Discuss the theoretical implications of each linkage method on cluster shape and sensitivity.
    • Generate and Interpret Dendrograms: For at least one illustrative linkage method (e.g., 'ward' which often produces aesthetically pleasing and compact clusters), generate and plot the dendrogram (scipy.cluster.hierarchy.dendrogram).
      • Detailed Interpretation: Explain precisely how to read and interpret the dendrogram:
        • How the X-axis leaves represent individual data points.
        • How the Y-axis height represents the dissimilarity/distance at which merges occur.
        • How short vertical lines indicate highly similar clusters merging early.
        • How long vertical lines indicate merges between more dissimilar clusters.
        • Demonstrate how drawing a horizontal line across the dendrogram at a chosen height (distance threshold) yields a specific number of clusters. Illustrate with examples of different cuts resulting in different cluster counts.
    • Extract Clusters from Dendrogram: Use the fcluster function from scipy.cluster.hierarchy to explicitly extract cluster assignments based on your chosen distance threshold or by specifying a desired number of clusters derived from your dendrogram interpretation.

Detailed Explanation

In this chunk, the focus shifts to hierarchical clustering, a method that does not require pre-specifying the number of clusters. First, the distance matrix needs to be computed: a matrix that shows the distances between every pair of data points. This is crucial for facilitating the merging process inherent in hierarchical clustering. Different linkage methods (single, complete, average, and ward) determine how closely data points are merged, impacting the resulting cluster shapes. Students generate a dendrogram, a tree-like diagram displaying the entire merging journey of clusters. Interpretation of the dendrogram enables students to visualize and understand how close data clusters are based on the chosen linkage method, identifying the number of clusters by cutting the branches at a certain height. Finally, the 'fcluster' function enables the extraction of defined clusters from this visual representation.
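
A brief sketch of this workflow with SciPy follows. It assumes the scaled matrix X_scaled from the preparation step; the cut height of 10.0 is purely illustrative and must be read off your own dendrogram.

```python
# Sketch: agglomerative clustering with SciPy — linkage, dendrogram, cluster extraction.
# Assumes X_scaled is the scaled feature matrix from the preparation step.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Ward linkage minimises within-cluster variance and tends to give compact clusters.
Z = linkage(X_scaled, method="ward")

# Plot the dendrogram: leaves are data points, merge height is the dissimilarity.
plt.figure(figsize=(10, 4))
dendrogram(Z, truncate_mode="lastp", p=30)   # show only the last 30 merges for readability
plt.xlabel("Data points (or merged clusters)")
plt.ylabel("Merge distance")
plt.show()

# "Cut" the tree: either at a distance threshold, or by asking for a cluster count.
labels_by_distance = fcluster(Z, t=10.0, criterion="distance")   # threshold is illustrative
labels_by_count = fcluster(Z, t=4, criterion="maxclust")         # force exactly 4 clusters
print(len(set(labels_by_count)), "clusters extracted")
```

Try the other linkage methods ('single', 'complete', 'average') on the same data and compare the dendrograms to see how the merge order and cluster shapes change.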

Examples & Analogies

Think of organizing a wedding seating chart. First, you have each guest (data point) represented separately. You then identify who knows each other well (computing the distance matrix) and begin grouping them. As you work through grouping (linkage steps), you assess which tables (clusters) should merge based on how closely related each group is. As you finalize the seating arrangement, the dendrogram visualizes your process, showing how you brought people together smoothly. Drawing a line across this visualization helps you determine how many tables you truly need to reflect guests' preferences and relationships.

Implementing DBSCAN for Dense Clusters

  1. Implement DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
    • Initial DBSCAN Run: Initialize and apply DBSCAN from Scikit-learn (sklearn.cluster.DBSCAN). Start with initial, arbitrary values for eps and MinPts to understand its basic output.
    • Strategic Parameter Tuning (eps, MinPts): DBSCAN's performance is critically dependent on its parameters. Discuss and apply common strategies for selecting eps and MinPts:
      • For MinPts: Discuss rules of thumb (e.g., MinPts = 2 * dimensions for low-dimensional data; larger values like 20 for high-dimensional data). Explain the reasoning behind these rules.
      • For eps: Emphasize the importance of the K-distance graph method. Plot the distance to the k-th nearest neighbor for each data point (where k is MinPts - 1), sorted in ascending order. Identify the "knee" or "elbow" in this graph, which suggests a good eps value where the density of points significantly drops.
    • Identify Noise Points: Crucially, observe how DBSCAN automatically identifies and labels "noise" points (outliers) with a special cluster label (typically -1). This is a distinct advantage.
    • Visualize DBSCAN Results: If feasible (2D/3D data or after dimensionality reduction), visualize the DBSCAN clusters, ensuring to distinctly color or mark the noise points identified by the algorithm.
    • Interpret Cluster Shapes and Noise: Discuss how DBSCAN effectively finds clusters of arbitrary, non-spherical shapes, demonstrating this capability if your dataset allows. Analyze if the detected clusters align with any inherent density variations in your data. Provide examples of what types of data are well-suited for DBSCAN.

Detailed Explanation

This chunk details the implementation of the DBSCAN algorithm, a powerful tool for clustering in dense regions. The initial step involves running the DBSCAN with arbitrary values for its two main parameters (eps and MinPts) to get a baseline result. Fine-tuning these parameters is crucial as they significantly affect clustering results. Students will learn to apply strategies for setting MinPts and eps, with a focus on how to produce a K-distance graph to find the optimal values where the density sharply declines. The algorithm’s ability to distinguish noise points (or outliers) is one of its advantages, marking them as -1, thereby enabling students to understand the robustness of DBSCAN against noisy data. Visualizing the resulting clusters can further amplify DBSCAN's unique ability to identify non-spherical shapes, allowing students to explore the suitability of various datasets for this method.
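
The sketch below shows one way to combine the K-distance graph with DBSCAN. It again assumes the scaled (here 2-D) matrix X_scaled; the eps value of 0.3 is a placeholder to be replaced by the knee you observe in your own plot.

```python
# Sketch: K-distance graph for choosing eps, then DBSCAN with noise counted.
# Assumes X_scaled is the scaled (2-D, for easy plotting) feature matrix from earlier.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

min_pts = 4                                    # rule of thumb: ~2 * n_dimensions for 2-D data
nbrs = NearestNeighbors(n_neighbors=min_pts).fit(X_scaled)
distances, _ = nbrs.kneighbors(X_scaled)       # first column is the point itself (distance 0)

# Sort each point's distance to its (MinPts - 1)-th neighbour; the "knee" suggests eps.
k_dist = np.sort(distances[:, -1])
plt.plot(k_dist)
plt.xlabel("Points sorted by distance")
plt.ylabel(f"Distance to {min_pts - 1}-th nearest neighbour")
plt.show()

eps = 0.3                                      # read off the knee of the plot (illustrative value)
labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X_scaled)
n_noise = int(np.sum(labels == -1))            # DBSCAN labels noise points as -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"Clusters found: {n_clusters}, noise points: {n_noise}")
```

Re-running the last few lines with slightly different eps values is a quick way to see how sensitive the cluster count and the number of noise points are to this parameter.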

Examples & Analogies

Imagine you are exploring a crowded festival. Walk close to larger groups of friends (core points) while also noting individuals standing apart (noise). To find your own group, you must understand how far you're willing to walk to find others (eps), and how many friends you want to gather (MinPts). As you group together, the clusters reflect friendships, while the loners represent noise that doesn’t belong to any tight-knit group. Ultimately, your ability to measure distance (analyze parameters), identify groups, and distinguish between clustered friends and loners (i.e., DBSCAN’s robust outlier detection) makes your exploration efficient.

Comprehensive Performance Comparison

  1. Comprehensive Performance Comparison and In-Depth Discussion:
    • Tabulate and Summarize Results: Create a clear, well-structured summary table comparing the key characteristics, benefits, limitations, and outcomes of each clustering algorithm (K-Means, Agglomerative Hierarchical Clustering, DBSCAN). Include considerations such as:
      • How the number of clusters was determined (or if it was an output).
      • The algorithm's ability to handle varying cluster shapes (spherical vs. arbitrary).
      • Its inherent capability to identify outliers/noise.
      • Sensitivity to initial conditions or specific parameters.
      • Computational considerations (conceptual discussion, e.g., O(N^2) vs. O(N) complexity, memory requirements for distance matrices).
    • Detailed Strengths and Weaknesses Analysis: Based on your direct observations from the lab, provide a detailed discussion of the specific strengths and weaknesses of each algorithm. For example:
      • When would K-Means be the most appropriate choice (e.g., known K, spherical clusters, large datasets)?
      • When would Hierarchical clustering be more insightful (e.g., need for dendrogram, understanding nested relationships, smaller datasets)?
      • When is DBSCAN the best choice (e.g., arbitrary cluster shapes, outlier detection is critical, varying densities not too extreme)?
    • Interpreting Cluster Insights for Actionable Knowledge: For your best-performing or most insightful clustering result (regardless of the algorithm), delve deeply into what the clusters actually mean in the specific context of your dataset. Go beyond simply stating "Cluster 1 is this" and "Cluster 2 is that." Instead, describe the key characteristics and defining attributes of each cluster in relation to your original features. Translate these technical findings into potential business or scientific implications (e.g., "Cluster A represents our 'high-value, highly engaged' customer segment, suggesting targeted loyalty programs," or "Cluster B indicates a novel sub-type of disease, warranting further medical research").
    • Acknowledging Limitations of Unsupervised Clustering: Conclude with a critical reflection on the inherent limitations of unsupervised clustering techniques. Emphasize that there is no "ground truth" for direct quantitative evaluation (unlike supervised learning), and the interpretation of results often requires subjective human judgment and strong domain expertise. Discuss the challenges of evaluating the "correctness" of clusters.

Detailed Explanation

This final chunk emphasizes the significance of comparing various clustering algorithms. It advocates for creating a summary table that neatly captures each algorithm’s characteristics, benefits, and limitations, allowing for quick reference. Students are encouraged to analyze situations where specific algorithms excel, such as K-Means for known cluster numbers and typical spherical shapes, or why hierarchical clustering provides insightful visual data representations. Further discussion focuses on translating the outcome of clustering findings into actionable insights, reflecting on their meanings and implications in real-world scenarios. Finally, students are prompted to recognize the inherent limitations of unsupervised clustering, including the challenge presented by the lack of labeled data for evaluation, requiring subjective interpretations.
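
For the quantitative part of the comparison table, a small helper like the one sketched below can be used. It assumes the label arrays produced in the earlier sketches (km_best.labels_, labels_by_count, and the DBSCAN labels array); the silhouette score on non-noise points is only one of several comparison criteria discussed above.

```python
# Sketch: a small quantitative summary to accompany the comparison table.
# Assumes X_scaled plus the label arrays from the earlier sketches:
# km_best.labels_ (K-Means), labels_by_count (hierarchical), labels (DBSCAN).
import numpy as np
import pandas as pd
from sklearn.metrics import silhouette_score

def summarize(name, lbls):
    mask = lbls != -1                          # exclude DBSCAN noise from the score
    n_clusters = len(set(lbls[mask]))
    score = silhouette_score(X_scaled[mask], lbls[mask]) if n_clusters > 1 else float("nan")
    return {"algorithm": name,
            "n_clusters": n_clusters,
            "n_noise": int(np.sum(lbls == -1)),
            "avg_silhouette": round(float(score), 3)}

summary = pd.DataFrame([
    summarize("K-Means", km_best.labels_),
    summarize("Hierarchical (ward)", labels_by_count),
    summarize("DBSCAN", labels),
])
print(summary.to_string(index=False))
```

Because the silhouette score favours compact, well-separated clusters, treat it as one signal alongside the qualitative criteria in your table, not as a ground-truth verdict on which algorithm is "correct."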

Examples & Analogies

Imagine you are assessing different workout routines at a gym. With each routine representing a clustering algorithm, you note down their strengths (e.g., easy to follow, requires equipment), weaknesses (e.g., not suitable for large groups), and the results they provide (gains, endurance). Comparing these routines helps you choose what fits your fitness goals (the best algorithms for your data). Ultimately, translating these choices into actionable results akin to a nutritional plan (the clusters' implications in real contexts) allows you to maximize your gym experience while you become aware of the workout’s limitations (evaluation challenges present in unsupervised clustering).

Definitions & Key Concepts

Learn essential terms and foundational ideas that form the basis of the topic.

Key Concepts

  • Clustering Algorithms: Techniques for grouping data.

  • K-Means Clustering: A method that requires specifying the number of clusters.

  • DBSCAN: An algorithm that identifies clusters based on density without pre-specifying cluster numbers.

  • Elbow Method: A technique for determining optimal cluster count visually.

  • Dendrogram: A visual representation of clusters formed in hierarchical clustering.

Examples & Real-Life Applications

See how the concepts apply in real-world scenarios to understand their practical implications.

Examples

  • K-Means can be used for customer segmentation based on purchasing behavior data to identify distinct customer groups.

  • DBSCAN is ideal for geographical data to identify hotspots of activity, as it can accommodate clusters of various shapes.

Memory Aids

Use mnemonics, acronyms, or visual cues to help remember key information more easily.

🎡 Rhymes Time

  • K, M, and D - Clustering Algorithms we see. K-Means fixes K with glee, DBSCAN finds shape, simple as can be!

📖 Fascinating Stories

  • Imagine a group of friends trying to organize a messy room. K-Means represents them picking a number of boxes (K) to keep things neat; Hierarchical Clustering shows how they stack boxes inside one another; DBSCAN helps them gather items spread across the room into a perfect shape without dictating how many items to put in each box.

🧠 Other Memory Gems

  • To remember types of clustering: K for K-Means, D for Density in DBSCAN, H for Hierarchical's tree-like splendor!

🎯 Super Acronyms

Remember KHD for clustering algorithms:

  • K-Means
  • Hierarchical
  • Density (DBSCAN).

Flash Cards

Review key concepts with flashcards.

Glossary of Terms

Review the Definitions for terms.

  • Term: Clustering

    Definition:

    A method of grouping data points into subsets where members of the same subset are more similar to each other than to those in other subsets.

  • Term: K-Means

    Definition:

    A popular clustering algorithm that partitions data into K clusters by assigning each data point to the nearest cluster centroid.

  • Term: Dendrogram

    Definition:

    A tree-like diagram that visually represents the arrangement of clusters in hierarchical clustering.

  • Term: DBSCAN

    Definition:

    A density-based clustering algorithm that identifies clusters of arbitrary shape and detects outliers.

  • Term: Silhouette Score

    Definition:

    A measure used to evaluate the quality of a clustering, reflecting how well-separated clusters are.

  • Term: Elbow Method

    Definition:

    A heuristic for determining the number of clusters by plotting the WCSS against various values of K and identifying an 'elbow' point.

  • Term: Core Point

    Definition:

    A data point in DBSCAN that has at least a specified number of points in its neighborhood, thus considered part of a dense area.

  • Term: Noise Point

    Definition:

    In DBSCAN, a data point that does not belong to any cluster, considered an outlier.