Listen to a student-teacher conversation explaining the topic in a relatable way.
Let's start with the first metric: support. Can anyone tell me what support means in the context of association rules?
Isn't it about how often an itemset appears in the dataset?
Exactly, great point! Support measures the frequency of the itemset in transactions. The formula is Support(A) = (Number of transactions containing A) / (Total number of transactions). Why do you think understanding support is important?
To know if an itemset is common enough to be significant?
Correct! High support means the itemset is common. Now let's visualize: if we had 100 transactions and item A appeared in 20, what would be the support for item A?
That would be 0.2 or 20%!
Spot on! Remember, a minimum support threshold can filter out infrequent itemsets. Let's move on to confidence.
Now, let's talk about confidence. Who can share what confidence represents in an association rule?
It indicates how reliably we can expect that B occurs when A occurs?
Correct! The formula is Confidence(A → B) = Support(A ∪ B) / Support(A). Why do you think confidence is significant?
It helps us determine if a rule is actually useful or just coincidental.
Exactly! It filters out unreliable rules. Can someone give me an example of how confidence works?
If 30 transactions had both A and B and 50 included A, the confidence would be 30/50, which would be 0.6.
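Well done! A confidence of 0.6 means that B appears in 60% of the transactions that contain A. Whether that counts as reliable depends on the minimum confidence threshold you set. Let's summarize before we move forward.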
Lastly, let's discuss lift. Who can explain what lift means?
Lift shows how much more likely B is to be purchased when A is purchased than B is to be purchased overall?
Exactly! The formula is Lift(A → B) = Confidence(A → B) / Support(B). Which situations indicate strong associations?
Lift values greater than 1 suggest a positive association!
Right! And values less than 1 indicate a negative association, while a value of exactly 1 means A and B occur independently of each other. Can anyone think of a real-world example where lift would apply?
In retail, if customers who buy bread also often buy butter, a high lift value shows a strong association!
Great example! Let's recap what we learned about support, confidence, and lift.
Read a summary of the section's main ideas.
The section highlights three primary metrics (support, confidence, and lift) that are essential in evaluating the strength and usefulness of association rules in datasets. These metrics help determine the relevance of itemsets and the reliability of the rules derived from data.
In the realm of association rule mining, especially in applications like Market Basket Analysis, it is crucial to have a framework for evaluating the strength of the rules we derive from our datasets. The three key metrics (Support, Confidence, and Lift) help in assessing whether association rules are interesting and reliable.
Support measures how frequently an itemset appears within the dataset. Specifically, it answers the question: "What proportion of transactions contain this itemset?" A high support value indicates that the itemset is common enough to be of interest. The formula to calculate support for an itemset A is:
Support(A) = (Number of transactions containing A) / (Total number of transactions)
For an association rule A → B, the support can be defined as:
Support(A → B) = Support(A ∪ B)
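As a minimal sketch of the two support formulas above, the snippet below counts matching baskets in a small list of transactions. The data and the helper name `support` are invented for illustration. Note that A ∪ B here denotes the combined itemset, so a transaction counts only if it contains every item from both A and B.

```python
# Hypothetical toy data: each transaction is the set of items in one basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"butter", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # `itemset <= basket` is the subset test: all items must be present.
    matching = sum(1 for basket in transactions if itemset <= basket)
    return matching / len(transactions)


# Support of a single-item itemset: bread appears in 3 of 5 baskets.
support_bread = support({"bread"}, transactions)

# Support of the rule {bread} -> {butter} is the support of the combined
# itemset {bread, butter}, which appears in 2 of 5 baskets.
support_rule = support({"bread"} | {"butter"}, transactions)

print(support_bread, support_rule)
```

With this toy data the single item has support 3/5 and the rule has support 2/5; adding items to an itemset can only keep its support the same or lower it.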
Confidence represents the likelihood that items in the consequent (B) are also present in transactions that contain the antecedent (A). It assesses how reliable a rule is by indicating the proportion of transactions containing A that also contain B. The formula for calculating confidence is:
Confidence(A → B) = Support(A ∪ B) / Support(A)
A high confidence value implies that when A is present, B is likely to follow.
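The confidence formula can be sketched in a few lines of Python. The function name `confidence` and the baskets are assumptions made up for this example. Because both supports share the same denominator, the ratio reduces to a ratio of raw counts.

```python
def confidence(antecedent, consequent, transactions):
    """Confidence(A -> B) = count(baskets with A and B) / count(baskets with A)."""
    a = set(antecedent)
    both = a | set(consequent)
    count_a = sum(1 for basket in transactions if a <= basket)
    count_both = sum(1 for basket in transactions if both <= basket)
    if count_a == 0:
        # The rule cannot be evaluated if the antecedent never occurs.
        raise ValueError("antecedent never occurs; confidence is undefined")
    return count_both / count_a


# Hypothetical baskets, invented for illustration.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"butter"},
]

# Bread is in 3 baskets, 2 of which also contain butter.
conf_bread_butter = confidence({"bread"}, {"butter"}, baskets)

# Butter is in 4 baskets, 2 of which also contain bread.
conf_butter_bread = confidence({"butter"}, {"bread"}, baskets)

print(conf_bread_butter, conf_butter_bread)
```

The two directions give 2/3 and 2/4, which illustrates that confidence is not symmetric: A → B and B → A are different rules.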
Lift measures how much more likely B is to be purchased when A is purchased, compared with the likelihood of purchasing B overall. It shows the strength of the association between A and B beyond what would be expected by chance. The calculation is:
Lift(A → B) = Confidence(A → B) / Support(B)
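A short sketch of the lift calculation from raw counts follows. The function name `lift` and the counts are hypothetical. Substituting the definitions of confidence and support shows that lift equals Support(A ∪ B) / (Support(A) × Support(B)), which is the form the code uses to stay in integer arithmetic until the final division.

```python
def lift(n_total, n_a, n_b, n_both):
    """Lift(A -> B) from raw counts.

    Confidence(A -> B) / Support(B)
        = (n_both / n_a) / (n_b / n_total)
        = (n_both * n_total) / (n_a * n_b)
    """
    if n_a == 0 or n_b == 0:
        raise ValueError("lift is undefined when A or B never occurs")
    return (n_both * n_total) / (n_a * n_b)


# Hypothetical counts: 100 transactions, 50 with A, 40 with B, 30 with both.
lift_a_to_b = lift(n_total=100, n_a=50, n_b=40, n_both=30)

# Swapping the roles of A and B leaves the value unchanged.
lift_b_to_a = lift(n_total=100, n_a=40, n_b=50, n_both=30)

print(lift_a_to_b, lift_b_to_a)
```

Unlike confidence, lift is symmetric: the rules A → B and B → A always have the same lift, so it describes the pair rather than a direction.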
Understanding these metrics is fundamental for filtering out less interesting itemsets and rules, ensuring that the derived insights are both significant and actionable in a business context.
Dive deep into the subject with an immersive audiobook experience.
Support helps us understand how often a particular combination of items appears in the dataset. Specifically, it measures the frequency of an itemset, or how often a group of items are bought together by customers. We calculate support by dividing the number of transactions that include the itemset by the total number of transactions. For example, if 100 customers bought something, and 20 of them bought both bread and butter, then the support for the itemset {bread, butter} would be 20/100 = 0.2. In practice, we want to set a threshold for support so that we only consider the most common itemsets or rules, as very rare combinations are often not useful for making business decisions.
Imagine a grocery store keeping track of sales. If it finds that 30% of all transactions include both apples and oranges, the support metric tells it that apples and oranges are frequently bought together. If it sets a threshold so that only itemsets with a support above 0.25 are considered interesting, {apples, oranges} passes the filter, and the store can focus on combinations that occur often enough to matter for sales.
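To make the thresholding idea concrete, here is a minimal sketch that counts every pair of items and keeps only pairs whose support is above a chosen minimum. The baskets, the function name `frequent_pairs`, and the 0.25 cutoff are illustrative assumptions. This is a brute-force count over pairs, not an optimized frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs whose support exceeds min_support."""
    pair_counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under one ordering.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(transactions)
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n > min_support
    }


# Hypothetical baskets, invented for illustration.
baskets = [
    {"apples", "oranges"},
    {"apples", "oranges", "milk"},
    {"apples", "milk"},
    {"oranges", "bread"},
    {"apples", "oranges", "bread"},
    {"milk", "bread"},
    {"apples", "bread"},
    {"milk"},
    {"bread"},
    {"apples", "oranges", "milk"},
]

result = frequent_pairs(baskets, min_support=0.25)
for pair, value in sorted(result.items()):
    print(pair, value)
```

In this toy data only {apples, oranges} (4 of 10 baskets) and {apples, milk} (3 of 10) clear the threshold; pairs seen in one or two baskets are dropped.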
Confidence tells us how reliable a rule is, essentially providing the probability that a customer who bought item A will also buy item B. We calculate confidence by dividing the support of the joint itemset (A and B together) by the support of item A alone. For instance, if we find that 25 out of the 30 customers that bought bread also bought butter, the rule {bread} implies {butter} has a confidence of 25/30 = 0.83, suggesting a strong likelihood that purchasing bread leads to purchasing butter.
Going back to our grocery store example, if 83% of the transactions that contain bread also contain butter, the store can infer that butter commonly accompanies bread. It might decide to place these items closer together in the store to boost sales.
Lift helps us understand the strength of an association rule. While confidence tells us how often B appears in transactions that contain A, lift also accounts for the overall popularity of B in the dataset. It provides additional context by showing whether the association is more than a mere coincidence. For example, if the confidence of buying butter given bread is high but the lift is close to 1, customers are likely buying butter whether or not they buy bread. In this case, the store might reconsider how much weight to give the rule {bread} implies {butter}, since butter's popularity isn't influenced by bread.
Returning to the grocery store scenario, if the lift for the rule {bread} implies {butter} is 2, this means that customers who buy bread are twice as likely to buy butter compared to customers selected at random. This information could prompt the store to create special promotions for these items together or display them next to each other for increased visibility, knowing that there's a true benefit to promoting them together based on customer buying behavior.
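The contrast between confidence and lift can be illustrated with two made-up scenarios for the rule {bread} implies {butter}. All counts and the helper `rule_metrics` below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def rule_metrics(n_total, n_a, n_b, n_both):
    """Return (confidence, lift) for the rule A -> B from raw transaction counts."""
    confidence = n_both / n_a
    support_b = n_b / n_total
    return confidence, confidence / support_b


# Scenario 1 (hypothetical): butter is in 800 of 1000 baskets, so almost
# everyone buys it. Of the 100 bread baskets, 80 also contain butter.
conf_popular, lift_popular = rule_metrics(n_total=1000, n_a=100, n_b=800, n_both=80)

# Scenario 2 (hypothetical): butter is in only 300 of 1000 baskets.
# Of the 100 bread baskets, 60 also contain butter.
conf_linked, lift_linked = rule_metrics(n_total=1000, n_a=100, n_b=300, n_both=60)

print(f"popular butter: confidence={conf_popular:.2f}, lift={lift_popular:.2f}")
print(f"linked butter:  confidence={conf_linked:.2f}, lift={lift_linked:.2f}")
```

The first scenario has the higher confidence (0.80) but a lift of 1.0, so bread tells the store nothing new about butter. The second has lower confidence (0.60) but a lift of 2.0, which matches the "twice as likely" reading.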
Learn essential terms and foundational ideas that form the basis of the topic.
Key Concepts
Support: Measures the frequency of an itemset in a dataset.
Confidence: Represents the reliability of the inference from A to B.
Lift: Measures the strength of association between A and B, considering their individual frequencies.
See how the concepts apply in real-world scenarios to understand their practical implications.
In a dataset of 100 transactions, if 40 include both bread and butter, the support for the rule {Bread} → {Butter} is 0.4.
If 25 out of 50 transactions that include bread also include butter, the confidence for the rule {Bread} → {Butter} is 0.5.
A lift value of 2 for the rule {Bread} → {Butter} means that customers who buy bread are twice as likely to buy butter as customers overall.
Use mnemonics, acronyms, or visual cues to help remember key information more easily.
With support, we measure the crowd, What's often bought is shouted loud.
Once in a store, a person bought bread. Everyone else bought butter instead. The shopkeeper said, 'Lift the lid, see how often they fit together!'
For memory: 'SCL' stands for Support, Confidence, Lift; the three metrics we must never miss!
Review key concepts with flashcards.
Review the Definitions for terms.
Term: Support
Definition:
A measure of how frequently an itemset appears in the dataset.
Term: Confidence
Definition:
A measure of how often items in B appear in transactions that also contain A, indicating the reliability of the rule.
Term: Lift
Definition:
A measure of how much more likely items in B are to be purchased when items in A are purchased, compared with how often B is purchased overall.
Term: Association Rule
Definition:
An if-then statement that implies a relationship between an antecedent (A) and a consequent (B).
Term: Itemset
Definition:
A collection of one or more items considered in association rule mining.