Explain Association Rule with example.

 Association Rule

  • Proposed by Agrawal et al. in 1993.
  • It is an important data mining model studied extensively by the database and data mining community.
  • It assumes all data are categorical.
  • The classical algorithms do not handle numeric data well; numeric attributes are usually discretized into categories first.
  • Initially used for Market Basket Analysis to find how items purchased by customers are related.
  • Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.

OR,

  • Association is one of the best-known data mining techniques. In association, a pattern is discovered based on a relationship between items in the same transaction. That is why the association technique is also known as the relation technique. It is used in market basket analysis to identify sets of products that customers frequently purchase together.
  • Retailers use the association technique to study customers' buying habits. Based on historical sales data, a retailer might find that customers always buy crisps when they buy beer, and can therefore place beer and crisps next to each other to save customers time and increase sales.
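
The retail example above can be illustrated with a minimal sketch that counts how often pairs of items appear in the same basket. The basket contents and the helper name count_pairs are assumptions made up for illustration; they are not taken from any real sales data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales history: each set is one customer's basket.
baskets = [
    {"beer", "crisps", "bread"},
    {"beer", "crisps"},
    {"milk", "bread"},
    {"beer", "crisps", "milk"},
    {"bread", "butter"},
]

def count_pairs(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("beer", "crisps")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(baskets)
most_common_pair, times = pair_counts.most_common(1)[0]
print(most_common_pair, times)
```

In this toy data the pair bought together most often is beer and crisps, which is the kind of pattern a retailer would act on.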



Applications:

Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

E.g., 98% of people who purchase tires and auto accessories also get automotive services done. Written as a rule: {tires, auto accessories} → {automotive services}, with 98% confidence.


Concepts:

An item: an item/article in a basket

I: the set of all items sold in the store

A transaction: the items purchased in one basket; it may have a TID (transaction ID)

A transactional dataset: A set of transactions
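
As a rough sketch, these concepts map onto simple data structures. The dataset below, and the names transactions, all_items and transactions_containing, are hypothetical and chosen only for illustration.

```python
# Hypothetical transactional dataset: TID -> items purchased in that basket.
transactions = {
    1: {"bread", "milk"},
    2: {"bread", "diapers", "beer", "eggs"},
    3: {"milk", "diapers", "beer", "cola"},
    4: {"bread", "milk", "diapers", "beer"},
    5: {"bread", "milk", "diapers", "cola"},
}

# I: the set of all items, derived here from the data itself.
all_items = set().union(*transactions.values())

def transactions_containing(itemset, transactions):
    """Return the sorted TIDs of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sorted(tid for tid, items in transactions.items() if wanted <= items)

print(sorted(all_items))
print(transactions_containing({"diapers", "beer"}, transactions))
```

Each transaction is stored as a set because the order of items in a basket does not matter for association rules.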







Mining Association Rules:

What We Need to Know

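Goal: Find rules with high support and confidence
How to compute them?
Support: the fraction of transactions that contain an itemset; mining starts by finding the sets of items that occur frequently
Confidence: for a rule X → Y, the support of X ∪ Y divided by the support of X, computed from the frequencies of subsets of the frequent itemsets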

If we have all frequently occurring sets of items (frequent itemsets), we can compute support and confidence!
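
The computation described above can be sketched in a few lines. This is a brute-force illustration on a made-up dataset, not an efficient miner such as Apriori; the function names and the 0.6 threshold are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical transactional dataset (one set of items per basket).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so report zero confidence
    return count(set(antecedent) | set(consequent), transactions) / base

def frequent_itemsets(transactions, min_support):
    """Enumerate every itemset whose support is at least min_support (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result

frequent = frequent_itemsets(transactions, min_support=0.6)
print(len(frequent))
print(confidence({"diapers"}, {"beer"}, transactions))
```

In this toy data, {diapers, beer} appears in 3 of 5 transactions (support 0.6), and 3 of the 4 transactions containing diapers also contain beer, so the rule {diapers} → {beer} has confidence 0.75.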
