Explain APRIORI ALGORITHM.
APRIORI ALGORITHM
- Apriori algorithm was the first algorithm that was proposed for frequent itemset mining. It was later improved by R Agarwal and R Srikant and came to be known as Apriori. This algorithm uses two steps "join" and "prune" to reduce the search space. It is an iterative approach to discover the most frequent itemsets.
- Apriori algorithm is a candidate generation & test approach for frequent itemset generation. It is based on the Apriori principle. Apriori principle states that "If there is any item set which is infrequent, its superset should not be generated or tested".
- Therefore, the Apriori algorithm initially, scan the transaction dataset once to get candidate 1-itemset and then frequent 1-itemsets, generate length (k+1) candidate itemset from length k frequent itemset, test the candidates against transaction dataset to filter frequent itemset from infrequent itemsets present in candidate itemset, this process terminates when no frequent or candidate set can be generated.
The steps followed in the Apriori Algorithm of data mining are
Input:
- Transaction database D
- Minimum support count threshold
Output:
F frequent itemset of size k in D.
Join Step: This step generates (K+1) itemset from K-itemsets by joining each item with itself.
Prune Step: This step scans the count of each item in the database. If the candidate item does not meet minimum support, then it is regarded as infrequent and thus it is removed. This step is performed to reduce the size of the candidate itemsets.
Method
Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules [AS94b]. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties, as we shall see later. Apriori employs apr iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted by L₁. Next, Li is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found. The finding of each Lk requires one full scan of the database.
To improve the efficiency of the level-wise generation of frequent itemsets, an important property called the Apriori property is used to reduce the search space.
Comments
Post a Comment