Apriori Algorithm

PUBLISHED: MAY 2, 2026 · 2 MIN READ


Divya Sachan, Author

Apriori Algorithm in Machine Learning

The Apriori Algorithm is used for association rule learning on transactional databases. It identifies frequent itemsets and uses them to generate association rules that show how strongly items are related.

It uses Breadth-First Search (BFS) and a Hash Tree to count itemsets efficiently.

Proposed by: R. Agrawal & R. Srikant (1994)
Applications:
  • Market Basket Analysis
  • Healthcare (e.g., Drug Interaction Prediction)

🔸 What is a Frequent Itemset?

A frequent itemset is a group of items whose support is greater than a minimum support threshold.

💡 If {A, B} is frequent, then both A and B must be frequent individually (the Apriori property: every subset of a frequent itemset is also frequent).

Example:

  • Transactions:
    • A = {1, 2, 3, 4, 5}
    • B = {2, 3, 7}
  • Frequent itemset = {2, 3}, since items 2 and 3 appear in both transactions

🔹 Important Terms:

Support = Frequency of occurrence

$$\text{Support}(A \Rightarrow B) = \frac{\text{Transactions containing both } A \text{ and } B}{\text{Total number of transactions}}$$

  • Confidence = Strength of implication

$$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$

  • Lift = Strength of association

$$\text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)}$$
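These three metrics can be computed directly from a list of transactions. Below is a minimal sketch in Python; the function names are illustrative, not from any particular library:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the combined itemset divided by support of the antecedent."""
    combined = set(antecedent) | set(consequent)
    return support(transactions, combined) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; > 1 suggests positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

transactions = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}]
print(support(transactions, {"A", "B"}))       # 0.5 (2 of 4 transactions)
print(confidence(transactions, {"A"}, {"B"}))  # 1.0 (every transaction with A also has B)
```

Note that a lift of exactly 1 means the antecedent and consequent occur independently; values above 1 indicate the items are bought together more often than chance.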

🔸 Apriori Algorithm Steps

  1. Scan the transactional database to determine the support of each itemset, and choose the minimum support and minimum confidence thresholds.
  2. Keep all itemsets whose support is at least the chosen minimum support.
  3. From these frequent itemsets, find all rules whose confidence is at least the minimum confidence.
  4. Sort the rules in decreasing order of lift.
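The level-wise search in steps 1–2 can be sketched in plain Python. This is a simplified illustration only: there is no hash tree, and candidates are validated by recounting rather than an explicit subset-prune step:

```python
def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) with its support count."""
    transactions = [set(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent = {}
    candidates = [frozenset([i]) for i in sorted(items)]
    while candidates:
        # Count each candidate's support in one pass over the database
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join step: union pairs of survivors that differ by exactly one item
        size = len(candidates[0]) + 1
        candidates = list({a | b for a in survivors for b in survivors
                           if len(a | b) == size})
    return frequent

dataset = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
           {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
freq = apriori(dataset, min_support=2)
print(freq[frozenset({"A", "B", "C"})])  # 2
```

The loop terminates naturally: once no candidate of the next size survives, the join step produces an empty candidate list.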

🔹 Apriori Example

We will understand the Apriori algorithm using an example and mathematical calculation:

Example:

Suppose we have the following dataset of transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.

TID   Itemsets
T1    A, B
T2    B, D
T3    B, C
T4    A, B, D
T5    A, C
T6    B, C
T7    A, C
T8    A, B, C, E
T9    A, B, C

Minimum Support = 2
Minimum Confidence = 50%
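Before working through the counts by hand, the table can be encoded as Python sets and the single-item supports tallied (the variable names are just for illustration):

```python
from collections import Counter

dataset = {
    "T1": {"A", "B"},      "T2": {"B", "D"},      "T3": {"B", "C"},
    "T4": {"A", "B", "D"}, "T5": {"A", "C"},      "T6": {"B", "C"},
    "T7": {"A", "C"},      "T8": {"A", "B", "C", "E"},
    "T9": {"A", "B", "C"},
}
MIN_SUPPORT = 2

# C1: support count of every individual item
c1 = Counter(item for items in dataset.values() for item in items)

# L1: keep only items meeting the minimum support (E is pruned)
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT}
print(sorted(l1.items()))  # [('A', 6), ('B', 7), ('C', 6), ('D', 2)]
```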

🧠 Solution

✅ Step 1: C1 and L1 (Single Items)

Itemset   Support Count
A         6
B         7
C         6
D         2
E         1 ❌ (removed: below minimum support)

✅ Step 2: C2 and L2 (Pairs)

Itemset   Support
{A, B}    4
{A, C}    4
{A, D}    1 ❌
{B, C}    4
{B, D}    2
{C, D}    0 ❌

✔️ Frequent Pairs (L2):
{A, B}, {A, C}, {B, C}, {B, D}

✅ Step 3: C3 and L3 (Triplets)

Itemset      Support
{A, B, C}    2 ✔️
{A, B, D}    1 ❌
{B, C, D}    0 ❌
{A, C, D}    0 ❌

✔️ Only Frequent Triplet (L3): {A, B, C}

✅ Step 4: Generate Association Rules

From {A, B, C} with Support = 2:

$$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$
Rule       Support   Confidence
A, B → C   2         2/4 = 50% ✔️
A, C → B   2         2/4 = 50% ✔️
B, C → A   2         2/4 = 50% ✔️
A → B, C   2         2/6 = 33.33% ❌
B → A, C   2         2/7 = 28.57% ❌
C → A, B   2         2/6 = 33.33% ❌

✔️ Strong Rules (≥ 50% Confidence):
  • A, B → C
  • A, C → B
  • B, C → A
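These confidences can be double-checked programmatically. A short sketch that enumerates every rule derivable from {A, B, C} over the nine transactions:

```python
from itertools import combinations

dataset = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
           {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in dataset)

triple = {"A", "B", "C"}
strong = []
for r in (2, 1):  # antecedents of size 2, then size 1
    for antecedent in combinations(sorted(triple), r):
        consequent = triple - set(antecedent)
        conf = count(triple) / count(antecedent)
        if conf >= 0.5:
            strong.append((set(antecedent), consequent))
        print(f"{set(antecedent)} -> {consequent}: {conf:.2%}")
print(len(strong))  # 3 strong rules, all with two-item antecedents
```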

✅ Advantages of Apriori

  • Simple and easy to understand
  • Effective join and prune steps
  • Good for interpretable rules in large datasets

❌ Disadvantages of Apriori

  • Slow performance on large datasets due to candidate explosion
  • Multiple database scans reduce efficiency
  • Time and space complexity: O(2^D), i.e., exponential in the number of distinct items D