Apriori Algorithm

PUBLISHED: MAY 2, 2026 · 2 MIN READ


Divya Sachan, Author

Apriori Algorithm in Machine Learning

The Apriori Algorithm is used for association rule learning on transactional databases. It identifies frequent itemsets and uses them to generate association rules that show how strongly items are related.

It uses Breadth-First Search (BFS) and a Hash Tree to count itemsets efficiently.

Proposed by: R. Agrawal & R. Srikant (1994)
Applications:
  • Market Basket Analysis
  • Healthcare (e.g., Drug Interaction Prediction)

🔸 What is a Frequent Itemset?

A frequent itemset is a group of items whose support is greater than a minimum support threshold.

💡 If {A, B} is frequent, then both A and B must be frequent individually (the Apriori property: every subset of a frequent itemset is also frequent).

Example:

  • Transactions:
    • A = {1, 2, 3, 4, 5}
    • B = {2, 3, 7}
  • Frequent itemset = {2, 3}, since items 2 and 3 appear in both transactions

🔹 Important Terms:

Support = Frequency of occurrence

$$\text{Support}(A \Rightarrow B) = \frac{\text{Transactions containing both } A \text{ and } B}{\text{Total number of transactions}}$$

  • Confidence = Strength of implication

$$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$

  • Lift = Strength of association

$$\text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)}$$
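These three metrics can be computed directly from a list of transactions. Below is a minimal sketch in Python; the function names are illustrative, not from any particular library:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of the combined itemset divided by support of the antecedent."""
    combined = set(antecedent) | set(consequent)
    return support(transactions, combined) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; > 1 suggests positive association."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

transactions = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}]
print(support(transactions, {"A", "B"}))       # 0.5 (2 of 4 transactions)
print(confidence(transactions, {"A"}, {"B"}))  # 1.0 (every transaction with A also has B)
```

Note that a lift of exactly 1 means the antecedent and consequent occur independently; values above 1 indicate the items are bought together more often than chance.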

🔸 Apriori Algorithm Steps

  1. Scan the transactional database to determine the support of each itemset, and choose the minimum support and minimum confidence thresholds.
  2. Keep all itemsets whose support is at least the chosen minimum support.
  3. From these frequent itemsets, find all rules whose confidence is at least the minimum confidence.
  4. Sort the rules in decreasing order of lift.
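The level-wise search in steps 1–2 can be sketched in plain Python. This is a simplified illustration only: there is no hash tree, and candidates are validated by recounting rather than an explicit subset-prune step:

```python
def apriori(transactions, min_support):
    """Return every frequent itemset (as a frozenset) with its support count."""
    transactions = [set(t) for t in transactions]
    items = {i for t in transactions for i in t}
    frequent = {}
    candidates = [frozenset([i]) for i in sorted(items)]
    while candidates:
        # Count each candidate's support in one pass over the database
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join step: union pairs of survivors that differ by exactly one item
        size = len(candidates[0]) + 1
        candidates = list({a | b for a in survivors for b in survivors
                           if len(a | b) == size})
    return frequent

dataset = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
           {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
freq = apriori(dataset, min_support=2)
print(freq[frozenset({"A", "B", "C"})])  # 2
```

The loop terminates naturally: once no candidate of the next size survives, the join step produces an empty candidate list.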

🔹 Apriori Example

We will understand the Apriori algorithm using an example and mathematical calculation:

Example:

Suppose we have the following dataset of transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.

TID   Itemsets
T1    A, B
T2    B, D
T3    B, C
T4    A, B, D
T5    A, C
T6    B, C
T7    A, C
T8    A, B, C, E
T9    A, B, C

Minimum Support = 2
Minimum Confidence = 50%
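Before working through the counts by hand, the table can be encoded as Python sets and the single-item supports tallied (the variable names are just for illustration):

```python
from collections import Counter

dataset = {
    "T1": {"A", "B"},      "T2": {"B", "D"},      "T3": {"B", "C"},
    "T4": {"A", "B", "D"}, "T5": {"A", "C"},      "T6": {"B", "C"},
    "T7": {"A", "C"},      "T8": {"A", "B", "C", "E"},
    "T9": {"A", "B", "C"},
}
MIN_SUPPORT = 2

# C1: support count of every individual item
c1 = Counter(item for items in dataset.values() for item in items)

# L1: keep only items meeting the minimum support (E is pruned)
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT}
print(sorted(l1.items()))  # [('A', 6), ('B', 7), ('C', 6), ('D', 2)]
```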

🧠 Solution

✅ Step 1: C1 and L1 (Single Items)

Itemset   Support Count
A         6
B         7
C         6
D         2
E         1 ❌ (removed: below minimum support)

✅ Step 2: C2 and L2 (Pairs)

Itemset   Support
{A, B}    4
{A, C}    4
{A, D}    1 ❌
{B, C}    4
{B, D}    2
{C, D}    0 ❌

✔️ Frequent Pairs (L2):
{A, B}, {A, C}, {B, C}, {B, D}

✅ Step 3: C3 and L3 (Triplets)

Itemset      Support
{A, B, C}    2 ✔️
{A, B, D}    1 ❌
{B, C, D}    0 ❌
{A, C, D}    0 ❌

✔️ Only Frequent Triplet (L3): {A, B, C}

✅ Step 4: Generate Association Rules

From {A, B, C} with Support = 2:

$$\text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$
Rule       Support   Confidence
A, B → C   2         2/4 = 50% ✔️
A, C → B   2         2/4 = 50% ✔️
B, C → A   2         2/4 = 50% ✔️
A → B, C   2         2/6 = 33.33% ❌
B → A, C   2         2/7 = 28.57% ❌
C → A, B   2         2/6 = 33.33% ❌

✔️ Strong Rules (≥ 50% Confidence):
  • A, B → C
  • A, C → B
  • B, C → A
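These confidences can be double-checked programmatically. A short sketch that enumerates every rule derivable from {A, B, C} over the nine transactions:

```python
from itertools import combinations

dataset = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
           {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(set(itemset) <= t for t in dataset)

triple = {"A", "B", "C"}
strong = []
for r in (2, 1):  # antecedents of size 2, then size 1
    for antecedent in combinations(sorted(triple), r):
        consequent = triple - set(antecedent)
        conf = count(triple) / count(antecedent)
        if conf >= 0.5:
            strong.append((set(antecedent), consequent))
        print(f"{set(antecedent)} -> {consequent}: {conf:.2%}")
print(len(strong))  # 3 strong rules, all with two-item antecedents
```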

✅ Advantages of Apriori

  • Simple and easy to understand
  • Effective join and prune steps
  • Good for interpretable rules in large datasets

❌ Disadvantages of Apriori

  • Slow performance on large datasets due to candidate explosion
  • Multiple database scans reduce efficiency
  • Time and space complexity: O(2^D), i.e., exponential in the number of distinct items D