ID3 Algorithm

PUBLISHED: MAY 2, 2026 · 2 MIN READ


Divya Sachan, Author

Algorithm

Construct the decision tree using the algorithm for the given data set.

| S.no | Age | Competition | Type | Profit (Class) |
|------|-----|-------------|------|----------------|
| 1 | Old | Yes | Soft | Down |
| 2 | Old | No | Soft | Down |
| 3 | Old | No | Hard | Down |
| 4 | Mid | Yes | Soft | Down |
| 5 | Mid | Yes | Hard | Down |
| 6 | Mid | No | Hard | Up |
| 7 | Mid | No | Soft | Up |
| 8 | New | Yes | Soft | Up |
| 9 | New | No | Hard | Up |
| 10 | New | No | Soft | Up |

🔧 Step 1: Initial Entropy of Dataset

$$Entropy(S) = -p_{Up}\log_2(p_{Up}) - p_{Down}\log_2(p_{Down})$$

Where:

  • Total = 10
  • Down = 5 (rows 1 to 5)
  • Up = 5 (rows 6 to 10)

So:

$$Entropy(S) = -\frac{5}{10}\log_2\left(\frac{5}{10}\right)-\frac{5}{10}\log_2\left(\frac{5}{10}\right)$$
$$= -\frac{5}{10}\log_2\left(\frac{1}{2}\right)-\frac{5}{10}\log_2\left(\frac{1}{2}\right)$$
$$= -\frac{5}{10}\log_2(2^{-1}) - \frac{5}{10}\log_2(2^{-1})$$
$$= \frac{5}{10}\log_2(2) + \frac{5}{10}\log_2(2)$$
$$= \frac{5}{10}\cdot 1 + \frac{5}{10}\cdot 1 = 0.5 + 0.5$$

Entropy(S) = 1
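
The arithmetic above is easy to sanity-check in code. Below is a minimal Python sketch; the helper name `entropy` is my own choice, not something defined in this article:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

# 5 Down + 5 Up, as in the dataset above.
print(entropy(["Down"] * 5 + ["Up"] * 5))  # 1.0
```

A perfectly balanced split gives entropy 1 bit; a pure subset gives 0, which is exactly what the later steps rely on.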

Step 2: Calculate Information Gain for all attributes

Attribute 1: Age (Old, Mid, New)

Old: [Down, Down, Down]

$$E = -\frac{3}{3}\log_2\frac{3}{3} = 0$$

Mid: [Down, Down, Up, Up]

$$E = -\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4} = -0.5\log_2 0.5 - 0.5\log_2 0.5 = 1$$

New: [Up, Up, Up]

$$E = -\frac{3}{3}\log_2\frac{3}{3} = 0$$

$$Gain(Age) = Entropy(S) - \frac{|S_{Old}|}{|S|}Entropy(S_{Old}) - \frac{|S_{Mid}|}{|S|}Entropy(S_{Mid}) - \frac{|S_{New}|}{|S|}Entropy(S_{New})$$
$$= 1 - \left(\frac{3}{10}\cdot 0 + \frac{4}{10}\cdot 1 + \frac{3}{10}\cdot 0\right) = 1 - 0.4 = 0.6$$

Gain(Age) = 0.6
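
The gain formula above generalizes to any attribute: subtract the size-weighted entropy of each partition from the dataset's entropy. A self-contained Python sketch (the tuple encoding and the names `DATA`, `entropy`, `info_gain` are my own, not from the article):

```python
import math

# (Age, Competition, Type, Profit) tuples -- the 10 rows of the table above.
DATA = [
    ("Old", "Yes", "Soft", "Down"), ("Old", "No", "Soft", "Down"),
    ("Old", "No", "Hard", "Down"), ("Mid", "Yes", "Soft", "Down"),
    ("Mid", "Yes", "Hard", "Down"), ("Mid", "No", "Hard", "Up"),
    ("Mid", "No", "Soft", "Up"), ("New", "Yes", "Soft", "Up"),
    ("New", "No", "Hard", "Up"), ("New", "No", "Soft", "Up"),
]

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, attr):
    """Entropy(S) minus the size-weighted entropy of each partition on attr."""
    gain = entropy([r[-1] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(round(info_gain(DATA, 0), 3))  # attribute 0 = Age -> 0.6
```

The same `info_gain` call with attribute index 1 (Competition) or 2 (Type) reproduces the hand calculations in the next two subsections.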

Attribute 2: Competition (Yes, No)

Yes: [Down, Down, Down, Up]

$$E = -\frac{3}{4}\log_2\frac{3}{4}-\frac{1}{4}\log_2\frac{1}{4} = -0.75\log_2 0.75 - 0.25\log_2 0.25 \approx 0.811$$

No: [Down, Down, Up, Up, Up, Up]

$$E = -\frac{2}{6}\log_2\frac{2}{6}-\frac{4}{6}\log_2\frac{4}{6} \approx -0.333\log_2 0.333 - 0.667\log_2 0.667 \approx 0.918$$

$$Gain(Competition) = 1-\left(\frac{4}{10}\cdot 0.811+\frac{6}{10}\cdot 0.918\right) \approx 1 - (0.3244 + 0.5508) = 0.1248$$

Gain(Competition) = 0.125

Attribute 3: Type (Soft, Hard)

Soft: [Down, Down, Down, Up, Up, Up]

$$E = -\frac{3}{6}\log_2\frac{3}{6}-\frac{3}{6}\log_2\frac{3}{6} = -\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2}$$
$$= -0.5\log_2(2^{-1}) - 0.5\log_2(2^{-1}) = 0.5\cdot 1 + 0.5\cdot 1 = 1$$

Hard: [Down, Down, Up, Up]

$$E = -\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4} = -\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2}$$
$$= -0.5\log_2(2^{-1}) - 0.5\log_2(2^{-1}) = 0.5\cdot 1 + 0.5\cdot 1 = 1$$

$$Gain(Type) = 1-\left(\frac{6}{10}\cdot 1+\frac{4}{10}\cdot 1\right) = 1 - (0.6 + 0.4) = 0$$

Gain(Type) = 0

Find the maximum Gain

Gain(Age) = 0.6

Gain(Competition) = 0.125

Gain(Type) = 0

$$\max(0.6, 0.125, 0) = 0.6$$

Age gives the highest gain (0.6).

Thus, "Age" will be placed at the root of the Decision Tree.
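
Selecting the root is just an argmax over the three gains. A short sketch tying the steps together (dataset encoding and helper names are my own illustrative choices):

```python
import math

# (Age, Competition, Type, Profit) -- the 10 rows of the table above.
DATA = [
    ("Old", "Yes", "Soft", "Down"), ("Old", "No", "Soft", "Down"),
    ("Old", "No", "Hard", "Down"), ("Mid", "Yes", "Soft", "Down"),
    ("Mid", "Yes", "Hard", "Down"), ("Mid", "No", "Hard", "Up"),
    ("Mid", "No", "Soft", "Up"), ("New", "Yes", "Soft", "Up"),
    ("New", "No", "Hard", "Up"), ("New", "No", "Soft", "Up"),
]
ATTRS = {"Age": 0, "Competition": 1, "Type": 2}

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, attr):
    gain = entropy([r[-1] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

gains = {name: round(info_gain(DATA, i), 3) for name, i in ATTRS.items()}
root = max(gains, key=gains.get)  # attribute with the highest gain
print(gains)  # {'Age': 0.6, 'Competition': 0.125, 'Type': 0.0}
print(root)   # Age
```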

The Decision Tree Formation

*Figure: ID3 decision tree initial structure showing the root node Age splitting into branches Old, Mid, and New.*

S¹, the subset of S for which Age = Old:

| Age | Competition | Type | Profit (Class) |
|-----|-------------|------|----------------|
| Old | Yes | Soft | Down |
| Old | No | Soft | Down |
| Old | No | Hard | Down |

Observation:

  • All 3 examples → Profit = Down
    ✅ Pure group!

Thus:

$$Entropy(S^1) = 0$$

Leaf Node = "Down"

S², the subset of S for which Age = Mid:

| Age | Competition | Type | Profit (Class) |
|-----|-------------|------|----------------|
| Mid | Yes | Soft | Down |
| Mid | Yes | Hard | Down |
| Mid | No | Hard | Up |
| Mid | No | Soft | Up |

Observation:

  • 2 examples → Down
  • 2 examples → Up

$$Entropy(S^2) = -\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4} = 1$$

✅ Mid group is impure (entropy = 1).
Further splitting is needed.

🔷Attribute 1: Competition(Yes, No)

Yes: [Down, Down]

$$Entropy(\text{Yes}) = 0$$

No: [Up, Up]

$$Entropy(\text{No}) = 0$$

$$Gain(Competition) = 1-\left(\frac{2}{4}\cdot 0+\frac{2}{4}\cdot 0\right) = 1 - 0 = 1$$

Gain(Competition) = 1

🔷Attribute 2: Type(Soft, Hard)

Soft: [Down, Up]

$$E = -\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2} = 1$$

Hard: [Down, Up]

$$E = -\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2} = 1$$

$$Gain(Type) = 1-\left(\frac{2}{4}\cdot 1+\frac{2}{4}\cdot 1\right) = 1 - 1 = 0$$

Gain(Type) = 0

Since Gain(Competition) = 1 is the maximum, Competition is placed at Node 2 (the Age = Mid branch).
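
The two subtree gains can be verified by rerunning the same gain computation on just the four Age = Mid rows (the encoding and helper names below are my own):

```python
import math

# Only the four Age = Mid rows: (Competition, Type, Profit).
MID = [
    ("Yes", "Soft", "Down"), ("Yes", "Hard", "Down"),
    ("No", "Hard", "Up"), ("No", "Soft", "Up"),
]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, attr):
    gain = entropy([r[-1] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

print(info_gain(MID, 0))  # Competition -> 1.0
print(info_gain(MID, 1))  # Type -> 0.0
```

Competition splits the Mid subset into two pure halves (gain 1), while Type leaves both halves maximally mixed (gain 0).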

S³, the subset of S for which Age = New:

| Age | Competition | Type | Profit (Class) |
|-----|-------------|------|----------------|
| New | Yes | Soft | Up |
| New | No | Hard | Up |
| New | No | Soft | Up |

Observation:

  • All 3 examples → Profit = Up
    ✅ Pure group!

$$Entropy(S^3) = 0$$

Leaf Node = "Up"

*Figure: Final ID3 decision tree with Age as the root and Competition as a child node, leading to Up and Down leaves.*

All the corresponding class labels are Down in the Competition = Yes branch.
✅ Pure group!

$$Entropy(\text{Yes}) = 0$$

Leaf Node = "Down"

All the corresponding class labels are Up in the Competition = No branch.
✅ Pure group!

$$Entropy(\text{No}) = 0$$

Leaf Node = "Up"
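
The whole worked example can be assembled into one short recursive sketch: stop at a pure subset, otherwise split on the max-gain attribute and recurse. This is a minimal illustration (names and tree representation are my own; it assumes the attributes fully separate the data, which holds here):

```python
import math

# (Age, Competition, Type, Profit) -- the full 10-row dataset.
DATA = [
    ("Old", "Yes", "Soft", "Down"), ("Old", "No", "Soft", "Down"),
    ("Old", "No", "Hard", "Down"), ("Mid", "Yes", "Soft", "Down"),
    ("Mid", "Yes", "Hard", "Down"), ("Mid", "No", "Hard", "Up"),
    ("Mid", "No", "Soft", "Up"), ("New", "Yes", "Soft", "Up"),
    ("New", "No", "Hard", "Up"), ("New", "No", "Soft", "Up"),
]
NAMES = ["Age", "Competition", "Type"]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(rows, attr):
    gain = entropy([r[-1] for r in rows])
    for value in {r[attr] for r in rows}:
        subset = [r[-1] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

def id3(rows, attrs):
    """Pure subset -> leaf; otherwise split on the max-gain attribute."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # leaf node
    best = max(attrs, key=lambda a: info_gain(rows, a))
    branches = {v: id3([r for r in rows if r[best] == v],
                       [a for a in attrs if a != best])
                for v in {r[best] for r in rows}}
    return {NAMES[best]: branches}

tree = id3(DATA, [0, 1, 2])
print(tree["Age"]["Old"])  # Down
print(tree["Age"]["New"])  # Up
```

Running it reproduces the tree derived above: Age at the root, Old and New as pure leaves, and a Competition split under Mid.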