Random Forest (ID3 algorithm)

PUBLISHED: MAY 2, 2026

Divya Sachan

Random Forest is a machine-learning algorithm that builds many decision trees and then combines their results to make a better final prediction.
It reduces overfitting and increases accuracy by taking the "majority vote" (classification) or "average" (regression) across all trees.
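To make that combination step concrete, here is a minimal Python sketch of how a forest's per-tree predictions might be merged; the helper names (majority_vote, average_vote) and the example predictions are illustrative, not something taken from this article.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Classification: the class predicted by the most trees wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

def average_vote(tree_predictions):
    """Regression: the forest's output is the mean of the trees' outputs."""
    return sum(tree_predictions) / len(tree_predictions)

# Made-up outputs from three trees for a single unseen sample:
print(majority_vote(["Yes", "No", "Yes"]))  # -> Yes
print(average_vote([3.1, 2.8, 3.4]))        # -> 3.1
```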

Numerical Example for Constructing a Random Forest

| Day | Outlook | Temp | Humidity | Wind | Can Play |
| --- | --- | --- | --- | --- | --- |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Mild | High | Weak | Yes |
| D4 | Rain | Cool | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
| D11 | Sunny | Mild | Normal | Strong | Yes |
| D12 | Overcast | Mild | High | Strong | Yes |
| D13 | Overcast | Hot | Normal | Weak | Yes |
| D14 | Rain | Mild | High | Strong | No |

Find the Class for the unseen data point

| Outlook | Temp | Humidity | Wind |
| --- | --- | --- | --- |
| Overcast | Mild | Normal | Weak |

Model 1

| Day | Outlook | Temp | Humidity | Wind | Can Play |
| --- | --- | --- | --- | --- | --- |
| D1 | Sunny | Hot | High | Weak | No |
| D2 | Sunny | Hot | High | Strong | No |
| D3 | Overcast | Mild | High | Weak | Yes |
| D4 | Rain | Cool | High | Weak | Yes |
| D5 | Rain | Cool | Normal | Weak | Yes |
| D6 | Rain | Cool | Normal | Strong | No |
| D7 | Overcast | Cool | Normal | Strong | Yes |
| D8 | Sunny | Mild | High | Weak | No |
| D9 | Sunny | Cool | Normal | Weak | Yes |
| D10 | Rain | Mild | Normal | Weak | Yes |
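For readers who want to follow the arithmetic in code, here is one way to encode Model 1's subset; the variable name model1_rows is illustrative (my own, not from the article), and the later snippets reuse it.

```python
# Model 1's ten training rows, keyed by attribute name; "Play" holds the Can Play label.
model1_rows = [
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "No"},   # D1
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play": "No"},   # D2
    {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},  # D3
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},  # D4
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},  # D5
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},   # D6
    {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},  # D7
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "No"},   # D8
    {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},  # D9
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},  # D10
]
```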

Set (S)

$$\text{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

$$= -\frac{6}{10}\log_2\frac{6}{10} - \frac{4}{10}\log_2\frac{4}{10} = 0.442 + 0.529 = \boxed{0.971}$$
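To check this value in code, here is a small entropy helper applied to the Play labels of model1_rows from the sketch above; the function name entropy is my own, a minimal sketch rather than library code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

labels = [row["Play"] for row in model1_rows]   # 6 "Yes", 4 "No"
print(round(entropy(labels), 3))                # -> 0.971
```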

Calculate information gain for all attributes.

Attribute 1: Outlook

  • Sunny [No, No, No, Yes]
    E = $-\frac{3}{4}\log_2\frac{3}{4} - \frac{1}{4}\log_2\frac{1}{4}$ = 0.811
  • Overcast [Yes, Yes]
    E = 0 (pure)
  • Rain [Yes, Yes, Yes, No]
    E = 0.811

Gain(S, Outlook)

$$0.9711 - \left(\frac{4}{10}\times 0.811 + \frac{2}{10}\times 0 + \frac{4}{10}\times 0.811\right)$$

$$\boxed{\text{Gain(S, Outlook)} = 0.322}$$
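The same split can be scored with a small information-gain helper that reuses entropy and model1_rows from the sketches above; info_gain is my own name, not a standard API.

```python
def info_gain(rows, attribute, target="Play"):
    """Information gain from splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    total = len(rows)
    weighted = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        weighted += len(subset) / total * entropy(subset)
    return base - weighted

print(round(info_gain(model1_rows, "Outlook"), 3))  # -> 0.322
```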

Attribute 2: Temp

  • Hot [No, No]
    E = 0
  • Mild [Yes, Yes, No]
    E = 0.918
  • Cool [Yes, Yes, Yes, Yes, No]
    E = 0.722

Gain(S, Temp)

$$0.9711 - \left(\frac{2}{10}\times 0 + \frac{3}{10}\times 0.918 + \frac{5}{10}\times 0.722\right)$$

$$\boxed{\text{Gain(S, Temp)} = 0.3347}$$

Attribute 3: Humidity

  • High [No, No, No, Yes, Yes]
    E = 0.971
  • Normal [No, Yes, Yes, Yes, Yes]
    E = 0.722

Gain(S, Humidity)

$$0.9711 - \left(\frac{5}{10}\times 0.971 + \frac{5}{10}\times 0.722\right)$$

$$\boxed{\text{Gain(S, Humidity)} = 0.125}$$

Attribute 4: Wind

  • Weak [No, No, Yes, Yes, Yes, Yes, Yes]
    E = 0.863
  • Strong [No, No, Yes]
    E = 0.918

Gain(S, Wind)

$$0.9711 - \left(\frac{7}{10}\times 0.86 + \frac{3}{10}\times 0.92\right)$$

$$\boxed{\text{Gain(S, Wind)} = 0.093}$$

Temp has the maximum gain, so it becomes the root of this tree.
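As a quick cross-check, the helpers sketched above can score all four attributes at once. The printed values differ slightly from the hand calculation, which rounds the intermediate entropies.

```python
gains = {attr: round(info_gain(model1_rows, attr), 3)
         for attr in ("Outlook", "Temp", "Humidity", "Wind")}
print(gains)
# -> {'Outlook': 0.322, 'Temp': 0.334, 'Humidity': 0.125, 'Wind': 0.091}
#    (the hand calculation shows 0.3347 and 0.093 because it uses rounded
#     intermediate entropies such as 0.918 and 0.86)
print(max(gains, key=gains.get))  # -> Temp
```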

[Figure: Model 1's decision tree after the first split on Temp, with branches Hot, Mild, and Cool]

Branch: Temp = Mild

Set (S1)

| Outlook | Temp | Humidity | Wind | Can Play |
| --- | --- | --- | --- | --- |
| Overcast | Mild | High | Weak | Yes |
| Sunny | Mild | High | Weak | No |
| Rain | Mild | Normal | Weak | Yes |

Attribute 1: Outlook

  • Overcast: E = 0 (pure)
  • Sunny: E = 0 (pure)
  • Rain: E = 0 (pure)

Gain(S1, Outlook) = 0.918

Attribute 2: Humidity

  • High: E = 1
  • Normal: E = 0

Gain(S1, Humidity) = 0.251

Attribute 3: Wind

Gain(S1, Wind) = 0

Wind takes only one value in S1 ("Weak"), so splitting on it gives zero gain.

Outlook has the maximum gain, so the Mild branch is split on Outlook.

[Figure: Model 1's decision tree after splitting the Mild branch on Outlook, with the resulting leaf classifications]
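The same helpers can confirm these branch-level numbers: filtering model1_rows down to the Mild rows reproduces the gains above (s1_rows is again an illustrative name).

```python
s1_rows = [row for row in model1_rows if row["Temp"] == "Mild"]
for attr in ("Outlook", "Humidity", "Wind"):
    print(attr, round(info_gain(s1_rows, attr), 3))
# -> Outlook 0.918
# -> Humidity 0.252   (shown as 0.251 above, from rounded intermediate values)
# -> Wind 0.0
```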

Branch: Temp = Cool

Set (S2)

| Outlook | Humidity | Wind | Can Play |
| --- | --- | --- | --- |
| Rain | High | Weak | Yes |
| Rain | Normal | Weak | Yes |
| Rain | Normal | Strong | No |
| Overcast | Normal | Strong | Yes |
| Sunny | Normal | Weak | Yes |

Attribute 1: Outlook

  • Overcast: E = 0 (pure)
  • Sunny: E = 0 (pure)
  • Rain: E = 0.918

Gain(S2, Outlook) = 0.1712

Attribute 2: Humidity

  • High: E = 0
  • Normal: E = 0.811

Gain(S2, Humidity) = 0.073

Attribute 3: Wind