Decision Tree Algorithm

PUBLISHED: MAY 2, 2026 · 2 MIN READ


Divya Sachan


Steps to Construct a Decision Tree

  1. Place the best feature (attribute) of the dataset at the root of the tree.
  2. Split the training set into subsets, where each subset contains data with the same value for a feature.
  3. Repeat steps 1 and 2 recursively for each subset until:
    • All branches lead to leaf nodes.
    • Leaf nodes contain class labels (decisions).

Popular decision tree algorithms include:

  • ID3 (Iterative Dichotomiser 3) – Developed by Ross Quinlan
  • C4.5 – Successor to ID3, also developed by Ross Quinlan
  • CART (Classification and Regression Trees)
  • OneR (One Rule Algorithm) – Developed by Robert Holte
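
In practice you rarely hand-roll this loop. As a quick illustration, here is a minimal sketch using scikit-learn (a library choice of mine; the article itself names no library): growing a tree with the entropy criterion follows the same greedy, recursive procedure described above.

```python
# Minimal sketch: growing a decision tree with scikit-learn
# (library choice is an assumption; the article names no library).
# criterion="entropy" makes splits by information gain, ID3-style.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]  # toy data: two binary features
y = [0, 1, 1, 0]                      # XOR-style labels

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf, feature_names=["f0", "f1"]))  # text view of the tree
```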

📌 The ID3 Algorithm (based on Information Gain)

Used when:

  • There are two class labels, "+" (positive) and "−" (negative)

🔤 Notation

Symbol | Description
S      | Set of examples (training data)
C      | Set of class labels {+, −}
F      | Set of features (attributes)
A      | A feature in F
V(A)   | Set of possible values of feature A
v      | A single value from V(A)
Sᵥ     | Subset of S where A = v
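
With this notation in place, the two standard quantities behind ID3 can be written out; the worked calculation later in this post instantiates exactly these definitions:

Entropy(S) = -\sum_{c \in C} p_c \log_2 p_c

Gain(S, A) = Entropy(S) - \sum_{v \in V(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)

where p_c is the fraction of examples in S carrying label c.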

🔁 ID3 Algorithm – Steps

  1. Create a root node for the tree.
  2. If all examples in S are positive, return a leaf node labeled "+".
  3. If all examples in S are negative, return a leaf node labeled "−".
  4. If there are no features left, return a leaf node with the most common class label in S.
  5. Else:
    • Choose the feature A with the highest Information Gain.
    • Assign A to the root node.
    • For each value v in V(A):
      • Create a branch below the root labeled A = v.
      • If Sᵥ is empty, add a leaf node with the most common label in S.
      • Else, recursively apply the algorithm to Sᵥ with features F \ {A}.
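
These steps translate almost line-for-line into code. Below is a minimal Python sketch of the recursion (function and field names are mine, for illustration only):

```python
from collections import Counter
from math import log2

def entropy(examples, target):
    """Entropy of the class-label distribution over a list of example dicts."""
    total = len(examples)
    counts = Counter(ex[target] for ex in examples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(examples, feature, target):
    """Entropy(S) minus the size-weighted entropy of each subset S_v."""
    total = len(examples)
    remainder = 0.0
    for v in {ex[feature] for ex in examples}:
        subset = [ex for ex in examples if ex[feature] == v]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, features, target):
    """Recursively build a tree as nested dicts; leaves are class labels."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # steps 2-3: pure set -> leaf
        return labels[0]
    if not features:                   # step 4: out of features -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: info_gain(examples, f, target))
    tree = {best: {}}                  # step 5: split on the best feature
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = id3(subset, [f for f in features if f != best], target)
    return tree
```

Because the loop only visits values that actually occur in the current examples, the empty-Sᵥ case of step 5 never arises in this sketch; a fuller implementation would also store a majority-label default for values unseen during training.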

🧪 Training Dataset

Day | OUTLOOK  | TEMP | HUMIDITY | WIND   | PLAY TENNIS
D1  | Sunny    | Hot  | High     | Weak   | No
D2  | Sunny    | Hot  | High     | Strong | No
D3  | Overcast | Hot  | High     | Weak   | Yes
D4  | Rain     | Mild | High     | Weak   | Yes
D5  | Rain     | Cool | Normal   | Weak   | Yes
D6  | Rain     | Cool | Normal   | Strong | No
D7  | Overcast | Cool | Normal   | Strong | Yes
D8  | Sunny    | Mild | High     | Weak   | No
D9  | Sunny    | Cool | Normal   | Weak   | Yes
D10 | Rain     | Mild | Normal   | Weak   | Yes
D11 | Sunny    | Mild | Normal   | Strong | Yes
D12 | Overcast | Mild | High     | Strong | Yes
D13 | Overcast | Hot  | Normal   | Weak   | Yes
D14 | Rain     | Mild | High     | Strong | No
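
To follow the calculations below in code, the same table can be written as a small Python structure (the field names are my own):

```python
# The play-tennis table above, one dict per day, keyed by feature name.
dataset = [
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "No"},
]
```

Combined with the `id3` sketch above, `id3(dataset, ["Outlook", "Temp", "Humidity", "Wind"], "Play")` carries out the construction that the rest of this post works through by hand.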

🔧 Step 1: Calculate Initial Entropy of the Dataset

  • Total = 14
  • Yes = 9, No = 5
Entropy(S) = -\left(\frac{9}{14}\right)\log_2\left(\frac{9}{14}\right) - \left(\frac{5}{14}\right)\log_2\left(\frac{5}{14}\right) = 0.940

Entropy(S) = 0.940
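
A quick check of this number in plain Python, assuming nothing beyond the 9/5 split counted above:

```python
from math import log2

# 9 "Yes" and 5 "No" out of 14 examples
p_yes, p_no = 9 / 14, 5 / 14
entropy_S = -p_yes * log2(p_yes) - p_no * log2(p_no)
print(round(entropy_S, 3))  # 0.94
```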

🔍 Step 2: Calculate Information Gain for Each Attribute

Attribute 1: OUTLOOK (Sunny, Overcast, Rain)

Sunny: S_sunny = [No, No, No, Yes, Yes]

E(S_{sunny}) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}
= -(0.6 \times -0.737) - (0.4 \times -1.322)
= 0.971

Overcast: S_overcast = [Yes, Yes, Yes, Yes]

Pure node, so:

E(S_{overcast}) = 0

Rain: S_rain = [Yes, Yes, Yes, No, No]

E(S_{rain}) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}
= -(0.6 \times -0.737) - (0.4 \times -1.322)
= 0.971

Gain(Outlook) = Entropy(S) - \frac{|S_{sunny}|}{|S|}\cdot Entropy(S_{sunny}) - \frac{|S_{overcast}|}{|S|}\cdot Entropy(S_{overcast}) - \frac{|S_{rain}|}{|S|}\cdot Entropy(S_{rain})
= 0.94 - \left(\frac{5}{14}\cdot 0.971 + \frac{4}{14}\cdot 0 + \frac{5}{14}\cdot 0.971\right)
= 0.2464

Gain(Outlook) = 0.2464
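
The same arithmetic as a self-contained snippet, using the (Yes, No) counts read off the table (a sketch; the variable names are mine):

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy from raw class counts; empty classes contribute 0."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

# (Yes, No) counts: whole dataset, then per OUTLOOK value
S = (9, 5)
subsets = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}

gain = entropy(*S) - sum(
    (p + n) / sum(S) * entropy(p, n) for p, n in subsets.values()
)
print(round(gain, 4))  # 0.2467; the 0.2464 above comes from rounding Entropy(S) to 0.94
```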