Decision Tree

PUBLISHED: MAY 2, 2026 • 6 MIN READ


Author: Abhishek Singh Rajput

Decision Tree

A decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction.

Example

Consider a situation where someone is searching for a job. At the beginning of the process, they decide to consider only those jobs offering a monthly salary of at least ₹50,000. Additionally, they dislike spending excessive time commuting and are only comfortable if the travel time to work is less than an hour. They also expect the company to provide free coffee every morning.

The decision-making process regarding whether to accept or reject a job offer can be schematically represented using a decision tree.

This is a figure representing a decision tree:

Figure: Decision tree example showing classification based on salary, commute time, and incentives, leading to an accept or decline decision.

Structure of a Decision Tree

A decision tree is a tree in the graph-theoretic sense, where:

  • Leaf nodes (represented as ellipses) indicate final decisions or outcomes.
  • Internal nodes, including the root, represent intermediate decisions based on various conditions.

Types of Decision Trees

  1. Classification Trees: These are tree models where the target variable can take a discrete set of values. In a classification tree:
    • Leaves represent class labels.
    • Branches represent conjunctions of features leading to specific class labels.
  2. Regression Trees: These trees handle target variables that take continuous values (i.e., real numbers). Examples include:
    • Predicting the price of a house.
    • Estimating the duration of a phone call.

Classification Tree

We'll explain this with the help of an example: a dataset with features and class labels is given to us.

Advantages of Decision Trees

  1. It is simple to understand, interpret, and visualize.
  2. Little effort is required for data preparation.
  3. It can handle both categorical and numerical data.
  4. Non-linear relationships between parameters do not affect its performance.

Disadvantages of Decision Trees

  1. Overfitting occurs when the algorithm captures noise in the data.
  2. High variance can cause the model to become unstable: even slight variations in the data can produce a very different tree.
  3. A highly complicated decision tree tends to have low bias, which makes it difficult for the model to generalize to new data.

Entropy

Entropy is a measure of impurity in a dataset. Sets with high entropy are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality. Entropy is measured in bits. If there are only two possible classes, entropy values range from 0 to 1. For c classes, entropy ranges from 0 to log_2(c). In each case, the minimum value indicates that the sample is completely homogeneous, while the maximum value indicates that the data is as diverse as possible.

Definition

Consider a segment S′ of a dataset having c class labels, and let p_i be the proportion of examples in S′ having the i-th class label. Then the entropy of S′ is defined as:

\text{Entropy}(S^{\prime})=-\sum_{i=1}^{c}p_{i}\log_2\left(p_{i}\right)

πŸ” Where:

  • = the dataset (or subset of examples),
  • = number of classes (e.g., "Up", "Down"),
  • = proportion of examples in class ,
  • ​ = logarithm base 2.
Figure: Decision tree visualization showing the probability curve used to evaluate splits in classification.

In this expression for entropy, the value p_i log_2(p_i) is taken to be 0 whenever p_i = 0.

💡 Example:

If you have:

  • 4 "Up"
  • 6 "Down"

Then:

p_{\text{Up}}=\frac{4}{10},\quad p_{\text{Down}}=\frac{6}{10}
\text{Entropy}=-\left(\frac{4}{10}\log_2\frac{4}{10}+\frac{6}{10}\log_2\frac{6}{10}\right)\approx 0.971

That's how you get the uncertainty of a dataset 🔥
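The calculation above can be sketched in Python. The helper name `entropy_from_counts` is our own, not from any particular library:

```python
import math

def entropy_from_counts(counts):
    """Entropy in bits of a dataset, given the count of each class."""
    total = sum(counts)
    # A class with zero examples contributes nothing (0 * log 0 := 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# 4 "Up" and 6 "Down" examples, as above:
print(round(entropy_from_counts([4, 6]), 3))  # 0.971
```

A pure set such as `[10, 0]` gives 0 bits, and a perfect 50/50 split gives the two-class maximum of 1 bit.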

Special Case

Let the data segment S′ have only two class labels. If p is the proportion of examples having the first class label, then the proportion of examples having the second label will be 1 - p. In this case the entropy of S′ is given by

\text{Entropy}(S^{\prime})=-p\log_2\left(p\right)-\left(1-p\right)\log_2\left(1-p\right)

💡 Example:

Let i be some class label, and let p_i denote the proportion of examples with class label i. Consider the following dataset:

| S.no | Name   | Gives Birth | Aquatic Animal | Aerial Animal | Has Legs | Class Label |
|------|--------|-------------|----------------|---------------|----------|-------------|
| 1    | Human  | Yes         | No             | No            | Yes      | Mammal      |
| 2    | Python | No          | No             | No            | No       | Reptile     |
| 3    | Salmon | No          | Yes            | No            | No       | Fish        |
| 4    | Frog   | No          | Semi           | No            | Yes      | Amphibian   |
| 5    | Bat    | Yes         | No             | Yes           | Yes      | Bird        |
| 6    | Pigeon | No          | No             | Yes           | Yes      | Bird        |
| 7    | Cat    | Yes         | No             | No            | Yes      | Mammal      |
| 8    | Shark  | Yes         | Yes            | No            | No       | Fish        |

Let S′ be the data in the given table. The class labels, their counts, and their proportions are:

| Class Label | Count | p_i |
|-------------|-------|-----|
| Mammal      | 2     | 2/8 |
| Reptile     | 1     | 1/8 |
| Fish        | 2     | 2/8 |
| Amphibian   | 1     | 1/8 |
| Bird        | 2     | 2/8 |

\text{Entropy}(S^{\prime})=-\sum_{\text{all classes }i}p_{i}\log_2(p_{i})
=-\left(\frac{2}{8}\log_2\frac{2}{8}\right)-\left(\frac{1}{8}\log_2\frac{1}{8}\right)-\left(\frac{2}{8}\log_2\frac{2}{8}\right)-\left(\frac{1}{8}\log_2\frac{1}{8}\right)-\left(\frac{2}{8}\log_2\frac{2}{8}\right)
=0.5+0.375+0.5+0.375+0.5
=2.25
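The same 2.25-bit figure can be checked with a few lines of Python (a sketch; the variable names are ours):

```python
import math
from collections import Counter

# Class-label column of the animal table above
labels = ["Mammal", "Reptile", "Fish", "Amphibian",
          "Bird", "Bird", "Mammal", "Fish"]

counts = Counter(labels)  # Mammal: 2, Reptile: 1, Fish: 2, Amphibian: 1, Bird: 2
n = len(labels)
entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
print(entropy)  # 2.25
```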

Information Gain

Let S be a set of examples, A a feature (or attribute), S_v the subset of S for which A has value v, and Values(A) the set of all possible values of A. Then the information gain of attribute A relative to the set S, denoted Gain(S, A), is defined as:

📘 Gain Formula:

\text{Gain}(S,A)=\text{Entropy}(S)-\sum_{v\in\text{Values}(A)}\frac{|S_v|}{|S|}\cdot\text{Entropy}(S_{v})

πŸ” Where:

  • = full dataset
  • = attribute (like Age, Type, etc.)
  • = all possible values of attribute A
  • ​ = subset of S where attribute A has value v
  • = size of that subset
  • = size of total dataset
  • = entropy of that subset

Computation of Gain(S′, Gives Birth)

🔷 Attribute: Gives Birth (Yes, No)

Yes : S_yes = [Mammal, Mammal, Fish, Bird]

E=-\left(\frac{2}{4}\log_2\frac{2}{4}\right)-\left(\frac{1}{4}\log_2\frac{1}{4}\right)-\left(\frac{1}{4}\log_2\frac{1}{4}\right)
E=0.5+0.5+0.5
E=1.5

No : S_no = [Reptile, Fish, Amphibian, Bird]

E=-\left(\frac{1}{4}\log_2\frac{1}{4}\right)-\left(\frac{1}{4}\log_2\frac{1}{4}\right)-\left(\frac{1}{4}\log_2\frac{1}{4}\right)-\left(\frac{1}{4}\log_2\frac{1}{4}\right)
E=0.5+0.5+0.5+0.5
E=2

\text{Gain}(S^{\prime},\text{Gives Birth})=\text{Entropy}(S^{\prime})-\frac{|S_{yes}|}{|S|}\cdot\text{Entropy}(S_{yes})-\frac{|S_{no}|}{|S|}\cdot\text{Entropy}(S_{no})
=2.25-\left(\frac{4}{8}\cdot1.5+\frac{4}{8}\cdot2\right)
=2.25-1.75=0.5
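The computation above can be cross-checked with a short Python sketch; `entropy` and `information_gain` are helper names of our own, not from the post:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A): entropy of S minus the size-weighted entropy of each
    subset S_v, where values[i] is attribute A's value for example i."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# "Gives Birth" column and class labels from the animal table above
gives_birth = ["Yes", "No", "No", "No", "Yes", "No", "Yes", "Yes"]
classes = ["Mammal", "Reptile", "Fish", "Amphibian",
           "Bird", "Bird", "Mammal", "Fish"]

print(information_gain(gives_birth, classes))  # 0.5
```

The same helpers work for any categorical attribute column.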

Gini Index

The Gini split index of a dataset is another feature-selection measure for the construction of a classification tree. This measure is used in the CART algorithm.

Consider a dataset S′ having τ class labels, and let p_i be the proportion of examples having the i-th class label. The Gini index of S′, denoted Gini(S′), is defined by:

📘 Gini Formula:

\text{Gini}(S^{\prime})=1-\sum_{i=1}^{\tau}p_{i}^2
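As a quick sketch (the helper name `gini` is ours), the Gini index is even simpler to compute than entropy:

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["Up"] * 5 + ["Down"] * 5))  # 0.5: maximum impurity for two classes
print(gini(["Up"] * 10))                # 0.0: a pure set
```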

Next, let us construct the decision tree for the given dataset using information gain.

| S.no | Age | Competition | Type | Profit (Class) |
|------|-----|-------------|------|----------------|
| 1    | Old | Yes         | Soft | Down           |
| 2    | Old | No          | Soft | Down           |
| 3    | Old | No          | Hard | Down           |
| 4    | Mid | Yes         | Soft | Down           |
| 5    | Mid | Yes         | Hard | Down           |
| 6    | Mid | No          | Hard | Up             |
| 7    | Mid | No          | Soft | Up             |
| 8    | New | Yes         | Soft | Up             |
| 9    | New | No          | Hard | Up             |
| 10   | New | No          | Soft | Up             |

🔧 Step 1: Initial Entropy of the Dataset

\text{Entropy}(S^{\prime})=-p_{\text{Up}}\log_2(p_{\text{Up}})-p_{\text{Down}}\log_2(p_{\text{Down}})

Where:

  • Total examples = 10
  • Down = 5 (rows 1 to 5)
  • Up = 5 (rows 6 to 10)

So:

\text{Entropy}(S^{\prime})=-\frac{5}{10}\log_2\left(\frac{5}{10}\right)-\frac{5}{10}\log_2\left(\frac{5}{10}\right)
=-\frac{5}{10}\log_2\left(\frac{1}{2}\right)-\frac{5}{10}\log_2\left(\frac{1}{2}\right)
=-\frac{5}{10}\log_2(2)^{-1}-\frac{5}{10}\log_2(2)^{-1}
=\frac{5}{10}\log_2(2)+\frac{5}{10}\log_2(2)
=\frac{5}{10}\cdot1+\frac{5}{10}\cdot1
=0.5+0.5

✅ Entropy(S′) = 1

Step 2: Calculate Information Gain for all attributes

Attribute 1: Age (Old, Mid, New)

Old : S_old = [Down, Down, Down]

E=-\frac{3}{3}\log_2\frac{3}{3}

E = 0

Mid : S_mid = [Down, Down, Up, Up]

E=-\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4}
E=1

New : S_new = [Up, Up, Up]

E=-\frac{3}{3}\log_2\frac{3}{3}
E=0

\text{Gain}(S^{\prime},\text{Age})=\text{Entropy}(S^{\prime})-\frac{|S_{old}|}{|S|}\cdot\text{Entropy}(S_{old})-\frac{|S_{mid}|}{|S|}\cdot\text{Entropy}(S_{mid})-\frac{|S_{new}|}{|S|}\cdot\text{Entropy}(S_{new})
=1-\left(\frac{3}{10}\cdot0+\frac{4}{10}\cdot1+\frac{3}{10}\cdot0\right)
=1-0.4=0.6

✅ Gain(Age) = 0.6

Attribute 2: Competition (Yes, No)

Yes : S_yes = [Down, Down, Down, Up]

E=-\frac{3}{4}\log_2\frac{3}{4}-\frac{1}{4}\log_2\frac{1}{4}
E=-0.75\cdot\log_2 0.75-0.25\cdot\log_2 0.25
E\approx0.811

No : S_no = [Down, Down, Up, Up, Up, Up]

E=-\frac{2}{6}\log_2\frac{2}{6}-\frac{4}{6}\log_2\frac{4}{6}
E=-0.333\cdot\log_2 0.333-0.667\cdot\log_2 0.667
E\approx0.918

\text{Gain}(S^{\prime},\text{Competition})=1-\left(\frac{4}{10}\cdot0.811+\frac{6}{10}\cdot0.918\right)
\approx1-(0.3244+0.5508)=0.1248

✅ Gain(Competition) ≈ 0.125

Attribute 3: Type (Soft, Hard)

Soft : S_soft = [Down, Down, Down, Up, Up, Up]

E=-\frac{3}{6}\log_2\frac{3}{6}-\frac{3}{6}\log_2\frac{3}{6}
E=-\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2}
E=-0.5\log_2 2^{-1}-0.5\log_2 2^{-1}
E=0.5\cdot1+0.5\cdot1
E=1

Hard : S_hard = [Down, Down, Up, Up]

E=-\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4}
E=-\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2}
E=-0.5\log_2 2^{-1}-0.5\log_2 2^{-1}
E=0.5\cdot1+0.5\cdot1
E=1

\text{Gain}(S^{\prime},\text{Type})=1-\left(\frac{6}{10}\cdot1+\frac{4}{10}\cdot1\right)
=1-(0.6+0.4)=0

✅ Gain(Type) = 0

Find the maximum Gain

✅ Gain(Age) = 0.6

✅ Gain(Competition) ≈ 0.125

✅ Gain(Type) = 0

\max(0.6,\,0.125,\,0)=0.6

Age gives the highest gain (0.6).

Thus, "Age" will be placed at the root of Decision Tree.

The Decision Tree Formation

Figure: Decision tree diagram with root node "Age" splitting into branches (Old, Mid, New) leading to child nodes.

S¹, the subset of S′ for which Age = Old:

| Age | Competition | Type | Profit (Class) |
|-----|-------------|------|----------------|
| Old | Yes         | Soft | Down           |
| Old | No          | Soft | Down           |
| Old | No          | Hard | Down           |

Observation:

  • All 3 examples → Profit = Down
    ✅ Pure group!

Thus:

\text{Entropy}(S^{1})=0

Leaf Node = "Down"

S², the subset of S′ for which Age = Mid:

| Age | Competition | Type | Profit (Class) |
|-----|-------------|------|----------------|
| Mid | Yes         | Soft | Down           |
| Mid | Yes         | Hard | Down           |
| Mid | No          | Hard | Up             |
| Mid | No          | Soft | Up             |

Observation:

  • 2 examples → Down
  • 2 examples → Up

\text{Entropy}(S^{2})=-\frac{2}{4}\log_2\frac{2}{4}-\frac{2}{4}\log_2\frac{2}{4}
\text{Entropy}(S^{2})=1

✅ The Mid group is impure (entropy = 1).
Further splitting needed.

🔷 Attribute 1: Competition (Yes, No)

Yes : S_yes = [Down, Down]

\text{Entropy}(\text{Yes})=0

No : S_no = [Up, Up]

\text{Entropy}(\text{No})=0

\text{Gain}(S^{2},\text{Competition})=1-\left(\frac{2}{4}\cdot0+\frac{2}{4}\cdot0\right)
=1-0=1

✅ Gain(Competition) = 1

🔷 Attribute 2: Type (Soft, Hard)

Soft : S_soft = [Down, Up]

E=-\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2}=1

Hard : S_hard = [Down, Up]

E=-\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2}=1

\text{Gain}(S^{2},\text{Type})=1-\left(\frac{2}{4}\cdot1+\frac{2}{4}\cdot1\right)
=1-1=0

✅ Gain(Type) = 0

Since the gain is maximum for the Competition attribute, we place Competition at Node 2.

S³, the subset of S′ for which Age = New:

| Age | Competition | Type | Profit (Class) |
|-----|-------------|------|----------------|
| New | Yes         | Soft | Up             |
| New | No          | Hard | Up             |
| New | No          | Soft | Up             |

  • All 3 examples → Profit = Up
    ✅ Pure group!

\text{Entropy}(S^{3})=0

Leaf Node = "Up"

Figure: Decision tree with root "Age" and secondary split "Competition", leading to final outcomes Down or Up.

All the corresponding class labels are Down in the Competition = Yes branch.
✅ Pure group!

\text{Entropy}(\text{Yes})=0

Leaf Node = "Down"

All the corresponding class labels are Up in the Competition = No branch.
✅ Pure group!

\text{Entropy}(\text{No})=0

Leaf Node = "Up"