What is the difference between Entropy and Information Gain?

  • Entropy is an indicator of how messy your data is. It decreases as you move closer to the leaf nodes.
  • Information Gain is based on the decrease in entropy after a dataset is split on an attribute. It increases as you move closer to the leaf nodes.

Entropy and Information Gain are concepts commonly used in decision tree algorithms, particularly in the context of feature selection. Here’s a breakdown of the differences between them:

  1. Entropy:
    • Entropy measures the impurity or disorder in a group of examples.
    • In the context of decision trees, entropy is used to quantify the uncertainty of a given dataset.
    • Mathematically, for a set S with two classes (e.g., positive and negative examples), entropy is calculated using the formula:
      Entropy(S) = – p₁ log₂(p₁) – p₂ log₂(p₂)

      where p₁ and p₂ are the proportions of positive and negative examples in set S (see the Python sketch after this list).

  2. Information Gain:
    • Information Gain measures the effectiveness of a feature in classifying the data.
    • It quantifies the reduction in entropy (or uncertainty) after splitting a dataset on a particular feature.
    • A feature with higher information gain is considered more useful for splitting the dataset.
    • Mathematically, Information Gain for a feature A with respect to a dataset S is calculated as follows:
      IG(S, A) = Entropy(S) – ∑ᵥ∈Values(A) (|Sᵥ| / |S|) · Entropy(Sᵥ)

      where Values(A) is the set of possible values of feature A, Sᵥ is the subset of examples in S that take value v for feature A, and |S| and |Sᵥ| denote the number of examples in S and in Sᵥ, respectively. Python sketches of both calculations appear after this list.
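
To make the entropy formula concrete, here is a minimal Python sketch for the binary-class case; the function name `entropy` and the 0/1 label encoding are illustrative assumptions, not part of any particular library.

```python
import math

def entropy(labels):
    """Entropy of a list of binary class labels (e.g., 0s and 1s)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(1 for y in labels if y == 1) / n  # proportion of positive examples
    p2 = 1.0 - p1                              # proportion of negative examples
    if p1 == 0.0 or p2 == 0.0:
        return 0.0  # pure set: no uncertainty (0 * log2(0) is taken as 0)
    return -(p1 * math.log2(p1) + p2 * math.log2(p2))

# A perfectly mixed set has maximum entropy (1 bit); a pure set has entropy 0.
print(entropy([1, 1, 0, 0]))  # 1.0
print(entropy([1, 1, 1, 1]))  # 0.0
```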

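Building on that, the following companion sketch shows how Information Gain could be computed for a single categorical feature, reusing the `entropy` function above; the `information_gain` name and the (feature value, label) row layout are assumptions made for illustration.

```python
from collections import defaultdict

def information_gain(rows):
    """Information gain for splitting `rows` on a feature.

    Each row is a (feature_value, label) pair; relies on the
    entropy() function from the previous sketch.
    """
    labels = [label for _, label in rows]
    total_entropy = entropy(labels)

    # Group labels into the subsets S_v by feature value v.
    subsets = defaultdict(list)
    for value, label in rows:
        subsets[value].append(label)

    # Weighted average of Entropy(S_v), weighted by |S_v| / |S|.
    weighted_entropy = sum(
        (len(subset) / len(rows)) * entropy(subset)
        for subset in subsets.values()
    )
    return total_entropy - weighted_entropy

# A feature that perfectly separates the two classes removes all
# uncertainty, so its gain equals the starting entropy (1 bit here).
rows = [("sunny", 1), ("sunny", 1), ("rainy", 0), ("rainy", 0)]
print(information_gain(rows))  # 1.0
```

In a decision tree, this calculation would be repeated for every candidate feature at a node, and the feature with the highest gain would be chosen for the split.
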
In summary, entropy measures the uncertainty in a dataset, while information gain measures the reduction in uncertainty achieved by splitting the dataset on a particular feature. In decision tree algorithms, features with higher information gain are preferred for splitting, as they lead to more significant reductions in entropy and thus better classification of the data.