You’re asked to build a random forest model with 10000 trees. During its training, you got training error as 0.00. But, on testing the validation error was 34.23. What is going on? Haven’t you trained your model perfectly?

  • The model is overfitting the data.
  • Training error of 0.00 means that the classifier has mimicked the training data patterns to an extent.
  • But when this classifier runs on the unseen sample, it was not able to find those patterns and returned the predictions with more number of errors.
  • In Random Forest, it usually happens when we use a larger number of trees than necessary. Hence, to avoid such situations, we should tune the number of trees using cross-validation.