Name a few hyper-parameters of decision trees.

The most important hyper-parameters that one can tune in decision trees are:

  • Splitting criterion (criterion)
  • Minimum samples per leaf (min_samples_leaf)
  • Minimum samples to split (min_samples_split)
  • Maximum depth (max_depth)

In machine learning interviews, when asked about the hyper-parameters of decision trees, you can mention several key ones that are commonly tuned to optimize a tree's performance (a code sketch follows the list below). Some of these include:

  1. Maximum Depth (max_depth): This hyper-parameter controls the maximum depth of the decision tree. A deeper tree may capture more complex patterns in the training data, but it also increases the risk of overfitting.
  2. Minimum Samples Split (min_samples_split): It represents the minimum number of samples required to split an internal node. This parameter helps control the growth of the tree and prevent overfitting.
  3. Minimum Samples Leaf (min_samples_leaf): This hyper-parameter sets the minimum number of samples required at a leaf node. It works like min_samples_split but applies to the leaves, and it can also help prevent overfitting.
  4. Maximum Features (max_features): It determines the maximum number of features considered for splitting a node. Setting it to a lower value can help reduce overfitting.
  5. Criterion: The function used to measure the quality of a split. Common values include “gini” for the Gini impurity and “entropy” for information gain.
  6. Splitter: The strategy used to choose the split at each node. It can be “best” to choose the best split or “random” to choose the best random split.
  7. Class Weight: This parameter is used to assign weights to classes, which is useful in imbalanced datasets. It helps the model give more importance to minority classes.
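
To make these concrete, here is a minimal sketch of how they map onto scikit-learn's `DecisionTreeClassifier`; the toy dataset and the specific values chosen are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative imbalanced toy dataset (roughly a 90/10 class split)
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

clf = DecisionTreeClassifier(
    criterion="gini",         # splitting criterion: "gini" or "entropy"
    splitter="best",          # split strategy: "best" or "random"
    max_depth=5,              # cap tree depth to limit overfitting
    min_samples_split=20,     # an internal node needs >= 20 samples to split
    min_samples_leaf=10,      # every leaf must hold >= 10 samples
    max_features="sqrt",      # consider only sqrt(n_features) at each split
    class_weight="balanced",  # reweight classes for the imbalanced target
    random_state=42,
)
clf.fit(X, y)
print(f"depth={clf.get_depth()}, leaves={clf.get_n_leaves()}")
```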

When discussing these hyper-parameters, you may also want to mention their impact on the model, how they help in controlling overfitting, and the trade-offs involved in tuning them.
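
If asked how you would actually tune them, a short sketch using scikit-learn's `GridSearchCV` can illustrate searching over that trade-off; the grid values and scoring choice here are arbitrary examples:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Same illustrative imbalanced dataset as in the previous snippet
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

# Deeper trees and smaller leaves capture more patterns but risk
# overfitting, so the grid spans both ends of that trade-off.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 20],
    "criterion": ["gini", "entropy"],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,           # 5-fold cross-validation to estimate generalization
    scoring="f1",   # F1 is a reasonable choice for an imbalanced target
)
search.fit(X, y)
print(search.best_params_)
```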