How do you ensure you’re not overfitting with a model?

This question restates a fundamental problem in machine learning: a model can overfit the training data by learning its noise as if it were signal, which leads to inaccurate generalizations on the test set.

There are three main methods to avoid overfitting:

  • Keep the model simpler: reduce variance by using fewer variables and parameters, which limits the model’s capacity to fit noise in the training data.
  • Use cross-validation techniques such as k-fold cross-validation.
  • Use regularization techniques such as LASSO, which penalize large model coefficients and shrink those likely to cause overfitting toward zero (the sketch after this list shows both k-fold cross-validation and LASSO in practice).
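
As a rough illustration of the last two points, here is a minimal scikit-learn sketch that evaluates a LASSO model with 5-fold cross-validation. The synthetic dataset, the alpha value, and the number of folds are placeholders chosen for the example, not prescriptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data standing in for a real training set
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=10.0, random_state=0)

# LASSO: the L1 penalty (strength controlled by alpha) shrinks weak
# coefficients toward zero, discouraging the model from fitting noise
model = Lasso(alpha=1.0)

# k-fold cross-validation: train on k-1 folds, score on the held-out fold,
# and rotate so every observation is used for validation exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print("Mean R^2:", scores.mean().round(3))
```

A large gap between training performance and the cross-validated scores is a practical warning sign of overfitting; tuning alpha (typically via a nested or grid search over the folds) trades off model complexity against fit.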