What are the advantages and disadvantages of neural networks?

Advantages: Neural networks (specifically deep NNs) have led to performance breakthroughs on unstructured data such as images, audio, and video. Their flexibility allows them to learn complex, highly nonlinear patterns that many other ML algorithms struggle to capture.
Disadvantages: However, they typically require a large amount of training data to generalize well. It’s also difficult to pick the right architecture, …

What are the advantages and disadvantages of decision trees?

Advantages: Decision trees are easy to interpret, nonparametric, and relatively robust to outliers in the input features, because each split depends only on the ordering of a feature’s values rather than their magnitude. There are also relatively few parameters to tune.
Disadvantages: Decision trees are prone to overfitting. However, this can be addressed with ensemble methods such as random forests or boosted trees. For an interview question about the advantages and disadvantages of …
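
To illustrate why split-based models are relatively insensitive to extreme feature values, here is a minimal sketch of a one-level tree (a "decision stump") on a single numeric feature. It is a simplified illustration under stated assumptions, not a full decision-tree implementation: the function name `best_stump_threshold` and the toy data are hypothetical, and the split criterion is plain misclassification count rather than Gini impurity or entropy. Because candidate thresholds are midpoints between neighbouring sorted values, making the largest value extreme does not move the chosen split.

```python
from typing import List, Tuple


def best_stump_threshold(xs: List[float], ys: List[int]) -> Tuple[float, int]:
    """Return (threshold, errors) for the rule 'predict 1 if x > threshold'
    that minimizes misclassifications on binary labels ys (0 or 1).

    Hypothetical helper for illustration. Candidate thresholds are midpoints
    between consecutive sorted values, so only the ordering of xs matters.
    """
    pairs = sorted(zip(xs, ys))
    # Baseline: threshold below every point, so everything is predicted 1.
    best = (float("-inf"), sum(1 for _, y in pairs if y == 0))
    for i in range(len(pairs) - 1):
        left, right = pairs[: i + 1], pairs[i + 1 :]
        # Left of the threshold is predicted 0, right of it is predicted 1.
        errors = sum(y for _, y in left) + sum(1 - y for _, y in right)
        threshold = (pairs[i][0] + pairs[i + 1][0]) / 2
        if errors < best[1]:
            best = (threshold, errors)
    return best


xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_stump_threshold(xs, ys))          # (7.0, 0)

xs_outlier = xs[:-1] + [1300]                # replace the largest value with an extreme one
print(best_stump_threshold(xs_outlier, ys))  # (7.0, 0)
```

By contrast, the mean of the feature jumps from 7.0 to about 167.9 when 13 is replaced by 1300. Note that this robustness applies to outliers in the input features; extreme target values can still pull the splits of a regression tree that uses a squared-error criterion.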

How much data should you allocate for your training, validation, and test sets?

You have to find a balance, and there’s no single right answer for every problem. If your test set is too small, your estimate of model performance will be unreliable (the performance statistic will have high variance). If your training set is too small, your fitted model parameters will have high variance. A good rule of thumb …
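
The trade-off can be made concrete with two small helpers. The sketch below uses hypothetical function names and only the Python standard library: `train_val_test_split` shuffles a dataset and splits it by fractions you choose, and `accuracy_standard_error` computes the binomial standard error of an accuracy estimate, sqrt(p(1 − p)/n), assuming independent test examples. It shows how the test-set estimate gets noisier as the test set shrinks. The fractions in the usage are illustrative only, not a recommended split.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_test_split(
    data: Sequence[T], val_frac: float, test_frac: float, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle data and split it into (train, val, test) lists.

    val_frac and test_frac are fractions of the whole dataset;
    whatever remains goes to training. Hypothetical helper.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_test = round(len(items) * test_frac)
    n_val = round(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test : n_test + n_val]
    train = items[n_test + n_val :]
    return train, val, test


def accuracy_standard_error(accuracy: float, n_test: int) -> float:
    """Standard error of an accuracy measured on n_test independent examples
    (binomial approximation): sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1 - accuracy) / n_test)


train, val, test = train_val_test_split(range(1000), val_frac=0.2, test_frac=0.2)
print(len(train), len(val), len(test))                # 600 200 200

# The same 90% accuracy is far less certain on 100 test examples than on 10,000.
print(round(accuracy_standard_error(0.9, 100), 3))    # 0.03
print(round(accuracy_standard_error(0.9, 10000), 3))  # 0.003
```

Shrinking the test set by a factor of 100 makes the standard error 10 times larger, which is the "high variance" problem described above.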

What are 3 data preprocessing techniques to handle outliers?

Winsorize (cap values at a chosen threshold, such as a percentile). Transform the data to reduce skew (using Box-Cox or a similar transformation). Remove outliers if you’re certain they are anomalies or measurement errors.
There are several data preprocessing techniques to handle outliers in machine learning. Here are three commonly used ones: Removing outliers: One straightforward approach is to remove the data points that are identified …
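
Two of these techniques can be sketched with the Python standard library. The helpers below are illustrative and their names are hypothetical, not a specific library’s API: `winsorize` caps values at chosen lower and upper percentiles, and `drop_iqr_outliers` removes points outside Tukey’s fences (1.5 × IQR beyond the quartiles), one common way to flag candidate outliers. Percentiles are computed with linear interpolation between ranks, which is an assumption; other percentile definitions give slightly different cut-offs.

```python
import math
from typing import List, Sequence


def percentile(sorted_xs: Sequence[float], p: float) -> float:
    """Percentile p (0-100) of an already-sorted sequence, using linear
    interpolation between the two nearest ranks."""
    if not sorted_xs:
        raise ValueError("empty data")
    k = (len(sorted_xs) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (k - lo)


def winsorize(xs: Sequence[float], lower_pct: float = 5, upper_pct: float = 95) -> List[float]:
    """Cap every value to the [lower_pct, upper_pct] percentile range."""
    if not 0 <= lower_pct <= upper_pct <= 100:
        raise ValueError("need 0 <= lower_pct <= upper_pct <= 100")
    s = sorted(xs)
    low, high = percentile(s, lower_pct), percentile(s, upper_pct)
    return [min(max(x, low), high) for x in xs]


def drop_iqr_outliers(xs: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    s = sorted(xs)
    q1, q3 = percentile(s, 25), percentile(s, 75)
    iqr = q3 - q1
    return [x for x in xs if q1 - k * iqr <= x <= q3 + k * iqr]


data = [10, 12, 11, 13, 12, 11, 14, 10, 12, 13, 95]
print(winsorize(data, 10, 90))  # [10, 12, 11, 13, 12, 11, 14, 10, 12, 13, 14.0]
print(drop_iqr_outliers(data))  # [10, 12, 11, 13, 12, 11, 14, 10, 12, 13]
```

Winsorizing keeps every row but limits the influence of the extreme value, while the IQR rule drops it entirely; the latter is only appropriate once you have confirmed the point is an error or anomaly.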

What is the Box-Cox transformation used for?

The Box-Cox transformation is a generalized “power transformation” that transforms data to make its distribution closer to normal. For example, when its lambda parameter is 0, it’s equivalent to the log transformation. It’s used to stabilize the variance (reduce heteroskedasticity) and normalize the distribution. The Box-Cox transformation is a statistical technique used primarily in data preprocessing for …
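
For a fixed lambda, the transformation is y = (x^λ − 1) / λ when λ ≠ 0 and y = ln(x) when λ = 0, and it is defined only for strictly positive x. The minimal sketch below applies that formula with the standard library; the function name `box_cox` is hypothetical, and lambda is passed in rather than estimated from the data.

```python
import math
from typing import List, Sequence


def box_cox(xs: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform with a fixed lambda; every input must be > 0."""
    if any(x <= 0 for x in xs):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        # The lambda -> 0 limit of (x**lam - 1) / lam is the natural log.
        return [math.log(x) for x in xs]
    return [(x ** lam - 1) / lam for x in xs]


print(box_cox([1, 4, 9], 0.5))  # [0.0, 2.0, 4.0]

# A lambda very close to 0 gives almost the same result as the log transform.
print(box_cox([2.0], 1e-8)[0], math.log(2.0))
```

In practice lambda is usually estimated from the data rather than fixed by hand; for example, SciPy’s `scipy.stats.boxcox` chooses it by maximum likelihood when no `lmbda` argument is given.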