Do you think 50 small decision trees are better than a large one? Why?

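Another way of asking this question is “Is a random forest a better model than a single decision tree?” The answer is generally yes. A random forest is an ensemble method: it trains many decision trees, each on a bootstrap sample of the data with a random subset of features considered at each split, and aggregates their predictions into a single, stronger learner. As a result, random forests are usually more accurate, more robust, and less prone to overfitting than one large tree. Whether …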
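
A toy calculation (not a real random forest) makes the intuition concrete. The first function assumes each of n trees is an independent binary classifier that is correct with the same probability p, and computes how often a majority vote is right. The second function is the standard formula for the variance of an average of n predictions with pairwise correlation rho. Real trees are correlated, which is why random forests use bootstrap samples and random feature subsets to decorrelate them. The formula shows that correlation sets a floor of rho * sigma^2 that adding more trees cannot remove. The function names and numbers are illustrative assumptions.

```python
from math import comb


def majority_vote_accuracy(n_trees: int, p_correct: float) -> float:
    """Probability that a majority vote of n_trees independent classifiers,
    each correct with probability p_correct, gives the right answer.
    Ties (only possible for an even n_trees) are settled by a fair coin."""
    total = 0.0
    for k in range(n_trees + 1):  # k = number of trees that are correct
        prob_k = comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        if 2 * k > n_trees:
            total += prob_k
        elif 2 * k == n_trees:
            total += 0.5 * prob_k
    return total


def variance_of_average(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the mean of n_trees predictions, each with variance sigma2
    and pairwise correlation rho: rho*sigma2 + (1 - rho)*sigma2/n_trees."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees


print(majority_vote_accuracy(1, 0.6))     # a single tree: 0.6
print(majority_vote_accuracy(50, 0.6))    # majority vote of 50 independent trees
print(variance_of_average(1.0, 0.0, 50))  # uncorrelated trees: sigma2 / 50
print(variance_of_average(1.0, 0.3, 50))  # correlated trees: never below 0.3
```

With p = 0.6, one tree is right 60% of the time, while the vote of 50 independent trees is right far more often. With rho = 0.3, the variance of the average stays above 0.3 however many trees are added.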

What are the drawbacks of a linear model?

A linear model has a couple of drawbacks. First, it makes strong assumptions that may not hold in practice: a linear relationship, multivariate normality, little or no multicollinearity, no autocorrelation, and homoscedasticity. Second, ordinary linear regression is poorly suited to discrete or binary outcomes, because its predictions are unbounded rather than restricted to valid classes or probabilities. You can’t vary the model …
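
To illustrate the binary-outcome point, the sketch below fits an ordinary least-squares line to a small, made-up 0/1 outcome using the closed-form slope and intercept. The fitted line predicts values below 0 and above 1, which cannot be read as class labels or probabilities. This is one reason logistic regression, a generalized linear model, is usually preferred for binary outcomes. The data and helper names are hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up binary outcome: 0 for x < 5, 1 for x >= 5.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
intercept, slope = fit_simple_ols(xs, ys)


def predict(x):
    """Fitted value of the linear model; nothing keeps it inside [0, 1]."""
    return intercept + slope * x


for x in (-3, 0, 9, 12):
    print(x, round(predict(x), 3))  # values below 0 and above 1 appear
```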

Why is Naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?

One major drawback of Naive Bayes is its strong assumption that the features are independent of one another given the class. This is rarely the case in practice: in an email, words such as “free” and “prize” tend to appear together. One way to improve a spam detection algorithm that uses Naive Bayes is to decorrelate the features so that the assumption holds more closely. The question …
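
A minimal multinomial Naive Bayes spam filter with add-one (Laplace) smoothing shows where the independence assumption bites. The four training messages, the function names, and the `spam`/`ham` labels are made up for illustration. Repeating the token `free` stands in for two perfectly correlated features. Because the model treats every token as independent evidence, the duplicate adds its full weight again, and the spam log-odds doubles. This makes the model overconfident.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (tokens, label) pairs. Returns per-class counts and the vocabulary."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return class_counts, word_counts, vocab


def log_score(model, tokens, label):
    """log P(label) + sum of log P(token | label), treating tokens as independent."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    n_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / n_docs)
    for w in tokens:
        if w not in vocab:
            continue  # skip words never seen in training
        # Add-one (Laplace) smoothing so unseen (word, class) pairs are not zero.
        score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
    return score


def spam_log_odds(model, tokens):
    """Positive means the message looks more like spam than ham."""
    return log_score(model, tokens, "spam") - log_score(model, tokens, "ham")


docs = [
    ("win free money now".split(), "spam"),
    ("free prize claim now".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("lunch on monday".split(), "ham"),
]
model = train(docs)

print(spam_log_odds(model, "claim free money".split()))  # > 0: spam
print(spam_log_odds(model, "monday meeting".split()))    # < 0: ham
# Duplicated evidence is counted twice; with equal class priors the log-odds doubles.
print(spam_log_odds(model, ["free"]), spam_log_odds(model, ["free", "free"]))
```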

What is principal component analysis? Explain the sort of problems you would use PCA for.

In its simplest sense, PCA involves projecting higher-dimensional data (e.g., 3 dimensions) onto a lower-dimensional space (e.g., 2 dimensions). This yields data with fewer dimensions (2 instead of 3). Every original variable still contributes to the result, because each new dimension (principal component) is a linear combination of all the original variables. PCA is commonly used for compression purposes, to reduce required memory and to …
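
The sketch below projects 3-dimensional points onto their top 2 principal components without external libraries. It centers the data, builds the sample covariance matrix, and finds the leading eigenvectors by power iteration with deflation. It assumes the covariance eigenvalues are distinct so that power iteration converges. In practice you would use a numerical library's SVD or eigendecomposition instead. The data (points lying exactly on the plane z = x + y) and all function names are illustrative.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so the data are centered at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_eigenpair(matrix, iters=500, seed=0):
    """Largest eigenvalue and its unit eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in matrix]
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break  # v lies in the null space; nothing left to extract
        v = [x / norm for x in w]
    eigval = dot(v, [dot(row, v) for row in matrix])  # Rayleigh quotient
    return eigval, v


def pca(rows, k):
    """Return the top-k components, their variances, and the projected rows."""
    centered = center(rows)
    cov = covariance(centered)
    components, variances = [], []
    for _ in range(k):
        val, vec = top_eigenpair(cov)
        components.append(vec)
        variances.append(val)
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    projected = [[dot(r, c) for c in components] for r in centered]
    return components, variances, projected


# Made-up 3-D data lying on the plane z = x + y.
data = [[x, y, x + y] for x in range(-3, 4) for y in range(-2, 3)]
components, variances, projected = pca(data, k=2)
total_variance = sum(covariance(center(data))[i][i] for i in range(3))
print(len(projected[0]))                # 2 coordinates per point instead of 3
print(sum(variances) / total_variance)  # share of variance kept (close to 1 here)
```

Because these points lie on a plane, two components keep essentially all of the variance, so each point can be stored with 2 numbers instead of 3. Each component is a weighted combination of all three original variables.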

When would you use random forests vs. SVMs, and why?

There are a couple of reasons why a random forest is often a better choice of model than a support vector machine: Random forests let you determine feature importance, whereas SVMs, especially with nonlinear kernels, offer no built-in equivalent. Random forests are generally quicker and simpler to build than an SVM. SVMs are inherently binary classifiers, so for multi-class classification problems they require a decomposition such as a one-vs-rest method, …
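
An SVM separates only two classes at a time, so a k-class problem must be split into several binary problems. The sketch below shows one-vs-rest prediction: pick the class whose "this class vs. the rest" model gives the highest decision score. It also counts how many binary models one-vs-rest (k) and the common one-vs-one alternative (k(k-1)/2) need. The three linear scoring functions are hypothetical stand-ins for trained SVMs, not output from any library.

```python
def n_binary_models(n_classes: int, strategy: str) -> int:
    """How many binary classifiers a k-class problem needs."""
    if strategy == "ovr":  # one "class c vs. all other classes" model per class
        return n_classes
    if strategy == "ovo":  # one model per pair of classes
        return n_classes * (n_classes - 1) // 2
    raise ValueError(f"unknown strategy: {strategy!r}")


def one_vs_rest_predict(scorers, x):
    """Return the label whose "this class vs. the rest" model scores x highest.

    `scorers` maps each class label to a function giving a real-valued
    decision score (for an SVM, its signed decision value).
    """
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical linear decision functions standing in for three trained binary SVMs.
scorers = {
    "cat": lambda x: 1.0 * x[0] - 0.5 * x[1],
    "dog": lambda x: -0.5 * x[0] + 1.0 * x[1],
    "bird": lambda x: -0.5 * x[0] - 0.5 * x[1] + 0.2,
}

print(one_vs_rest_predict(scorers, (2.0, 0.0)))    # cat
print(one_vs_rest_predict(scorers, (-1.0, -1.0)))  # bird
print(n_binary_models(10, "ovr"), n_binary_models(10, "ovo"))  # 10 45
```

With 10 classes, one-vs-rest needs 10 binary models and one-vs-one needs 45. A single random forest handles all classes natively, because each decision tree can split on any number of classes.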