Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not?

When the number of features is greater than the number of observations, performing dimensionality reduction will generally improve the SVM. More broadly, whether it is beneficial to perform dimensionality reduction before fitting a Support Vector Machine (SVM) depends on the specific dataset and the goals of the analysis. Here are some considerations: Curse of Dimensionality: In high-dimensional … Read more
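As a rough illustration of the comparison described above, here is a minimal sketch that fits an SVM with and without a PCA step on synthetic data with more features than observations. The dataset, the choice of 10 components, and the RBF kernel are all assumptions made for the example, not recommendations; which pipeline scores higher depends on the data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data with more features (200) than observations (80).
# This setup is an assumption chosen purely for illustration.
X, y = make_classification(
    n_samples=80, n_features=200, n_informative=10, random_state=0
)

# Baseline: scale the features, then fit the SVM on all 200 of them.
svm_only = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Alternative: scale, project onto 10 principal components, then fit the SVM.
pca_svm = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))

# Compare mean 5-fold cross-validated accuracy of the two pipelines.
score_plain = cross_val_score(svm_only, X, y, cv=5).mean()
score_pca = cross_val_score(pca_svm, X, y, cv=5).mean()

print(f"SVM only:  {score_plain:.3f}")
print(f"PCA + SVM: {score_pca:.3f}")
```

Putting PCA inside the pipeline means it is refit on each training fold, so the cross-validation scores are not contaminated by the held-out data.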

How to check if the regression model fits the data well?

There are a few metrics that you can use: R-squared/Adjusted R-squared: Relative measure of fit. This was explained in a previous answer. F-statistic: Tests the null hypothesis that all regression coefficients are equal to zero against the alternative hypothesis that at least one is not equal to zero. RMSE: Absolute measure of fit. To assess … Read more
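To make the metrics above concrete, here is a small sketch that computes R-squared, adjusted R-squared, the overall F-statistic, and RMSE for an ordinary least squares fit using NumPy only. The simulated data and the variable names are assumptions for illustration.

```python
import numpy as np

# Simulated data (an assumption for illustration): n observations, p predictors.
rng = np.random.default_rng(0)
n, p = 100, 2
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

ss_res = float(resid @ resid)                 # residual sum of squares
ss_tot = float(((y - y.mean()) ** 2).sum())   # total sum of squares

r2 = 1 - ss_res / ss_tot                                   # relative measure of fit
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)              # penalizes extra predictors
f_stat = ((ss_tot - ss_res) / p) / (ss_res / (n - p - 1))  # H0: all slopes are zero
rmse = float(np.sqrt(ss_res / n))                          # absolute fit, in units of y

print(f"R2={r2:.3f}  adjusted R2={adj_r2:.3f}  F={f_stat:.1f}  RMSE={rmse:.3f}")
```

A large F-statistic is evidence against the null hypothesis that every slope coefficient is zero, while RMSE tells you the typical size of an error in the units of the response.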

What is collinearity and what to do with it? How to remove multicollinearity?

Multicollinearity exists when an independent variable is highly correlated with one or more other independent variables in a multiple regression equation. This can be problematic because it inflates the standard errors of the coefficient estimates, which undermines the statistical significance of an independent variable. You could use the Variance Inflation Factor (VIF) to determine whether there is any multicollinearity between independent variables; a standard benchmark is … Read more
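As a sketch of how a VIF is computed, the example below regresses each predictor on the remaining predictors and applies VIF_j = 1 / (1 - R_j^2). The helper name `vif` and the simulated columns are hypothetical, introduced only for this illustration.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of each column of X.

    Hypothetical helper for illustration: column j is regressed on all
    other columns (plus an intercept), and VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Simulated predictors (an assumption): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # x1 and x2 get large values; x3 stays close to 1
```

Dropping one of the two nearly identical columns, or combining them into a single feature, would bring the remaining VIFs back toward 1.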

What are the assumptions required for linear regression? What if some of these assumptions are violated?

The assumptions are as follows: The sample data used to fit the model is representative of the population. The relationship between X and the mean of Y is linear. The variance of the residual is the same for any value of X (homoscedasticity). Observations are independent of each other. For any value of X, Y … Read more
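One of the assumptions listed above, homoscedasticity, can be checked informally by comparing the spread of the residuals at low and high fitted values. The sketch below simulates data that deliberately violates the assumption; the data and the split into two halves are illustrative assumptions, not a formal test.

```python
import numpy as np

# Simulated data (an assumption) where the noise grows with x,
# so the constant-variance assumption is violated on purpose.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5 * x)

# Fit a simple linear regression by least squares.
A = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
resid = y - fitted

# Compare residual spread in the lower and upper halves of the fitted values.
order = np.argsort(fitted)
low, high = resid[order[:250]], resid[order[250:]]
spread_ratio = high.std() / low.std()

print(f"residual spread ratio (high / low): {spread_ratio:.2f}")
```

A ratio close to 1 is consistent with constant variance; a ratio well above or below 1 suggests the variance of the residuals changes with X.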

Why is mean square error a bad measure of model performance? What would you suggest instead?

Mean Squared Error (MSE) gives a relatively high weight to large errors, so it tends to put too much emphasis on large deviations. A more robust alternative is MAE (mean absolute error). Mean squared error (MSE) is not necessarily a “bad” measure of model performance, but it has some limitations and may not always … Read more
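A tiny worked example shows how much more a single large error moves MSE than MAE. The numbers below are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Made-up values for illustration.
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred_clean = np.array([11.0, 11.0, 12.0, 12.0, 13.0])  # every error has size 1

y_pred_outlier = y_pred_clean.copy()
y_pred_outlier[-1] = 22.0  # one prediction is now off by 10

print(mse(y_true, y_pred_clean), mae(y_true, y_pred_clean))      # 1.0 1.0
print(mse(y_true, y_pred_outlier), mae(y_true, y_pred_outlier))  # 20.8 2.8
```

With one error of size 10 among five predictions, MSE rises from 1.0 to 20.8 while MAE rises only from 1.0 to 2.8, which is the sensitivity to large deviations described above.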