Which algorithms can be used for important variable selection?

Random Forest and XGBoost can be used for variable selection by plotting their variable importance charts. More generally, the choice of algorithm for feature (variable) selection depends on the characteristics of the data and the problem at hand. Commonly used algorithms for variable selection include Recursive Feature Elimination (RFE): RFE is …

When should ridge regression be preferred over lasso?

Use ridge regression when you want to keep all predictors in the model: it shrinks the coefficient values towards zero but does not set any of them exactly to zero, whereas lasso can zero coefficients out and so remove predictors. In a machine learning interview, if you are asked when ridge regression should be preferred over lasso, you can give the following explanation: Ridge regression and lasso are both …
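The difference in shrinkage behaviour can be seen in a small worked sketch. It assumes the special case of orthonormal predictors, with ridge minimising (1/2)·RSS + (λ/2)·‖β‖² and lasso minimising (1/2)·RSS + λ·‖β‖₁, where each penalised coefficient has a closed form in terms of the ordinary least-squares coefficient. The function names and the example numbers are illustrative only.

```python
def ridge_shrink(b, lam):
    # Ridge rescales every coefficient by 1 / (1 + lambda):
    # smaller, but never exactly zero unless b itself is zero.
    return b / (1.0 + lam)


def lasso_shrink(b, lam):
    # Lasso soft-thresholds: coefficients within [-lambda, lambda]
    # become exactly zero, the rest move towards zero by lambda.
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


# Hypothetical least-squares coefficients for three predictors.
ols = [3.0, -0.4, 0.1]
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

print("ridge:", ridge)  # all three predictors kept, with smaller weights
print("lasso:", lasso)  # the two weak predictors are dropped
```

Under these assumptions ridge keeps every predictor with a reduced weight, while lasso removes the two weak ones, which is why ridge is the natural choice when no predictor should be discarded.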

Which algorithm can be used in value imputation in both categorical and continuous categories of data?

k-Nearest Neighbors (k-NN) is a commonly used algorithm that can impute both categorical and continuous variables. For categorical data, the algorithm takes the majority class of the k nearest neighbors, and for continuous …
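A minimal sketch of the idea in plain Python follows. It assumes records stored as dictionaries, missing values marked as None, and Euclidean distance over fully observed numeric feature columns. It fills a categorical column by majority vote and a numeric column by the mean of the neighbours' values, the usual choice for continuous data. The helper name `knn_impute` and the sample records are hypothetical.

```python
import math
from collections import Counter


def knn_impute(rows, target, features, k=3):
    """Return copies of rows with None in `target` filled from the k nearest rows."""
    complete = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(dict(r))
            continue
        point = [r[f] for f in features]
        # Rank the complete rows by Euclidean distance on the feature columns.
        neighbours = sorted(
            complete,
            key=lambda c: math.dist(point, [c[f] for f in features]),
        )[:k]
        values = [n[target] for n in neighbours]
        if all(isinstance(v, (int, float)) for v in values):
            guess = sum(values) / len(values)            # continuous: mean
        else:
            guess = Counter(values).most_common(1)[0][0]  # categorical: majority
        filled.append({**r, target: guess})
    return filled


rows = [
    {"age": 25, "income": 30.0, "city": "A"},
    {"age": 27, "income": 32.0, "city": "A"},
    {"age": 26, "income": None, "city": "A"},
    {"age": 60, "income": 80.0, "city": "B"},
    {"age": 62, "income": 84.0, "city": "B"},
    {"age": 61, "income": 82.0, "city": None},
]

income_filled = knn_impute(rows, target="income", features=["age"], k=2)
city_filled = knn_impute(rows, target="city", features=["age"], k=2)
print(income_filled[2]["income"], city_filled[5]["city"])
```

In practice the features would normally be scaled first so that no single column dominates the distance.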

Which metrics can be used to measure correlation of categorical data?

The chi-square test can be used to check whether two categorical predictors are associated. To measure the strength of that association, use a metric suited to categorical variables. One commonly used metric for this purpose is Cramér’s V. Cramér’s V is a measure of association between two …
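To make the link between the two concrete, here is a small sketch that computes the chi-square statistic from a contingency table of two categorical columns and turns it into Cramér's V using V = sqrt(χ² / (n · (min(rows, columns) − 1))). It assumes each variable has at least two observed categories and applies no bias correction. The function name `cramers_v` and the toy data are illustrative.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of category labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed cell counts
    row_totals = Counter(xs)
    col_totals = Counter(ys)

    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint[(a, b)]  # Counter gives 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Perfectly associated labels versus unrelated labels.
strong = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
none = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
print(strong, none)
```

The value lies between 0 (no association) and 1 (one variable fully determines the other).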

What distance metrics can be used in KNN?

The following distance metrics can be used in KNN: Manhattan, Minkowski, Tanimoto, Jaccard, and Mahalanobis. In K-Nearest Neighbors (KNN), various distance metrics can be used to measure the similarity or dissimilarity between data points. The choice of distance metric depends on the nature of the data and the problem at hand. Some commonly used distance metrics in …
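The metrics named above can be sketched in a few lines of plain Python. This is an illustrative sketch, not a library API: the function names are made up, Jaccard is shown in its set form (for binary data the Tanimoto coefficient gives the same value), and the Mahalanobis version assumes the caller already has the inverse covariance matrix as a nested list.

```python
import math


def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    # Generalises Manhattan (p = 1) and Euclidean (p = 2).
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def jaccard_distance(a, b):
    # 1 - |intersection| / |union| for two collections of items.
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mahalanobis(x, y, inv_cov):
    # sqrt(d^T * inv_cov * d), where d is the difference vector.
    d = [a - b for a, b in zip(x, y)]
    size = len(d)
    total = sum(
        d[i] * inv_cov[i][j] * d[j] for i in range(size) for j in range(size)
    )
    return math.sqrt(total)


identity = [[1.0, 0.0], [0.0, 1.0]]
print(manhattan([0, 0], [3, 4]))
print(minkowski([0, 0], [3, 4], 2))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
print(mahalanobis([0, 0], [3, 4], identity))
```

With an identity inverse covariance matrix, Mahalanobis distance reduces to Euclidean distance, which is a quick sanity check on the implementation.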