We know that one hot encoding increases the dimensionality of a dataset, but label encoding doesn’t. How?

When we use one hot encoding, there is an increase in the dimensionality of a dataset. The reason for the increase in dimensionality is that, for every class in the categorical variables, it forms a different variable. Example: Suppose, there is a variable ‘Color.’ It has three sub-levels as Yellow, Purple, and Orange. So, one … Read more

Why rotation is required in PCA? What will happen if you don’t rotate the components?

Rotation is a significant step in PCA as it maximizes the separation within the variance obtained by components. Due to this, the interpretation of components becomes easier. The motive behind doing PCA is to choose fewer components that can explain the greatest variance in a dataset. When rotation is performed, the original coordinates of the … Read more

How do you handle the missing or corrupted data in a dataset?

In Python Pandas, there are two methods that are very useful. We can use these two methods to locate the lost or corrupted data and discard those values: isNull(): For detecting the missing values, we can use the isNull() method. dropna(): For removing the columns/rows with null values, we can use the dropna() method. Also, … Read more

Imagine, you are given a dataset consisting of variables having more than 30% missing values. Let’s say, out of 50 variables, 8 variables have missing values, which is higher than 30%. How will you deal with them?

To deal with the missing values, we will do the following: We will specify a different class for the missing values. Now, we will check the distribution of values, and we would hold those missing values that are defining a pattern. Then, we will charge these into a yet another class, while eliminating others. When … Read more

Explain Logistic Regression.

Logistic regression is the proper regression analysis used when the dependent variable is categorical or binary. Like all regression analyses, logistic regression is a technique for predictive analysis. Logistic regression is used to explain data and the relationship between one dependent binary variable and one or more independent variables. Also, it is employed to predict … Read more