What is the difference between R-squared and adjusted R-squared?

R-squared measures the proportion of variation in the dependent variable explained by the independent variables.

Adjusted R-squared modifies that proportion to account for the number of predictors, so it only increases when a new predictor genuinely improves the fit.

R-squared (R²) and adjusted R-squared are both metrics used to evaluate the goodness of fit of a regression model. However, they have different interpretations and purposes:

  1. R-squared (R²):
    • R-squared is a measure of how well the independent variables in a regression model explain the variability of the dependent variable.
    • It ranges from 0 to 1, where 0 indicates that the independent variables do not explain any of the variability of the dependent variable, and 1 indicates that they explain all of the variability.
    • R-squared never decreases as you add more independent variables to the model, even if they are not statistically significant, which can encourage overfitting.
  2. Adjusted R-squared:
    • Adjusted R-squared adjusts the R-squared value for the number of predictors in the model. A standard formula is: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the sample size and p is the number of predictors.
    • It penalizes the addition of unnecessary predictors that do not significantly improve the model’s fit.
    • Adjusted R-squared accounts for the degrees of freedom, that is, the number of predictors relative to the sample size, to give a more honest estimate of the model’s explanatory power.
    • Unlike R-squared, adjusted R-squared can decrease when you add predictors that do not improve the fit enough to justify the lost degrees of freedom (see the sketch after this list).
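
To make the contrast concrete, here is a minimal sketch in Python (the synthetic data, variable names, and use of scikit-learn are illustrative assumptions, not part of the original answer). It fits two linear models, one with an added pure-noise predictor, and prints both metrics:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    # where n is the sample size and p is the number of predictors.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(42)
n = 100

# One informative predictor; the response includes random noise.
x_useful = rng.normal(size=(n, 1))
y = 3.0 * x_useful[:, 0] + rng.normal(scale=2.0, size=n)

# A second predictor that is pure noise, unrelated to y.
x_noise = rng.normal(size=(n, 1))

models = {
    "useful only": x_useful,
    "useful + noise": np.hstack([x_useful, x_noise]),
}

for name, X in models.items():
    fit = LinearRegression().fit(X, y)
    r2 = fit.score(X, y)  # plain R-squared on the training data
    adj = adjusted_r2(r2, n, X.shape[1])
    print(f"{name}: R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

On a typical run, R-squared ticks up slightly for the two-predictor model (it can never decrease when a predictor is added), while adjusted R-squared stays flat or drops, which is exactly the penalty for an uninformative predictor described above.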

In summary, while R-squared gives an overall measure of how well the independent variables explain the variability of the dependent variable, adjusted R-squared provides a more conservative measure that adjusts for the number of predictors in the model, thus helping to guard against overfitting. In general, adjusted R-squared is considered a more reliable metric when comparing models with different numbers of predictors.