What is homoskedastic?
Homoskedastic (also spelled “homoscedastic”) refers to a condition in which the variance of the residual, or error term, in a regression model is constant. That is, the spread of the error term stays roughly the same as the value of the predictor variable changes. A lack of homoscedasticity, by contrast, may suggest that the regression model needs additional predictor variables to explain the performance of the dependent variable.
Key points to remember
- Homoscedasticity occurs when the variance of the error term in a regression model is constant.
- If the variance of the error term is homoscedastic, the model is likely to be well defined. If the variance changes substantially across values of the predictor, the model may not be well defined.
- Adding predictor variables can help explain the performance of the dependent variable.
- In contrast, heteroskedasticity occurs when the variance of the error term is not constant.
How homoskedasticity works
Homoscedasticity is an assumption of linear regression modeling. If the variance of the errors around the regression line changes substantially across values of the predictor, the regression model may be poorly defined. The opposite of homoscedasticity is heteroscedasticity, just as the opposite of “homogeneous” is “heterogeneous”. Heteroscedasticity (also spelled “heteroskedasticity”) refers to a condition in which the variance of the error term in a regression equation is not constant.
Because the variance here measures how far actual outcomes fall from the outcomes the model expects, checking for homoskedasticity can help determine which factors need to be adjusted for greater precision.
A simple regression model, or equation, consists of four terms. On the left is the dependent variable. It represents the phenomenon that the model seeks to “explain”. On the right side are a constant, a predictor and a residual or error term. The error term indicates the amount of variability in the dependent variable that is not explained by the predictor variable.
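In symbols, the four terms described above can be written as follows. This is a sketch in conventional notation that the text itself does not use: $y_i$ is the dependent variable, $\beta_0$ the constant, $x_i$ the predictor with coefficient $\beta_1$, $\varepsilon_i$ the error term, and $\sigma^2$ a fixed variance.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad
\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma^2 \quad \text{for every observation } i
```

The second expression is the homoskedasticity condition. Under heteroskedasticity the variance would instead depend on the observation, for example $\mathrm{Var}(\varepsilon_i \mid x_i) = \sigma_i^2$.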
For example, suppose you wanted to explain student test results using the time each student spent studying. In this case, the test results would be the dependent variable and the time spent studying would be the predictor variable.
The error term would show the amount of variance in the test results that was not explained by the study time. If this variance is uniform, or homoscedastic, that would suggest the model may adequately explain test performance in terms of time spent studying.
But the variance can be heteroscedastic. Plotting the error term data may show that a large amount of study time corresponded very closely to high test scores, but that the scores of students with little study time varied considerably and even included some very high scores. Thus, the variance of scores would not be well explained by a single predictor variable, study time. In this case, one or more other factors are probably at work, and the model may need to be improved in order to identify them. Further investigation may reveal that some students had seen the answers to the test in advance or had already taken a similar test, and therefore did not need to study for this particular test.
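The contrast between the two cases can be illustrated with a small simulation. The sketch below is not from the source: the data are made up (a 40-point baseline, 5 points per hour of study, and the noise levels are all assumptions), and the helper names `fit_line` and `variance_ratio` are hypothetical. It fits a one-predictor line and compares the residual variance of students below and above the median study time.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def variance_ratio(x, y):
    """Residual variance below the median of x divided by that above it."""
    intercept, slope = fit_line(x, y)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    cut = statistics.median(x)
    low = [e for e, xi in zip(resid, x) if xi < cut]
    high = [e for e, xi in zip(resid, x) if xi >= cut]
    return statistics.pvariance(low) / statistics.pvariance(high)


rng = random.Random(42)  # fixed seed so the run is repeatable
hours = [rng.uniform(0, 10) for _ in range(2000)]

# Assumed homoscedastic case: the same noise level at every study time.
constant = [40 + 5 * h + rng.gauss(0, 5) for h in hours]
# Assumed heteroscedastic case: noise shrinks as study time grows.
shrinking = [40 + 5 * h + rng.gauss(0, 12 - h) for h in hours]

ratio_constant = variance_ratio(hours, constant)
ratio_shrinking = variance_ratio(hours, shrinking)
print("constant-noise ratio:", round(ratio_constant, 2))
print("shrinking-noise ratio:", round(ratio_shrinking, 2))
```

A ratio close to 1 is consistent with constant error variance, while a ratio well above 1 points to more spread among students who studied little. Splitting at the median is only a rough check chosen for brevity; a residual plot or a formal test would usually be preferred in practice.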
To improve the regression model, the researcher would therefore add another explanatory variable indicating whether a student had seen the answers before the test. The regression model would then have two explanatory variables – study time and whether the student had prior knowledge of the answers. With these two variables, a larger part of the variance of the test results would be explained and the variance of the error term could then be homoscedastic, suggesting that the model was well defined.
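The effect of adding the second explanatory variable can also be sketched with simulated data. Everything numeric here is an assumption made for illustration (30% of students saw the answers, they study at most 3 hours, and prior knowledge is worth 40 points), and `ols` and `spread_ratio` are hypothetical helpers, not part of any library. The code fits the model with and without the prior-knowledge indicator and compares the residual variance below and above the median study time.

```python
import random
import statistics


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def spread_ratio(rows, y, hours):
    """Residual variance below the median study time divided by that above it."""
    beta = ols(rows, y)
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    cut = statistics.median(hours)
    low = [e for e, h in zip(resid, hours) if h < cut]
    high = [e for e, h in zip(resid, hours) if h >= cut]
    return statistics.pvariance(low) / statistics.pvariance(high)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
# Assumed data: students who saw the answers study little but score high.
saw = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
hours = [rng.uniform(0, 3) if s else rng.uniform(0, 10) for s in saw]
score = [40 + 5 * h + 40 * s + rng.gauss(0, 3) for h, s in zip(hours, saw)]

# Model 1: constant + study time. Model 2: adds the prior-knowledge indicator.
ratio_one = spread_ratio([[1.0, h] for h in hours], score, hours)
ratio_two = spread_ratio([[1.0, h, s] for h, s in zip(hours, saw)], score, hours)
print("study time only:", round(ratio_one, 2))
print("study time + saw answers:", round(ratio_two, 2))
```

With these assumed numbers, the one-variable model leaves far more residual spread among low-study-time students, while the two-variable model brings the ratio back toward 1, which mirrors the improvement described above.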