Residuals Analysis

In regression analysis, the errors (estimated by the residuals) are assumed to be normally distributed with zero mean and constant (homogeneous) variance, and to be uncorrelated with one another.

Normality Analysis

Most statistical software, including MS Excel, can produce a normal probability plot (pp-plot) to check the normality of the data. If most points fall close to the straight line of the pp-plot, the data set can be considered normally distributed. In the following example pp-plot, the residuals are normally distributed. Judging from a pp-plot to what extent the data is normally distributed is highly subjective, yet it is the most widely used normality check, and an obvious deviation from the straight line is easy to detect.

Figure 10. Normal Probability Plot for Residuals
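
The construction of a normal probability plot can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular software package: the residual values are made up, the plotting position (i - 0.5) / n is one common convention assumed here, and the function names are hypothetical. The correlation between the plotted coordinates is added only as a rough numeric companion to the visual check; values close to 1 mean the points lie close to a straight line.

```python
from statistics import NormalDist, mean

def normal_probability_points(residuals):
    """Return (theoretical quantile, ordered residual) pairs for a pp-plot."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest residual.
    return [(std_normal.inv_cdf((i - 0.5) / n), r)
            for i, r in enumerate(ordered, start=1)]

def straightness(points):
    """Pearson correlation of the plotted points (near 1 = near-straight line)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical residuals, made up for illustration only.
residuals = [-1.2, 0.4, 0.9, -0.3, 0.1, -0.8, 1.5, -0.1, 0.6, -1.1]
points = normal_probability_points(residuals)
r = straightness(points)
print(f"probability plot correlation: {r:.3f}")
```

Plotting the pairs in points as a scatter plot gives the pp-plot itself. The correlation does not replace the plot, because a few extreme points can be hidden in a single summary number.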

Constant (Homogeneous) Variance Check

Most statistical software, including MS Excel, can produce the plot of residuals against fitted values, which can be used to check the homogeneity of variance (Figure 11). While the top-left graph shows no pattern, the other three residual plots show some pattern or predictability. Any predictability (that is, any pattern) in the residuals is considered a violation of the assumptions on the residuals, including the homogeneity (constancy) of variance. In that case, the data must be reinvestigated for remedial actions before drawing any conclusion from the regression analysis.

Figure 11. Residual Analysis for Homogeneousness (Constancy) of Variance
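
As a rough numeric companion to the residual plot, the sketch below correlates the fitted values with the absolute residuals. This is an informal screen substituted here for illustration, not a formal test and not a method taken from this text. It can only flag a spread that grows or shrinks steadily with the fitted value (a funnel shape); curved patterns still have to be found by looking at the plot. All data values and function names are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spread_trend(fitted, residuals):
    """Correlation between fitted values and the size of the residuals."""
    return pearson(fitted, [abs(e) for e in residuals])

# Hypothetical data, made up for illustration only.
fitted = [1, 2, 3, 4, 5, 6, 7, 8]
steady = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.5]   # similar spread throughout
funnel = [0.1, -0.2, 0.4, -0.5, 0.8, -1.0, 1.3, -1.6]   # spread grows with fitted value

steady_trend = spread_trend(fitted, steady)
funnel_trend = spread_trend(fitted, funnel)
print(f"steady spread: {steady_trend:.2f}, funnel spread: {funnel_trend:.2f}")
```

A value near zero is consistent with constant variance, while a value near +1 or -1 suggests that the spread of the residuals changes with the fitted value.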

Uncorrelated (Independent) Residuals Assumption Check

Whenever data is collected in a sequence with respect to time, place, process, etc., the plot of residuals against observation order should be investigated for any correlation between the order of data collection and the residuals. Fortunately, most statistical software, including MS Excel, produces this plot. In Figure 12, the left plot shows a positive correlation of the residuals with the observation order, while the right plot shows a cyclic correlation between the residuals and the observation order. Uncorrelated residuals would look like the top-left plot in Figure 11. If a correlation between the residuals and the observation order is detected, the data must be reinvestigated for remedial actions before drawing any conclusion from the regression analysis.

Figure 12. Residual Analysis for Violation of the Uncorrelated Residuals Assumption
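
The observation-order plot can be complemented by the Durbin-Watson statistic, a standard measure of correlation between consecutive residuals. It is introduced here as an illustration and is not part of the procedure described above. The statistic lies between 0 and 4: values near 2 indicate no correlation between neighboring residuals, values toward 0 indicate positive correlation, and values toward 4 indicate negative correlation. The residual sequences below are made up, and the residuals must be listed in the order the data was collected.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    # Sum of squared differences between consecutive residuals.
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    # Sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual sequences, made up for illustration only.
trending = [-1.0, -0.8, -0.5, -0.3, 0.0, 0.2, 0.5, 0.7, 1.0]   # steady drift upward
cyclic = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7, 0.0, 0.7, 1.0, 0.7]  # slow wave

dw_trending = durbin_watson(trending)
dw_cyclic = durbin_watson(cyclic)
print(f"trending: {dw_trending:.2f}, cyclic: {dw_cyclic:.2f}")
```

Both made-up sequences give values well below 2, because in a steady drift and in a slow cycle each residual is close to the one before it. The statistic only looks at neighboring observations, so the plot is still needed to see the shape of the pattern.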