What are the Steps in Regression Analysis?
For learners, the following steps may make the regression analysis process easier to understand.
Step # 0
The Most Useful Scatter Plot
Before performing any statistical analysis, simple scatter plot(s) of the dependent variable against the independent variable(s) can be drawn to check for any major issues with the data, especially whether the relationship is linear and whether there are any extremely unusual observations. A detailed discussion of data quality can be found in the Regression Analysis diagnostic section.
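As a rough illustration of step #0, the sketch below draws a text-only scatter plot using nothing but the Python standard library. In practice, a scatter chart in MS Excel or `matplotlib.pyplot.scatter` would normally be used instead. The function name `text_scatter` and the data are hypothetical; one observation is deliberately placed far from the trend of the others so that it stands out.

```python
def text_scatter(x, y, width=40, height=12):
    """Return the rows of a crude character-based scatter plot of y against x."""
    x_min, y_min = min(x), min(y)
    x_span = (max(x) - x_min) or 1  # avoid dividing by zero if all x values are equal
    y_span = (max(y) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for xi, yi in zip(x, y):
        col = round((xi - x_min) / x_span * (width - 1))
        row = round((yi - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top, so larger y plots higher
    # Left border on every row, plus a bottom axis line
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical data: the last point sits far from the trend of the others
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.2, 4.8, 7.1, 9.0, 10.7, 13.2, 15.1, 16.8, 19.1, 10.0]
for line in text_scatter(x, y):
    print(line)
```

Even this crude plot shows a roughly linear trend for most points and one observation far to the right and well below that trend, which is exactly the kind of issue step #0 is meant to catch.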
The first step of the regression analysis is to check whether there is a statistically significant relationship between the dependent and the independent variables. If there is no statistically significant relationship, the data diagnostic analysis (step #4) can be performed to check whether any problem with the data is causing the results to be statistically insignificant. If the data appear to be fine, steps #2 and #3 are unnecessary, and the analysis may stop here.
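A minimal sketch of step #1 for simple linear regression (one independent variable), assuming the usual t-test of the null hypothesis that the slope is zero and an assumed significance level of 0.05. To stay dependency-free, it integrates the Student's t density numerically with Simpson's rule instead of calling a statistics library; `scipy.stats.linregress` reports the same two-sided p-value for the slope. The helper names and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def slope_t_test(x, y):
    """t-test of H0: slope = 0. Returns (b0, b1, t, p)."""
    n = len(x)
    b0, b1 = fit_line(x, y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t = b1 / se_b1
    return b0, b1, t, two_sided_p(t, n - 2)  # n - 2 degrees of freedom


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, t, p = slope_t_test(x, y)
alpha = 0.05  # assumed significance level
print(f"slope = {b1:.3f}, t = {t:.3f}, p = {p:.3f}")
print("statistically significant" if p < alpha else "not statistically significant")
```

With this made-up data the slope is positive but not significant at the assumed 5% level, so, following the steps above, this would be the point to run the diagnostics of step #4 before stopping.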
The second step of the regression analysis is to check whether the statistically significant results have any practical significance. A relationship can be statistically significant and yet not strong enough to predict the dependent variable well. If the results have no practical significance, the data diagnostic analysis (step #4) can be performed to check whether any problem with the data is causing the results to be practically insignificant. If the data appear to be fine, step #3 is unnecessary, and the analysis may stop here.
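Practical significance is a judgment call, but two common measures of how well a fitted line predicts are the coefficient of determination R² and the standard error of the estimate s_e (the typical size of a prediction error, in the units of the dependent variable). The sketch below computes both for a simple linear regression. The function name, the data, and the 0.5 cutoff for R² are hypothetical; a suitable cutoff depends entirely on the problem.

```python
import math


def practical_fit_measures(x, y):
    """Return (R^2, standard error of the estimate) for a least-squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                         # total variation
    r_squared = 1 - sse / sst       # share of the variation in y explained by the line
    s_e = math.sqrt(sse / (n - 2))  # typical prediction error, in units of y
    return r_squared, s_e


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, s_e = practical_fit_measures(x, y)
MIN_USEFUL_R2 = 0.5  # hypothetical, problem-specific cutoff
print(f"R^2 = {r2:.2f}, s_e = {s_e:.2f}")
print("practically useful" if r2 >= MIN_USEFUL_R2 else "too weak to predict y well")
```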
When the results of both step #1 and step #2 are significant, step #3 explains the analysis results in the context of the problem, particularly the regression relationship, the slope parameter, and the intercept.
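For step #3, a sketch of turning the fitted coefficients into plain-language statements, using hypothetical data on hours studied and exam scores. It also reflects a common caution: when zero lies outside the observed range of the independent variable, the intercept is an extrapolation and may have no practical meaning. The function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def interpret(x, y, x_name, y_name):
    """Plain-language statements about the fitted slope and intercept."""
    b0, b1 = fit_line(x, y)
    change = "an increase" if b1 > 0 else "a decrease"
    lines = [
        f"Fitted line: predicted {y_name} = {b0:.2f} + {b1:.2f} * {x_name}",
        f"Slope: each one-unit increase in {x_name} is associated with "
        f"{change} of about {abs(b1):.2f} in predicted {y_name}.",
    ]
    if min(x) <= 0 <= max(x):
        lines.append(f"Intercept: predicted {y_name} when {x_name} is 0 is {b0:.2f}.")
    else:
        # x = 0 was never observed, so the intercept is an extrapolation
        lines.append(f"Intercept: {b0:.2f}, but {x_name} = 0 is outside the observed "
                     "data, so it mainly positions the line.")
    return lines


# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
for line in interpret(hours, scores, "hours studied", "exam score"):
    print(line)
```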
Finally, in step #4, a diagnostic analysis is performed to check whether the data contain problems, such as outliers and influential points, that may skew the results. Ideally, this step would be performed first. However, if there is no statistically significant relationship between the dependent and the independent variables, the time and resources this step takes do not justify performing it first. That said, with any statistical software (including MS Excel), this step takes only a couple of mouse clicks.

If justified, the outliers and the influential points could be removed from the analysis before any other step of the regression analysis is performed. If this step is performed last and outliers or influential points are removed, the analysis must be rerun; that is, steps #1, #2, and #3 must be performed again. Although it may sound as if the diagnostics should always come first, many diagnostic analyses cannot be performed without first running the regression, whether manually using formulas or with software. Therefore, the regression analysis is often performed a couple of times to produce the best results, including the test statistics and the fitted regression equation.
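A minimal sketch of step #4 for simple linear regression. It computes each observation's leverage h_i = 1/n + (x_i - x̄)²/S_xx and Cook's distance D_i = e_i²/(p · MSE) · h_i/(1 - h_i)², with p = 2 fitted parameters (intercept and slope). The cutoffs 2p/n for leverage and 4/n for Cook's distance are common rules of thumb, assumed here rather than taken from this text. The function names and data are hypothetical, with the last observation placed far from the others. After a flagged point is removed (only if removal is justified), the line is refit, and steps #1 to #3 would then be repeated on the new fit.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def diagnostics(x, y):
    """Leverage and Cook's distance for every observation."""
    n, p = len(x), 2  # p = number of fitted parameters (intercept and slope)
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    cooks = [e * e / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]
    return leverage, cooks


def flag_points(x, y):
    """Indices above rule-of-thumb cutoffs (assumed conventions, not fixed standards)."""
    n, p = len(x), 2
    leverage, cooks = diagnostics(x, y)
    return [i for i, (h, d) in enumerate(zip(leverage, cooks))
            if h > 2 * p / n or d > 4 / n]


# Hypothetical data: the last observation is far from the others
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.2, 4.8, 7.1, 9.0, 10.7, 13.2, 15.1, 16.8, 19.1, 10.0]
flagged = flag_points(x, y)
print("flagged observations:", flagged)

# Only if removal is justified: drop the flagged points, refit, then redo steps 1-3
keep = [i for i in range(len(x)) if i not in flagged]
b0_all, b1_all = fit_line(x, y)
b0_new, b1_new = fit_line([x[i] for i in keep], [y[i] for i in keep])
print(f"slope with all points: {b1_all:.2f}; after removal: {b1_new:.2f}")
```

Here a single influential observation pulls the slope from about 2 down to about 0.4, which is why the earlier steps have to be rerun after any removal.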