Let’s analyze the data for the human comfort study provided in Video 3. Video 6 provides a detailed analysis of RSM using Minitab, along with explanations.
Human Comfort vs the Temperature and Humidity Study Data
*Note. The data set used in the video is different from this data set.
Video 6. Response Surface Methodology Design of Experiments Analysis Explained Example using Minitab
Analysis Results Explained
Video 6 provides a detailed analysis of RSM using Minitab, along with explanations. The Minitab analysis output is provided below. Output sequences vary from software to software, and even from version to version within the same software. However, the explanations follow the sequence suggested in the earlier module on Applied Regression Analysis.
Figure 10. Minitab Analysis Output for Comfort vs Temperature and Humidity
Step # 1
The Statistical Significance
The statistical significance is checked using the analysis of variance (ANOVA) table. The overall model p-value (0.000) is less than the level of significance (0.05). Therefore, we reject the null hypothesis of no relationship between the dependent and the independent variables: the full quadratic model in the temperature and humidity factors (independent variables) significantly affects the response, comfort (the dependent variable).
The p-values for the linear terms of both factors, temperature and humidity, are also lower than the level of significance. Therefore, the linear terms significantly affect comfort.
The p-values for the quadratic terms of both factors are likewise lower than the level of significance. Therefore, the quadratic terms for temperature and humidity significantly affect comfort.
The interaction between temperature and humidity is observed to be statistically insignificant with respect to comfort.
The model shows no significant lack of fit because the lack-of-fit p-value (0.051) is larger than the level of significance (0.05). Therefore, the quadratic model with the predictor variables temperature and humidity significantly predicts human comfort.
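The overall F-test behind the ANOVA table can be reproduced outside Minitab. The sketch below fits a full quadratic model by least squares and computes the overall model p-value; the design points and comfort values are hypothetical stand-ins, not the study data.

```python
import numpy as np
from scipy import stats

# Hypothetical illustration (NOT the study data): comfort measured at
# coded temperature/humidity settings of a central composite design.
temp = np.array([-1, 1, -1, 1, -1.414, 1.414, 0, 0, 0, 0, 0, 0, 0])
hum = np.array([-1, -1, 1, 1, 0, 0, -1.414, 1.414, 0, 0, 0, 0, 0])
comfort = (6 - 1.5 * temp**2 - 1.0 * hum**2 + 0.5 * temp
           + np.random.default_rng(1).normal(0, 0.2, 13))

# Full quadratic model: constant, linear, squared, and interaction terms
X = np.column_stack([np.ones_like(temp), temp, hum, temp**2, hum**2, temp * hum])
beta, *_ = np.linalg.lstsq(X, comfort, rcond=None)

n, p = len(comfort), X.shape[1]
sse = np.sum((comfort - X @ beta) ** 2)          # residual sum of squares
sst = np.sum((comfort - comfort.mean()) ** 2)    # total sum of squares
ssr = sst - sse                                  # model sum of squares

# Overall F-test: H0 = no relationship between comfort and the predictors
F = (ssr / (p - 1)) / (sse / (n - p))
p_value = stats.f.sf(F, p - 1, n - p)
print(f"F = {F:.2f}, p-value = {p_value:.4f}")  # reject H0 when p < 0.05
```

Minitab additionally partitions the model sum of squares into linear, square, and interaction components, which gives the per-group p-values discussed above.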
Step # 2
The Practical Significance
The practical significance is assessed using the model summary output table. The adjusted R-squared value is observed to be about 98%, indicating that the model parameters explain the variation in the dependent variable, the comfort response, very well. Therefore, the model has good practical significance.
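For reference, the adjusted R-squared reported in the model summary follows directly from the ordinary R-squared, the number of runs n, and the number of model parameters p. A minimal sketch (the R-squared value below is assumed for illustration, not taken from Figure 10):

```python
def adjusted_r_squared(r2, n, p):
    """Penalize R-squared for the p estimated parameters (constant included)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)

# With an assumed R-squared of 0.99 and this study's n = 13 runs and
# p = 6 parameters, the adjustment brings it down only slightly.
print(round(adjusted_r_squared(0.99, 13, 6), 3))  # → 0.983
```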
Step # 3
The Coefficient Explanation
The coefficient table and the response equation are utilized to explain the coefficients. The response equations for the coded and the uncoded levels are provided in Equation 3 and Equation 4, respectively. The regression equation in Equation 3 for the coded units is developed from the coefficient table of the Minitab output in Figure 10.
The sign of a coefficient indicates the direction of the relationship, while the coefficient value represents the strength of the relationship. To explain the coefficients, let’s use the uncoded-level equation, which is easier to understand in the context of the problem.
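Coded units rescale each factor so that its low and high design levels map to -1 and +1, which is why the coded and uncoded equations look different. A minimal sketch of the conversion, using a hypothetical temperature range (the actual centers and half-ranges come from the design in Figure 10):

```python
# Coded units map each factor onto [-1, +1] over the design region.
# The center and half-range values below are hypothetical, for illustration.
def to_coded(x, center, half_range):
    return (x - center) / half_range

def to_uncoded(x_coded, center, half_range):
    return center + x_coded * half_range

# e.g., if temperature runs from 60 to 90 °F: center = 75, half-range = 15
print(to_coded(90, 75, 15))    # high level → 1.0
print(to_uncoded(-1, 75, 15))  # low coded level → 60
```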
Explanation of the Constant
If all other terms are set to zero (0), the comfort value equals -241.4.
Explanation of the Linear Coefficients
The linear coefficient of temperature is 6.869, which indicates that, if all other terms are held constant in the model, comfort will increase by 6.869 (= 6.869 × 1) if the temperature increases by 1 degree Fahrenheit, by 68.69 (= 6.869 × 10) if the temperature increases by 10 degrees Fahrenheit, and so on. However, explaining the coefficients one by one like this is completely misleading. Comfort was measured on a scale from 0 to 10, so a comfort value of 68.69 does not make any sense; in a quadratic model the other temperature terms cannot actually be held constant while the temperature changes. Therefore, the interpretation of a single term is NOT of interest in RSM or any multiple polynomial regression model.
Explanation of the Quadratic Coefficients
The quadratic coefficient of Temperature*Temperature is -0.04837, which indicates that, if other terms are held constant in the model, comfort will decrease by 0.04837 (= 0.04837 × 1 × 1) if the temperature increases by 1 degree Fahrenheit, and by 4.837 (= 0.04837 × 10 × 10) if the temperature increases by 10 degrees Fahrenheit. Negative comfort does not exist on the scale. Therefore, the explanation of individual terms does not make much sense, and it is not of interest in response surface methodology.
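To see concretely why the per-term readings mislead, the sketch below evaluates just the temperature part of the uncoded equation, using the linear coefficient 6.869 and quadratic coefficient -0.04837 quoted above (the constant and humidity terms cancel in a difference at fixed humidity). The naive linear-only reading predicts an impossible jump of 68.69, while the full quadratic contribution barely moves.

```python
# Temperature coefficients taken from the fitted equation in the text;
# constant and humidity terms are omitted because they cancel in a
# difference taken at fixed humidity.
b_temp, b_temp_sq = 6.869, -0.04837

def temp_contribution(t):
    return b_temp * t + b_temp_sq * t**2

# Naive single-term reading: +10 °F ⇒ comfort "increases by 68.69"
naive = b_temp * 10
# Actual change from 70 to 80 °F once the quadratic term moves too
actual = temp_contribution(80) - temp_contribution(70)
print(f"naive: {naive:.2f}, actual: {actual:.2f}")  # actual is a small decrease
```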
Step # 4
The Model Diagnostic
The multicollinearity between the independent (predictor) variables is checked using the Variance Inflation Factor (VIF). Table # 1 of the Minitab output in Figure 10 provides the VIF values. The following guideline is used for checking the multicollinearity between predictors.
· VIF = 1: not correlated.
· VIF between 1 and 5: moderately correlated.
· VIF greater than 5: highly correlated.
The Variance Inflation Factor (VIF) values for all predictors are observed to be around 1, meaning that there is no multicollinearity between any predictor and the other predictors.
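VIF can also be computed directly from its definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. A sketch with a hypothetical orthogonal two-factor design (in an orthogonal design the VIFs come out at 1, matching the guideline above):

```python
import numpy as np

def vif(X):
    """VIF for each column of the predictor matrix X (no constant column):
    regress column j on the other columns plus an intercept, then
    VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1 / (1 - r2))
    return out

# Hypothetical orthogonal design columns: the VIFs come out at 1
temp = np.array([-1.0, 1, -1, 1, 0, 0, 0])
hum = np.array([-1.0, -1, 1, 1, 0, 0, 0])
print([round(v, 3) for v in vif(np.column_stack([temp, hum]))])  # → [1.0, 1.0]
```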
Normality, Constant, and Uncorrelated Variance
The residuals follow an approximately straight line in the normal probability plot in Figure 11, indicating that the residuals are approximately normally distributed.
Figure 11. RSM Diagnostic the Normality of the Residuals
The residuals vs fitted plot in Figure 12 shows no obvious pattern, indicating no predictability of the residuals. Therefore, the residual variance is considered homogeneous (constant).
Figure 13 shows the residuals vs the observation order plot, in which no obvious pattern appears. Therefore, there is no violation of the uncorrelated-variance assumption in this model.
Figure 12. RSM Diagnostic Homogeneousness (Constancy) of Variance
Figure 13. RSM Diagnostic Uncorrelated Variance
Outlier, Leverage, and Influential Point
Outlier – an outlier is a point whose residual is relatively large compared to the other data points. The residuals for points #1 and #5 are observed to be larger than usual (Table 4). Therefore, these two points are considered outliers for this data set.
Leverage – a leverage point is one whose x-value is unusual, but whose y-value follows the fitted response. The diagonal elements of the hat matrix, h_i, are used to determine x-outliers (leverage points). Because this is a systematically designed experiment, there is no reason to have an x-outlier or leverage point. No unusually large h_i value is observed in the residual diagnostic analysis in Table 4. Therefore, no leverage point is observed in the residual analysis output.
Influential – a point is considered influential if the probability value calculated from Cook’s distance is over 50%. The probability is calculated by treating Cook’s distance as an F-value with p and n − p degrees of freedom for the numerator and the denominator, respectively. Points #1 and #5 show probabilities over 50%, indicating that these two points have a large influence on the response surface. A DFITS value is considered influential if it is larger than 1 for a small to medium data set, or larger than 2√(p⁄n) for a large data set. For this data set, 2√(p⁄n) = 2√(6⁄13) ≈ 1.36. This data set is considered small, so any DFITS value over 1 is taken to have some influence on the fit. According to the DFITS values, points #1, 3, 5, 7, and 8 could be considered influential.
Delete Outlier, Leverage, and Influential Points?
The decision whether to delete or keep outlier and influential data points varies from situation to situation. Usually, influential points are recommended to be deleted from the data. In this human subject study, however, it is considered normal that individuals could experience a very wide range of comfort with respect to temperature and humidity. Moreover, the response surface analysis would not be possible if the single observations at these points were deleted. Therefore, rather than deleting these influential points, it is recommended to collect more data to see whether these data points are truly unusual. If any points are deleted from the study, the analysis must be rerun.
Residual Analysis for Outlier, Leverage, and Influential Data Point
p-value = FDIST(COOK, df1, df2); df1 = p = 6 (five parameters + the constant), df2 = n − p = 13 − 6 = 7
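The spreadsheet formula above can be reproduced in Python: Excel's FDIST(x, df1, df2) is the upper-tail F probability, which is scipy's stats.f.sf. The DFITS cutoff follows the same way. The Cook's distance value below is hypothetical, for illustration only.

```python
from math import sqrt
from scipy import stats

n, p = 13, 6  # 13 runs; 5 model terms + the constant

def fdist(x, df1, df2):
    """Excel-style FDIST: upper-tail probability P(F > x)."""
    return stats.f.sf(x, df1, df2)

# Hypothetical Cook's distance of 1.0; the influence rule compares the
# resulting probability against 50%, i.e., Cook's D against the F median.
print(round(fdist(1.0, p, n - p), 3))

# DFITS cutoff for a large data set: 2*sqrt(p/n)
print(round(2 * sqrt(p / n), 3))  # → 1.359
```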