Multiple Regression
Source: Data from OECD (“The Organisation for Economic Co-operation and Development”)
Variables Used:
*Employment Rate - The number of employed persons aged 15 to 64 as a share of the population of the same age (source: OECD)
*Life Satisfaction - This indicator considers people's evaluation of their life as a whole (source: OECD)
*GDP - Gross Domestic Product per capita (source: OECD)
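I centered both explanatory variables (subtracted each variable's mean from its values) and checked the coding by using the means procedure: a correctly centered variable has a mean of approximately zero.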
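The centering step and its check can be sketched in Python as follows. This is only an illustration, not the code used for this post: the `center` helper and the employment-rate values are hypothetical.

```python
from statistics import mean

def center(values):
    """Subtract the sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]

# Hypothetical employment rates (%), NOT the OECD figures used in this post.
employment_rate = [55.0, 62.5, 68.0, 71.5, 74.0]
employment_rate_c = center(employment_rate)

# Coding check: the mean of a centered variable should be numerically zero.
print(abs(mean(employment_rate_c)) < 1e-9)
```

Centering does not change the slope of a predictor; it only shifts the intercept so that it refers to a country with average values of the explanatory variables.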
Introduction:
My previous data analysis revealed a strong positive
correlation between Life Satisfaction (response variable) and Employment Rate
(explanatory variable).
This time, I added one more explanatory variable (GDP per capita) in order to run a multiple linear regression.
Hypothesis: Both explanatory variables (Employment Rate and GDP per capita) are significantly associated with the response variable (Life Satisfaction).
Code:
After adding the second explanatory variable, Employment Rate (initial explanatory variable) remained significantly and positively associated with Life Satisfaction (response variable) (b=0.056, p=0.0014). However, there appears to be no significant relationship at the 0.05 level between the GDP per capita of a country (second explanatory variable) and the Life Satisfaction of its citizens (b=0.000018, p=0.0531), although the p-value falls only just above that threshold. Because the Employment Rate coefficient stayed significant with GDP per capita in the model, these results give no evidence that GDP per capita confounds the association between Employment Rate and Life Satisfaction.
The results did not fully support my hypothesis. The assumption that both explanatory variables are significantly associated with the response variable proved to be wrong: only one of them (Employment Rate) has a statistically significant relationship with the response variable (Life Satisfaction).
Using a second explanatory variable slightly increases the R-squared value of the model. The R-squared value of 0.532020 indicates that 53.2% of the variance in the response variable can be attributed to the two explanatory variables together.
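To make the R-squared figure concrete, here is a minimal sketch of an ordinary least squares fit with two centered predictors, written in plain Python. It is not the code used for this post: the function name `fit_two_predictors` and all data values are hypothetical, so its output will not reproduce the coefficients reported above.

```python
from statistics import mean

def fit_two_predictors(x1, x2, y):
    """OLS fit of y on two predictors (centered internally).

    Returns (intercept, b1, b2, r_squared).
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]

    # Sums of squares and cross-products of the centered variables.
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * d for a, d in zip(c1, cy))
    s2y = sum(b * d for b, d in zip(c2, cy))

    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det

    # R-squared = 1 - SSE / SST.
    sse = sum((d - b1 * a - b2 * b) ** 2 for a, b, d in zip(c1, c2, cy))
    sst = sum(d * d for d in cy)
    # With centered predictors the intercept equals the mean of y.
    return my, b1, b2, 1 - sse / sst

# Hypothetical values for six countries, NOT the OECD data used in this post.
employment = [55.0, 62.5, 68.0, 71.5, 74.0, 66.0]
gdp = [18000.0, 25000.0, 41000.0, 38000.0, 52000.0, 30000.0]
life_sat = [5.2, 5.9, 6.8, 7.0, 7.4, 6.1]

b0, b1, b2, r2 = fit_two_predictors(employment, gdp, life_sat)
print(f"intercept={b0:.3f} b_employment={b1:.4f} b_gdp={b2:.6f} R2={r2:.3f}")
```

R-squared cannot decrease when a predictor is added to an OLS model, which is why a small increase on its own says little about whether the new variable matters.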
Q-Q Plot
The Q-Q Plot shows that the residuals generally follow a straight line, but deviate somewhat at the lower and middle quantiles, i.e. the residuals do not follow a perfectly normal distribution.
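The points behind such a plot can be computed by pairing each sorted residual with the matching quantile of a standard normal distribution. The sketch below assumes the plotting positions (i - 0.5) / n, which is one common convention among several; the `qq_points` helper and the residual values are hypothetical.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pair each sorted residual with its theoretical standard-normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Hypothetical residuals, NOT taken from the model in this post.
residuals = [0.4, -1.1, 0.1, 0.9, -0.3]
for q, r in qq_points(residuals):
    print(f"theoretical={q:+.3f} observed={r:+.3f}")
```

If the residuals were normally distributed, these pairs would fall close to a straight line.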
Standardized Residuals
This plot shows that almost the same number of countries have standardized residuals greater than 0 as lower than 0. Only one of them is greater than 2 and one other is lower than -2. For approximately normal residuals, about 95% are expected to fall between -2 and 2, so this model is acceptable.
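The counts described above can be reproduced with a few lines of Python. This is a sketch under assumptions: residuals are standardized by dividing by the residual standard error, which may differ from what a given statistics package plots, and the helper names and residual values are hypothetical.

```python
from math import sqrt

def standardize(residuals, n_predictors):
    """Divide each residual by the residual standard error."""
    df = len(residuals) - n_predictors - 1  # residual degrees of freedom
    s = sqrt(sum(r * r for r in residuals) / df)
    return [r / s for r in residuals]

def summarize(std_resid, cutoff=2.0):
    """Count residuals on each side of zero and beyond +/- cutoff."""
    return {
        "above_zero": sum(r > 0 for r in std_resid),
        "below_zero": sum(r < 0 for r in std_resid),
        "above_cutoff": sum(r > cutoff for r in std_resid),
        "below_cutoff": sum(r < -cutoff for r in std_resid),
    }

# Hypothetical raw residuals, NOT taken from the model in this post.
raw = [0.4, -1.1, 0.1, 0.9, -0.3, 0.2, -0.2]
print(summarize(standardize(raw, n_predictors=2)))
```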
Outliers and Leverage
The Outlier and Leverage Diagnostics plot shows that the majority of the points have close to zero leverage and standardized residuals between -2 and 2. That is, the majority of the observations have little leverage on the model. However, there are 2 observations that are outliers (red) and 2 that have high leverage (green). There are no points that are both an outlier and have high leverage.
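For a model with an intercept and two predictors, leverage can be computed directly from the centered predictor values. The sketch below is illustrative only: the high-leverage cutoff of 2(p + 1)/n is a common rule of thumb and an assumption here, since the cutoff behind the plot is not stated, and the helper names and data are hypothetical.

```python
from statistics import mean

def leverages(x1, x2):
    """Hat-matrix diagonal for an OLS model with an intercept and two predictors."""
    n = len(x1)
    m1, m2 = mean(x1), mean(x2)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    det = s11 * s22 - s12 * s12
    # h_i = 1/n + c_i' (Xc'Xc)^{-1} c_i, with the 2x2 inverse written out.
    return [1 / n + (s22 * a * a - 2 * s12 * a * b + s11 * b * b) / det
            for a, b in zip(c1, c2)]

def flag(leverage, std_resid, n_predictors=2, resid_cutoff=2.0):
    """Return (is_outlier, is_high_leverage) for each observation."""
    # Assumed rule of thumb: high leverage if h > 2 * (p + 1) / n.
    lev_cutoff = 2 * (n_predictors + 1) / len(leverage)
    return [(abs(r) > resid_cutoff, h > lev_cutoff)
            for h, r in zip(leverage, std_resid)]

# Hypothetical predictor values, NOT the OECD data used in this post.
employment = [55.0, 62.5, 68.0, 71.5, 74.0, 66.0, 67.0, 69.0]
gdp = [18000.0, 25000.0, 41000.0, 38000.0, 52000.0, 30000.0, 33000.0, 90000.0]
print([round(h, 3) for h in leverages(employment, gdp)])
```

Leverage depends only on the explanatory variables, so an observation can have high leverage without being an outlier, and the reverse.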