Friday, 22 April 2016

Data Analysis Tools: WEEK 3

Pearson Correlation


Assignment: Generating a Correlation Coefficient
Source: GapMinder
Variables used: incomeperperson, femaleemployrate, suicideper100TH (each of these variables is quantitative)

*incomeperperson – Gross Domestic Product (GDP) per capita
*femaleemployrate – Female Employment Rate: percentage of the female population aged 15 and over that was employed during the given year
*suicideper100TH – Suicide per 100th: mortality due to self-inflicted injury per 100,000 standard population, age-adjusted


Introduction:

In this assignment, I decided to analyze the correlation between each pair of the following quantitative variables: GDP (per capita), Female Employment Rate and Suicide per 100th.

In my first attempt, I included all the countries in the GapMinder set, and there appeared to be no significant relationship between any pair of the chosen variables.

After looking at the bivariate graphs, I noticed that the results are much more widely spread in the countries with the lowest GDP than in countries with higher GDP. Therefore, I decided to repeat the analysis and test the correlations only for countries with GDP higher than $5,000 per capita.
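The filtering and correlation step described above can be sketched in Python as follows. This is only an illustration, not necessarily the code used for this assignment: the helper names `pearson_r` and `correlations_above_threshold` are hypothetical, the rows are made-up placeholders rather than GapMinder values, and dropping any country with a missing value in one of the three variables is an assumption about how missing data is handled. The sketch computes only the correlation coefficient, not the p value.

```python
import itertools
import math

VARIABLES = ["incomeperperson", "femaleemployrate", "suicideper100TH"]
INCOME_THRESHOLD = 5000.0  # GDP per capita cut-off used in the post


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def correlations_above_threshold(rows, threshold=INCOME_THRESHOLD):
    """Compute r for every pair of VARIABLES among rows with income above threshold."""
    # Assumption: rows with any missing value are dropped before filtering.
    complete = [row for row in rows if all(row[v] is not None for v in VARIABLES)]
    subset = [row for row in complete if row["incomeperperson"] > threshold]
    results = {}
    for a, b in itertools.combinations(VARIABLES, 2):
        results[(a, b)] = pearson_r([row[a] for row in subset],
                                    [row[b] for row in subset])
    return results


# Made-up placeholder rows (NOT GapMinder values), only to make the sketch runnable.
rows = [
    {"incomeperperson": 900.0, "femaleemployrate": 70.0, "suicideper100TH": 8.0},
    {"incomeperperson": 6500.0, "femaleemployrate": 45.0, "suicideper100TH": 11.0},
    {"incomeperperson": 12000.0, "femaleemployrate": 50.0, "suicideper100TH": 7.0},
    {"incomeperperson": 25000.0, "femaleemployrate": 55.0, "suicideper100TH": 12.0},
    {"incomeperperson": 40000.0, "femaleemployrate": None, "suicideper100TH": 10.0},
    {"incomeperperson": 31000.0, "femaleemployrate": 58.0, "suicideper100TH": 9.0},
]

for (a, b), r in correlations_above_threshold(rows).items():
    print(f"{a} vs. {b}: r = {r:.5f}")
```

With real data, the placeholder list would be replaced by the rows of the GapMinder file. A statistics package would normally be used for this step, since it also reports the p value.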

CODE:


OUTPUT:










Interpretation:

GDP vs. Female Employment Rate


In countries with GDP higher than $5,000 per capita, the relationship between GDP and Female Employment Rate is significant. The p value is 0.0001. The correlation coefficient r is 0.48080, which means that there is a positive association between the variables: the higher the GDP, the higher the Female Employment Rate.

R² equals 0.23116864, indicating that if we know GDP (explanatory variable) we can predict about 23% of the variability we will see in the Female Employment Rate (response variable).



GDP vs. Suicide per 100th


In countries with GDP higher than $5,000 per capita (as in the full set of countries), the relationship between GDP and Suicide per 100th is not significant: the p value is 0.8545 and the correlation coefficient r is 0.02440.


Female Employment Rate vs. Suicide per 100th


Interestingly, in countries with GDP higher than $5,000 per capita, there is a statistically significant relationship between Female Employment Rate and Suicide per 100th. The p value is 0.0405, only just below the 0.05 threshold, so the evidence for this relationship is modest. The strength of the association is given by the correlation coefficient rather than by the p value.
The correlation coefficient r is 0.26759, so the association between the variables is positive but weak, i.e. Suicide per 100th tends to increase as the Female Employment Rate increases.
R² equals 0.0716, which means that if we know the Female Employment Rate (explanatory variable) we can predict about 7% of the variability we will see in Suicide per 100th (response variable).
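
The shares of explained variability quoted in the interpretation are the squares of the reported correlation coefficients. As a quick check, a minimal sketch (the function name `r_squared` is only illustrative):

```python
def r_squared(r):
    """Square a correlation coefficient to get the share of explained variability."""
    return r ** 2


# Correlation coefficients reported in the interpretation above.
reported = {
    "GDP vs. Female Employment Rate": 0.48080,
    "Female Employment Rate vs. Suicide per 100th": 0.26759,
}

for pair, r in reported.items():
    print(f"{pair}: r = {r:.5f}, r^2 = {r_squared(r):.4f} ({r_squared(r):.0%})")
```

This gives 0.2312 (23%) for the first pair and 0.0716 (7%) for the second, matching the figures stated in the text.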




