Wednesday, 30 March 2016

Data Management and Visualization: WEEK 4

Project:
DATA VISUALIZATION

Source: New data sheet from the OECD (Organisation for Economic Co-operation and Development)

The objective of this program is to visualize the data by creating charts of both individual variables and pairs of variables.

The source is a newly imported data sheet covering 34 developed countries (24 of them European), containing GDP per capita and several variables that describe quality of life. All the data comes from OECD.org.

The variables observed in this assignment are as follows:

*Countries - Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States.                                                                                                             

*GDP – Gross Domestic Product per capita

*LifeSat = Satisfaction Index – “The indicator considers people's evaluation of their life as a whole. It is a weighted-sum of different response categories based on people's rates of their current life relative to the best and worst possible lives for them on a scale from 0 to 10, using the Cantril Ladder (known also as the "Self-Anchoring Striving Scale")” (source: OECD)

*PersEarn = Personal Earnings – “This indicator refers to the average annual wages per full-time equivalent dependent employee, which are obtained by dividing the national-accounts-based total wage bill by the average number of employees in the total economy, which is then multiplied by the ratio of average usual weekly hours per full-time employee to average usually weekly hours for all employees. It considers the employees’ gross remuneration, that is, the total before any deductions are made by the employer in respect of taxes, contributions of employees to social security and pension schemes, life insurance premiums, union dues and other obligations of employee” (source: OECD)

*House Income = Household Disposable Income - “It's the maximum amount that a household can afford to consume without having to reduce its assets or to increase its liabilities. It's obtained adding to people’s gross income (earnings, self-employment and capital income, as well as current monetary transfers received from other sectors) the social transfers in-kind that households receive from governments (such as education and health care services), and then subtracting the taxes on income and wealth, the social security contributions paid by households as well as the depreciation of capital goods consumed by households. Available data refer to the sum of households and non-profit institution serving households” (source: OECD)

*Edu= Education – “Educational attainment considers the number of adults aged 25 to 64 holding at least an upper secondary degree over the population of the same age, as defined by the OECD-ISCED classification” (source: OECD)

*Work = Employment Rate – the percentage of the working-age population (aged 15-64) that is employed; "It is the number of employed persons aged 15 to 64 over the population of the same age. Employed people are those aged 15 or more who report that they have worked in gainful employment for at least one hour in the previous week, as defined by the International Labour Organization – ILO." (source: OECD)

In order to create charts, most of the above variables are categorized and new variables are produced:

“GDP2” with 4 categories for “GDP”
“SAT” with 4 categories for “LifeSat”
“Earn” with 5 categories for “PersEarn”
“House” with 3 categories for “House Income”
“ED” with 3 categories for “Edu”
“Work2” with 3 categories for “Work”

Category numbers increase with the underlying value, so category “1” always holds the lowest values.
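The categorization step can be sketched as below. This is a minimal illustration, not the original program: the GDP cut points (30,000, 50,000 and 70,000) are an assumption inferred from the category descriptions later in this post, and `to_category` is a hypothetical helper name. With pandas, `pandas.cut` would be a common way to do the same thing.

```python
from bisect import bisect_right

# Assumed cut points for GDP per capita (USD). They are inferred from the
# category descriptions in the post, not taken from the original code.
GDP_EDGES = [30_000, 50_000, 70_000]


def to_category(value, edges):
    """Return a 1-based category; category 1 holds the lowest values."""
    return bisect_right(edges, value) + 1


# Luxembourg's GDP per capita and the sample mean are both quoted in the post.
print(to_category(83_394.4, GDP_EDGES))     # falls in the top category
print(to_category(36_023.2353, GDP_EDGES))  # falls in the 30,000-50,000 band
```

The same helper works for the other variables once their own cut points are supplied.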

CODE:

GRAPHS:

GDP

This graph is unimodal, with its peak (mode) in the category of $30,000 to $50,000 GDP per capita.

*Of the 34 developed countries in the data, 20 (58.82%) fall into this category.

The average GDP per capita is $36,023.24 and the standard deviation (spread) is $13,220.13, which means that the countries differ considerably.
The graph appears to be skewed to the right: the lower categories have higher frequencies than the higher ones, and the tail extends toward high GDP values.


Personal Earnings

This graph is bimodal, with peaks at personal earnings of $20,000 to $30,000 and $40,000 to $50,000.
*In 23.53% of the countries in the data, personal earnings are between $20,000 and $30,000, and in 32.35% they are between $40,000 and $50,000.

The standard deviation (spread) for this variable is 12724.



Household Disposable income
This graph is unimodal, with its peak (mode) in the category of $20,000 to $30,000.
It is slightly skewed to the right, as the lower values have higher frequencies.

The average household disposable income is $22,949.47 and the standard deviation (spread) is $6,693. This is much lower than the spread of GDP or Personal Earnings, which means that household disposable income varies much less between countries in absolute terms.
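Raw standard deviations are hard to compare when the variables have different means, so one option is to look at the coefficient of variation (standard deviation divided by mean). The sketch below uses only the means and standard deviations quoted in this post; the mean of Personal Earnings is not reported, so that variable is left out. The function name is an illustrative choice.

```python
def coefficient_of_variation(mean, std):
    """Relative spread: standard deviation as a fraction of the mean."""
    return std / mean


# (mean, standard deviation) pairs quoted in the post.
stats = {
    "GDP": (36023.2353, 13220.128),
    "Household disposable income": (22949.47, 6693),
}

for name, (mean, std) in stats.items():
    print(f"{name}: CV = {coefficient_of_variation(mean, std):.3f}")
```

On these figures the relative spread is roughly 37% for GDP and 29% for household disposable income, so household income is still the less dispersed of the two, although the gap is smaller than the raw standard deviations suggest.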


Education

This graph is unimodal, with its peak (mode) in the category of 70-92%.
It is skewed to the left, which means that the higher categories have higher frequencies.

*In 76.47% of the countries, more than 70% of adults have at least an upper secondary (high-school) education. The remaining 23.53% of countries fall below this category.

The average percentage is 74.5% and the standard deviation is 16.26 percentage points.

Life Satisfaction

This graph has its highest peaks in categories “3” and “4”, i.e. the highest Life Satisfaction categories (more than 6 out of 10 index points).

The graph is skewed to the left, which means that the higher categories have higher frequencies.

The average life satisfaction is 6.59 out of 10 index points and the standard deviation is 0.8.


Work
The graph is almost flat, which means that it has no particular peak: there are almost equal numbers of low, middle and high values.

*Very similar numbers of countries have 48-60%, 60-70% and 70-80% of people aged 15 to 64 in employment.

The average employment rate is 66% and the standard deviation (spread) is 7.35 percentage points.

BIVARIATE GRAPHS:





Life Satisfaction vs. GDP

The graph shows the relationship between a country's Life Satisfaction index and its GDP.

There is a clear trend: the higher a country's GDP, the higher its people's life satisfaction.

Interestingly, the country with the highest GDP does not follow the trend. Its life satisfaction score is still reasonably high (6.9/10; category “3” of 4) but lower than that of countries in a lower GDP category.

That country is Luxembourg, with a GDP per capita of $83,394.4; it is the only country in GDP category “4” (GDP per capita above $70,000).

Another interesting fact is that all countries in GDP category “3” fall into the highest Life Satisfaction category (“4”). These countries are Norway and Switzerland.

The lowest Life Satisfaction category (“1”) appears only in countries in the lowest GDP category (“1”). The countries in the lowest category for both GDP and Life Satisfaction are Greece and Hungary.
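The observations above amount to reading a cross-tabulation of the GDP2 and SAT categories. A minimal sketch of such a table is shown below. It covers only the five countries whose categories are named in this section, so most cells stay empty; the full data sheet would fill in the rest.

```python
from collections import Counter

# (GDP2 category, SAT category) for the countries named in this section.
categories = {
    "Luxembourg": (4, 3),
    "Norway": (3, 4),
    "Switzerland": (3, 4),
    "Greece": (1, 1),
    "Hungary": (1, 1),
}

# Count how many countries share each (GDP2, SAT) combination.
counts = Counter(categories.values())

# Rows: GDP2 category, columns: SAT category.
print("GDP2\\SAT " + " ".join(f"{sat:>3}" for sat in range(1, 5)))
for gdp in range(1, 5):
    row = " ".join(f"{counts[(gdp, sat)]:>3}" for sat in range(1, 5))
    print(f"{gdp:<9}{row}")
```

A rising diagonal in the full table would correspond to the trend described above, and a country such as Luxembourg would show up as a cell below that diagonal.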




GDP vs. Personal Earnings

The second plot shows that GDP and Personal Earnings are very strongly related: the higher the GDP, the higher the Personal Earnings.
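One way to put a number on how strong such a relationship is would be the Pearson correlation coefficient. The sketch below is illustrative only: `pearson_r` is a hand-written helper, and the two lists are synthetic placeholder values, not the OECD figures, so the printed value says nothing about the real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Synthetic placeholder values, NOT taken from the OECD data sheet.
gdp = [20_000, 30_000, 40_000, 50_000]
earnings = [15_000, 25_000, 30_000, 45_000]

print(round(pearson_r(gdp, earnings), 3))
```

A value close to 1 indicates a strong positive linear relationship, a value close to 0 indicates none, and a value close to -1 indicates a strong negative one.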

However, the most interesting question is how much of this money households actually keep. To check this, I compared the Personal Earnings and Household Income variables:






Personal Earnings vs. Household Income
In the third plot, I checked how Household Income depends on Personal Earnings. Once again, the relationship is very strong: the higher the Personal Earnings, the higher the average Household Disposable Income.
Interestingly, in the highest Personal Earnings group, one country has a much lower Household Income than the others. This country is Iceland, with Personal Earnings of $55,716 (category “5” of 5) and Household Income of $21,201 (category “2” of 3).
This suggests that earnings in Iceland are reduced considerably by items such as taxes on income and wealth, social security contributions paid by households, and the depreciation of capital goods consumed by households.
In the other countries, as the plot shows, the two values are much closer to each other.


GDP vs. Education

The dependence of Education on GDP is less obvious than for the other variables.

A high percentage of people with at least a high-school diploma is observed in countries with both lower and higher GDP.

However, the lowest Education category (“1”) appears only in countries in the lowest GDP category (“1”). The countries in the lowest category for both variables are Turkey, Mexico and Portugal.

Another pattern is that countries in the highest GDP categories (“3” and “4”) all fall into the highest category of high-school graduates (“3”).

Therefore, a relationship between Education and GDP exists, even if it is not very strong and not apparent in all countries. In countries with higher GDP, the average percentage of high-school graduates is higher than in countries with lower GDP.

GDP vs. WORK

The plot of Work against GDP is very similar to the plot of Life Satisfaction against GDP.

The slope is rising: the higher the GDP, the higher the percentage of people in work.

The exception to the pattern is also the same as in the Life Satisfaction plot: Luxembourg, where 65% of people aged 15 to 64 are employed. This rate is lower than in 47% of the countries in the data.

After looking at these results, I decided to check the relationship between Work and Life Satisfaction:



WORK vs. LIFE SATISFACTION

The above plot shows the relationship between the percentage of people in work and Life Satisfaction.

The higher the Work percentage, the higher the Life Satisfaction. This is especially visible in countries where the Work percentage is over 70%:

There are 11 such countries. 9 of them fall into the highest Life Satisfaction category (“4”) and the remaining 2 into category “3”.


CONCLUSION:

After analyzing and visualizing the given variables, it appears that GDP has a strong relationship with Earnings, Income, Work and Life Satisfaction: the higher the GDP, the higher these variables.

The richest countries (categories “3” and “4”) also fall into high categories for the other variables.

Countries with the lowest GDP (category “1”) have more low scores in the other variables.

The plot of Education against GDP was a little different. The spread of the results was much wider. Even in some of the countries with the lowest GDP, the percentage of people with at least a high-school education was very high. However, the average percentage was, again, higher in countries with higher GDP.

In general, this analysis supports the hypothesis that quality of life is better in countries with higher GDP. In higher-GDP countries, people seem to have a better material situation, good education, more work opportunities and higher life satisfaction.

The countries with the most high categories across all variables are:
Switzerland, Norway, Luxembourg, Australia and United States.

Additionally, there appears to be a strong relationship between Work and Life Satisfaction. The countries with both the highest percentage of people in work (aged 15-64) and the highest Life Satisfaction are:
Norway, Sweden, Iceland, New Zealand, Netherlands, Switzerland, Australia, Denmark and Canada.

Tuesday, 22 March 2016

Data Management and Visualization: WEEK 3

DATA MANAGEMENT

Source: GapMinder

Variables observed: incomeperperson, oilperperson, relectricperperson
*incomeperperson – Gross Domestic Product (GDP) per capita
*oilperperson – Oil Consumption per capita
*relectricperperson – Electricity consumption per capita

Introduction:
Numerous studies by leading economists show an interdependence between GDP, oil consumption and electricity consumption.

In this assignment, I decided to manage the data and observe the results in the following way:
  1. Collapse incomeperperson into 3 ranges, i.e. Low Income, Middle Income and High Income Countries (according to the World Bank thresholds), and check what percentage of countries falls into each category.
  2. Observe oil and electricity consumption across the world by collapsing the oilperperson and relectricperperson values into LOW, AVERAGE and HIGH consumption.
  3. Run frequency distributions for the 3 new variables.
  4. Observe how oil and electricity consumption differ across the GDP ranges of countries.
  5. Observe how much oil and electricity consumption data is missing in each GDP range.
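The five steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the program used for the results below: the income cut-offs are placeholders rather than the World Bank values, the oil cut-offs are invented for the example, the helper names are hypothetical, and the rows are synthetic.

```python
from collections import Counter

# Placeholder cut-offs in USD per person; NOT the actual World Bank values.
LOW_MAX = 1_000
HIGH_MIN = 12_000


def gdp_category(income):
    """1 = high income, 2 = middle income, 3 = low income."""
    if income >= HIGH_MIN:
        return 1
    if income > LOW_MAX:
        return 2
    return 3


def consumption_category(value, low_max, high_min):
    """1 = high, 2 = average, 3 = low; missing values become 'NO DATA'."""
    if value is None:
        return "NO DATA"
    if value >= high_min:
        return 1
    if value > low_max:
        return 2
    return 3


def frequency_percent(categories):
    """Percentage of rows per category, rounded to two decimals."""
    counts = Counter(categories)
    total = len(categories)
    return {cat: round(100 * n / total, 2) for cat, n in counts.items()}


# Synthetic rows: (incomeperperson, oilperperson); None marks missing data.
rows = [(500, None), (800, 0.1), (5_000, 0.6), (9_000, None), (40_000, 2.5)]
rows = [r for r in rows if r[0] is not None]  # drop rows without GDP data

gdp = [gdp_category(income) for income, _ in rows]
# The oil cut-offs (0.5 and 1.5) are invented for this example.
oil = [consumption_category(v, low_max=0.5, high_min=1.5) for _, v in rows]

print(frequency_percent(gdp))  # steps 1 and 3
print(frequency_percent(oil))  # steps 2 and 3, with "NO DATA" kept

# Steps 4 and 5: oil categories, including missing data, per GDP range.
by_gdp = {}
for g, o in zip(gdp, oil):
    by_gdp.setdefault(g, []).append(o)
for g in sorted(by_gdp):
    print(g, frequency_percent(by_gdp[g]))
```

Keeping "NO DATA" as its own category means the missing share shows up directly in each frequency table instead of being dropped silently.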
CODE:


Results:


Categories for each variable:
Category | GDP | OIL (oil consumption) | ELECTRO (electricity consumption)
1 | High Income Countries | High | High
2 | Middle Income Countries | Average | Average
3 | Low Income Countries | Low | Low


Summary:

I collapsed the values of incomeperperson, oilperperson and relectricperperson to create three new variables: GDP, OIL and ELECTRO. The table above shows what each category stands for.
I deleted all rows with missing incomeperperson, which left 190 countries.
I kept the missing data for oilperperson and relectricperperson and included it in the results tables as "NO DATA", in order to see what percentage of the information is available and, therefore, how meaningful my research is.
  •  All countries with GDP data
The GDP results show that 21.58% of the countries in the data (category “1”) are High Income Countries, 50.00% (category “2”) are Middle Income Countries and 28.42% (category “3”) are Low Income Countries.

For OIL, high consumption is found in 7.37% of countries (category “1”), average consumption in 16.32% (category “2”) and low consumption in 8.42% (category “3”). 67.89% of the oil data is missing, which means that all results and assumptions rest on very limited data.

For ELECTRO, high consumption is found in 16.32% of countries (category “1”), average consumption in 34.21% (category “2”) and low consumption in 17.89% (category “3”). 31.58% of the electricity consumption data is missing, so there is much more data than for oil consumption.

  •  Comparison of oil and electricity consumption according to different ranges of GDP:
Variable | Low Income Countries | Middle Income Countries | High Income Countries
No. of countries observed | 54 countries | 95 countries | 41 countries
Oil Consumption | Data is available for only 4 countries (7.41%), all of them in category “3”, meaning low oil consumption. | Data is available for 34.74% of countries: 1.05% with high, 21.05% with average and 12.63% with low oil consumption. | Data is available for 58.54% of countries: most (31.71%) had high oil consumption and 26.83% had average oil consumption. There were no cases of low consumption.
Electricity Consumption | There was much more data than for oil: 31 out of 54 countries (57.41%). 12.96% of all results are countries with average consumption and 44.44% with low consumption. The rest is missing. | Again, there was much more data than for oil (70.53%): 4.21% with high, 55.79% with average and 10.53% with low electricity consumption. | Data is available for 78.05% of countries: a large majority (65.85%) with high consumption and 12.20% with average consumption. As with oil, there were no cases of low electricity consumption.
Missing Data | 92.59% for Oil and 42.59% for Electricity Consumption | 65.26% for Oil and 29.47% for Electricity Consumption | 41.46% for Oil and 21.95% for Electricity Consumption

The table above shows that the GapMinder data sheet contains much more oil and electricity consumption data for developed countries than for developing countries. It would therefore be much better to collect more data before drawing meaningful conclusions, especially regarding oil consumption and low-income countries. My whole analysis is based on 32.11% of the data for oil and 68.42% for electricity. Data is especially scarce for low-income countries, with 7.41% available for oil and 57.41% for electricity.
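The overall coverage figures can be cross-checked against the per-group figures in the table. The sketch below uses only numbers quoted in this post (group sizes and the percentage of countries with available data); the dictionary layout and the helper name are illustrative choices.

```python
# Group sizes and data-availability percentages quoted in the table.
groups = {
    "low income": {"countries": 54, "oil_pct": 7.41, "electricity_pct": 57.41},
    "middle income": {"countries": 95, "oil_pct": 34.74, "electricity_pct": 70.53},
    "high income": {"countries": 41, "oil_pct": 58.54, "electricity_pct": 78.05},
}

total = sum(g["countries"] for g in groups.values())


def countries_with_data(group, key):
    """Convert a group's availability percentage back to a country count."""
    return round(group["countries"] * group[key] / 100)


oil_total = sum(countries_with_data(g, "oil_pct") for g in groups.values())
elec_total = sum(countries_with_data(g, "electricity_pct") for g in groups.values())

oil_share = round(100 * oil_total / total, 2)
elec_share = round(100 * elec_total / total, 2)

print(total, oil_total, oil_share)
print(total, elec_total, elec_share)
```

This gives 61 of 190 countries with oil data and 130 of 190 with electricity data, which matches the 32.11% and 68.42% quoted above.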

However, even with such limited data, some patterns in oil and electricity consumption across the GDP ranges are still visible.

The available data suggests a strong relationship between a country's oil and electricity consumption and its GDP: the higher the GDP range, the higher the average oil and electricity consumption.

For oil, low-income countries showed only low consumption. Middle-income countries showed all ranges of consumption; however, high consumption was found in only one of the 33 countries with available data. In high-income countries, there were more cases of high consumption than of average consumption, and no cases of low consumption.

For electricity, most results in low-income countries were low consumption. In middle-income countries, the majority was average consumption, with a few cases of low and high consumption. In high-income countries, most had high consumption, with a few cases of average consumption and no cases of low consumption.