Thursday, 28 April 2016

Regression Modeling in Practice: WEEK 1

Introduction to Regression

Assignment: Writing About Your Data

Sample

Step 1: Describe your sample. Provide enough detail so that your reader can clearly understand the population that the study sample came from. Use meaningful labels. Do not use abbreviations (“PPM100”) or variable names.

a) Describe the study population (who or what was studied).

b) Report the level of analysis studied (individual, group, or aggregate).

c) Report the number of observations in the data set.

d) Describe your data analytic sample (the sample you are using for your analyses).

ANSWERS to Step 1:

a) The sample comes from the Organisation for Economic Co-operation and Development (OECD). The main goal of the OECD is to promote policies that will improve the economic and social well-being of people around the world.

It collects and provides important data concerning “the quality of life” in 34 OECD member countries, i.e.: Australia, Austria, Belgium, Canada, Chile, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, Netherlands, New Zealand, Norway, Poland, Portugal, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, United Kingdom, United States.

The project includes a wide variety of data, concerning both material well-being (such as income, jobs and housing) and the broader quality of people’s lives (such as their health, education, work-life balance, environment, social connections, civic engagement, subjective well-being and safety).

b) The level of the analysis is aggregate.

c) Number of observations: 34 countries and 24 corresponding variables. No data is missing.

d) The data analytic sample for this study includes 34 countries and the following 6 variables: Gross Domestic Product (GDP) per capita, Employment Rate, Personal Earnings, Household Income, Percent of People with at least High School Education and Life Satisfaction Index. Gross Domestic Product is an indicator of a country's wealth, while the other variables are indicators of people's well-being.

Procedure


Step 2: Describe the procedures that were used to collect the data.

a) Report the study design that generated that data (for example: data reporting, surveys, observation, experiment).

b) Describe the original purpose of the data collection.

c) Describe how the data were collected.

d) Report when the data were collected.

e) Report where the data were collected.

ANSWERS to Step 2:

a) The data on the most interesting variable, the Life Satisfaction Index, were obtained through an internet survey on the OECD website. Participants were asked to give a subjective rating of their quality of life on a scale of 0 to 10 using the Cantril Ladder (also known as the "Self-Anchoring Striving Scale"). Additional questions covered participants' economic, educational and employment background, to make sure that the sample used for the statistical assessment of Life Satisfaction is representative. The Life Satisfaction Index is updated every year on the basis of new surveys.
The data on other variables were calculated by using existing data from national and international statistical databases and applying special mathematical formulas.
The data on GDP per capita is based on GDP data from the OECD Annual National Accounts. It is expenditure on final goods and services minus imports.
The data on Employment Rate come from the OECD Labour Force Statistics database. It is the number of employed people of working age (15 to 64) divided by the population of the same age.
Personal Earnings are calculated by combining data from the OECD Earnings distribution database and the OECD average annual earnings per full-time and full-year equivalent dependent employee database. It is the total wage bill divided by the average number of employees, which is then multiplied by the ratio of usual weekly hours per full-time employee to average usual weekly hours for all employees.
The Household Disposable Income variable is calculated by the OECD on the basis of OECD National Accounts at a Glance and Statistics New Zealand. It is obtained by adding to people's gross income the social transfers in kind that households receive from governments, and then subtracting the taxes on income and wealth, the social security contributions paid by households and the depreciation of capital goods consumed by households.
The Education variable comes from the OECD Education at a Glance database. It is the number of adults aged 25 to 64 holding at least an upper secondary degree divided by the population of the same age.
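
The definitions above can be written out as a few small functions. This is only a sketch of the formulas as worded in this entry, not the OECD's actual computation: the function names, the parameter names and the choice to express the ratios as percentages are my assumptions, and the numbers in the usage line are made up.

```python
# Sketch of the indicator formulas described above.
# All names are hypothetical; no value here is real OECD data.

def percent_of_population(count: float, population: float) -> float:
    """Share of a population, as a percentage (assumed unit).

    Covers both ratio indicators: employed people of working age over the
    working-age population, and adults with at least an upper secondary
    degree over the population of the same age.
    """
    return 100.0 * count / population


def personal_earnings(wage_bill: float, avg_employees: float,
                      full_time_weekly_hours: float,
                      all_weekly_hours: float) -> float:
    """Total wage bill per employee, scaled to a full-time equivalent."""
    return wage_bill / avg_employees * (full_time_weekly_hours / all_weekly_hours)


def household_disposable_income(gross_income: float, transfers_in_kind: float,
                                taxes: float, social_contributions: float,
                                depreciation: float) -> float:
    """Gross income plus transfers in kind, minus taxes, contributions, depreciation."""
    return (gross_income + transfers_in_kind
            - taxes - social_contributions - depreciation)


# Made-up example: 6.5 million employed out of 10 million people of working age.
print(percent_of_population(6_500_000, 10_000_000))
```

Using one ratio function for both the employment and the education indicator reflects that the two definitions have the same form and differ only in the age range and in what is counted.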

b) The purpose of the original data collection was to compare the quality of life around the world.

c) One variable, the "Life Satisfaction Index", was collected through an online survey on the OECD website. The other variables were collected and calculated using existing data from the databases of national and international statistical institutions.

d) The data were collected by trained OECD statisticians during 2012, 2013 and 2014.

e) The data were collected in the 34 member countries of the Organisation for Economic Co-operation and Development (OECD).
The names of these countries are listed at the beginning of this blog entry.

No further detail concerning the procedure is provided by OECD.


Measures

Step 3: Describe your variables.

a) Describe what your explanatory and response variables measured.

b) Describe the response scales for your explanatory and response variables.

c) Describe how you managed your explanatory and response variables.

ANSWERS to Step 3:

a) The variables from the data analytic sample measure the following:

1 - GDP - Gross Domestic Product per capita. It is used as an indicator of a country's wealth.

2 - Employment Rate - the number of employed people of working age (15 to 64) divided by the population of the same age. Employed people are those who report that they worked for at least one hour in the previous week.

3 - Personal Earnings - refer to the average annual wages per full-time equivalent dependent employee.

4 - Household Disposable Income - the maximum amount that a household can afford to consume without having to reduce its assets or increase its liabilities.

5 - Percent of people with at least High School Education considers the number of adults aged 25 to 64 holding at least an upper secondary degree over the population of the same age, as defined by the OECD-ISCED classification.

6 - Life Satisfaction - considers people's evaluation of their life as a whole. It is a weighted sum of different response categories based on people's ratings of their current life relative to the best and worst possible lives for them on a scale from 0 to 10, using the Cantril Ladder (also known as the "Self-Anchoring Striving Scale").
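
To illustrate what a weighted sum of response categories can look like, here is a minimal sketch. It assumes that each ladder step (0 to 10) is weighted by the share of respondents who chose it, which gives the average ladder score. The OECD's exact weighting is not described in the material I used, so this is an assumption, and the response counts below are made up.

```python
# Hypothetical number of respondents who chose each ladder step (0..10).
# These counts are invented for illustration, not survey results.
responses = {0: 1, 1: 1, 2: 2, 3: 4, 4: 7, 5: 15, 6: 20, 7: 25, 8: 15, 9: 6, 10: 4}


def ladder_score(counts: dict) -> float:
    """Weighted sum of ladder steps, each weighted by its share of respondents."""
    total = sum(counts.values())
    return sum(step * n / total for step, n in counts.items())


score = ladder_score(responses)
print(round(score, 2))
```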


b) The Self-Anchoring Striving Scale (used for measuring the "Life Satisfaction Index") was developed by the social researcher Dr. Hadley Cantril. It is an example of a well-being assessment. Respondents are asked the following:
  • Please imagine a ladder with steps numbered from zero at the bottom to 10 at the top.
  • The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you.
  • On which step of the ladder would you say you personally feel you stand at this time? (ladder-present)
  • On which step do you think you will stand about five years from now? (ladder-future)

No other response scales were used.


c) In my research, I have been analyzing the relationship between different pairs or groups of three variables in order to check the strength of their correlation.
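
For readers who want to see what "strength of correlation" means numerically, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name and the two short lists are hypothetical; they are not values from the OECD data set.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up values standing in for an explanatory and a response variable.
explanatory = [55.0, 60.0, 65.0, 70.0, 75.0]
response = [5.1, 5.9, 6.2, 6.8, 7.4]
print(pearson_r(explanatory, response))
```

A value close to 1 or -1 indicates a strong linear relationship, and a value close to 0 a weak one.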

Because the data cover a broad range of values, I categorized the explanatory and response variables and created new variables with 4 to 6 levels.
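
One possible way to do such a categorization is to cut a variable at its quantiles, so that each level holds roughly the same number of countries. The sketch below assumes quantile cut points, which is only one option among several, and uses made-up values and a hypothetical function name.

```python
from statistics import quantiles


def to_levels(values, n_levels=4):
    """Assign each value a level from 1 to n_levels using quantile cut points."""
    cuts = quantiles(values, n=n_levels)  # returns n_levels - 1 cut points
    # A value's level is 1 plus the number of cut points it exceeds.
    return [1 + sum(v > c for c in cuts) for v in values]


# Made-up values standing in for a continuous variable.
values = [10, 20, 30, 40, 50, 60, 70, 80]
print(to_levels(values))
```

The original values stay untouched, so the categorized variable can be stored next to the continuous one.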

Initially, I tested how the country's wealth (GDP), the explanatory variable, affects the indicators of people's well-being, the response variables: Personal Earnings, Household Income, Employment, Education and Life Satisfaction Index.


The most important variable in my research is the "Life Satisfaction Index" as it refers to how happy people are with their life. Therefore, I have also done a number of tests analyzing the relationship between Employment Rate, Education, Personal Earnings, Household Income (as explanatory variables) and Life Satisfaction (as response variable).


The main purpose of my study is to find out what matters most for people's happiness: money, education, work or other aspects.

So far, the well-being analysis has given me a lot of interesting results. The findings can be found in the following entries:

http://mygapminder.blogspot.pt/2016/03/assignment-week-4.html

http://mygapminder.blogspot.pt/2016/04/data-analysis-tools-week-1.html

http://mygapminder.blogspot.pt/2016/04/data-analysis-tools-week-4.html

Data Analysis Tools: WEEK 4

Exploring Statistical Interactions


Assignment: Testing a Potential Moderator
Source: New data from the OECD (The Organisation for Economic Co-operation and Development)

Variables Used: 

*Employment Rate - the number of employed persons aged 15 to 64 over the population of the same age. (source: OECD)

*Life Satisfaction - This indicator considers people's evaluation of their life as a whole (source: OECD)

*Education - percent of people with at least High School Education. (source: OECD)


Introduction:

In the first week of Data Analysis Tools, I examined the relationship between Employment Rate (explanatory variable) and Life Satisfaction (response variable). The analysis showed a strong positive relationship between the two variables: the higher the Employment Rate, the higher the Life Satisfaction.

This week I decided to test whether the Education variable (percent of people with at least high school education) is a moderator of this relationship. I categorized Education into two levels: countries with a low percent (1) and a high percent (2) of educated people.
Similarly, I categorized Employment Rate (explanatory variable) into two levels: low employment rate (1) and high employment rate (2). After categorization, I ran a new ANOVA procedure for the Employment Rate and Life Satisfaction relationship in the two "Education" moderator sub-groups:
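
The idea of running the ANOVA separately in each moderator sub-group can be sketched in plain Python as below. This is an illustration of the technique only: the function names and the data rows are hypothetical, the numbers are made up, and the sketch stops at the F statistic. The p-value would then be read from the F distribution with (number of groups - 1) and (number of observations - number of groups) degrees of freedom.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def f_by_moderator(rows):
    """F statistic of response by explanatory level, per moderator level."""
    result = {}
    for mod in sorted({r[0] for r in rows}):
        sub = [r for r in rows if r[0] == mod]
        levels = sorted({r[1] for r in sub})
        groups = [[y for _, x, y in sub if x == level] for level in levels]
        result[mod] = f_statistic(groups)
    return result


# Made-up rows: (education level, employment level, life satisfaction).
rows = [
    (1, 1, 5.0), (1, 1, 5.4), (1, 2, 6.1), (1, 2, 6.5),
    (2, 1, 6.0), (2, 1, 6.4), (2, 2, 7.1), (2, 2, 7.3),
]
print(f_by_moderator(rows))
```

If the relationship has the same direction and stays significant in every sub-group, the third variable is not acting as a moderator in the sense used in this course.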


CODE:



Output:

[Table 1]



[Table 2] 



Interpretation:

In this analysis, I wanted to see whether the moderator "Education" affects the relationship between Employment Rate (explanatory variable) and Life Satisfaction (response variable).

It turned out that the moderator does not change the relationship between the two variables. Life Satisfaction still increased with Employment Rate in both moderator sub-groups.

Moreover, in each sub-group the relationship between Employment Rate and Life Satisfaction remained statistically significant.

More details on the results of the ANOVA procedure follow:

[Table 1]
ANOVA on Employment Rate and Life Satisfaction in the moderator sub-group "Countries with LOW percent of people with at least high school education"

F-statistic: 11.65
Prob (F-statistic): 0.0039

Since p is less than 0.05, we can reject the null hypothesis and say that there is a significant relationship between Employment Rate and Life Satisfaction in countries with a LOW percent of people with at least high school education.

[Table 2]
ANOVA on Employment Rate and Life Satisfaction in the moderator sub-group "Countries with HIGH percent of people with at least high school education"

F-statistic: 7.11
Prob (F-statistic): 0.0176

Again, as p is less than 0.05, there is a significant relationship between Employment Rate and Life Satisfaction in countries with a HIGH percent of people with at least high school education.

***

Taking everything into consideration, we can conclude that the Education variable (percent of people with at least High School education) does not moderate the relationship between Employment Rate and Life Satisfaction. The relationship remains positive and statistically significant in both sub-groups of the moderator.