HW 5 IS DUE SUNDAY (OCT.15)!
Example from HW4:
H0: The mean work experience of Gary workers equals that of Indiana workers. The test asks how compatible the data are with this hypothesis.
Higher p-value: weaker evidence against the null hypothesis
Smaller p-value: stronger evidence against the null hypothesis
One-sample t-test = test if a population parameter (e.g. the mean) equals a specific value.
For example, let’s test if the mean of x (which is a random variable with mean=5 and sd=10) equals 5:
\[H_0: µ = 5\] \[H_a: µ ≠ 5 \text{ (two-sided)}\]
## [1] 5.45297
## 
##  One Sample t-test
## 
## data:  x
## t = 1.4098, df = 999, p-value = 0.1589
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  4.822449 6.083491
## sample estimates:
## mean of x 
##   5.45297
t statistic: 1.41
p-value: 0.159
We got a p-value of 0.159, which is higher than the conventional threshold (0.05), so our evidence against the null hypothesis is weak and we fail to reject it.
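The code behind this output is not shown here. A minimal sketch that gives the same kind of output, assuming x was drawn with rnorm() (the seed 123 is a hypothetical choice, so the exact numbers will differ from those above):

```r
# Assumption: x was simulated roughly like this; the original seed is unknown.
set.seed(123)                        # hypothetical seed, for reproducibility only
x <- rnorm(1000, mean = 5, sd = 10)  # n = 1000 is consistent with df = 999

mean(x)                              # sample mean of x
res <- t.test(x, mu = 5)             # H0: mu = 5; two-sided is the default
res                                  # prints the "One Sample t-test" summary

res$statistic                        # t statistic
res$p.value                          # p-value
```

Storing the result in res lets you pull out the t statistic and the p-value directly instead of copying them from the printout.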
Now let’s test if the mean of x (which is a random variable with mean=5 and sd=10) equals 0:
\[H_0: µ = 0\] \[H_a: µ ≠ 0 \text{ (two-sided)}\]
## [1] 5.45297
## 
##  One Sample t-test
## 
## data:  x
## t = 16.971, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  4.822449 6.083491
## sample estimates:
## mean of x 
##   5.45297
t statistic: 16.971
p-value: < 2.2e-16
Now we’ve got a p-value < 0.001, which means that our evidence against the null hypothesis is strong, so we reject the null hypothesis!
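Only the hypothesized value changes between the two tests, which is why the same sample gives such different t statistics. A small sketch that computes the one-sample t statistic by hand and checks it against t.test(); the simulated x and the seed are assumptions, not the original data:

```r
# Assumption: simulated data standing in for x; the seed is hypothetical.
set.seed(123)
x <- rnorm(1000, mean = 5, sd = 10)

mu0 <- 0                                   # value stated in H0
se  <- sd(x) / sqrt(length(x))             # standard error of the mean
t_manual <- (mean(x) - mu0) / se           # t = (sample mean - mu0) / SE
p_manual <- 2 * pt(-abs(t_manual), df = length(x) - 1)  # two-sided p-value

res <- t.test(x, mu = mu0)
all.equal(t_manual, unname(res$statistic)) # same t statistic
all.equal(p_manual, res$p.value)           # same p-value
```

The further the sample mean is from mu0 relative to the standard error, the larger the absolute t statistic and the smaller the p-value.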
Two-sample t-test = test if the population mean in group A equals that of group B
Example 1: income difference between males and females
Example 2: amount of precipitation between Indiana and California
Example 3: your turn?
According to the Bureau of Labor Statistics (2022), the earnings gap between men and women is 17%. For example, if men’s median monthly income is $1000, women’s would be $830.
Let’s test if the mean income difference between males and females is statistically significant.
\[H_0: µ_{female\ income}=µ_{male\ income}\]
\[H_a: µ_{female\ income}≠µ_{male\ income}\]
## [1] 999.8919
## [1] 833.3138
## 
##  Welch Two Sample t-test
## 
## data:  female_inc and male_inc
## t = 30.555, df = 195121, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  252.0903 286.6485
## sample estimates:
## mean of x mean of y 
##   1644.15   1374.78
t statistic: 30.555
p-value: < 2.2e-16
As you can see from the small p-value (p<0.001), there is strong evidence against the hypothesis that mean female and male incomes are equal, so we reject H0!
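A two-sample test like this one can be sketched as follows. The income vectors here are simulated stand-ins (the sample size, standard deviation and seed are assumptions chosen for illustration), so the numbers will not match the output above:

```r
# Assumption: hypothetical simulated incomes, not the data used above.
set.seed(42)
n <- 500
male_inc   <- rnorm(n, mean = 1000, sd = 300)
female_inc <- rnorm(n, mean = 830,  sd = 300)

mean(male_inc)                       # sample mean, males
mean(female_inc)                     # sample mean, females

# t.test() with two vectors runs Welch's test by default (var.equal = FALSE),
# which does not assume equal variances in the two groups.
res <- t.test(female_inc, male_inc)  # H0: the two population means are equal
res
```

The confidence interval in the printout is for the difference in means (first vector minus second vector), so its sign depends on the order of the arguments.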
H0 is always “equal to”
One-sample test: population mean equals a specific value: H0: µ=X
Two-sample test: population mean in group A equals that of group B: H0: µA=µB
Ha is specified in several ways:
Two-sided test:
population mean does not equal a specific value: Ha: µ≠X
population mean in group A does not equal that of group B: Ha: µA≠µB
Left-sided test:
population mean is less than a specific value: Ha: µ<X
population mean in group A is less than that of group B: Ha: µA<µB
Right-sided test:
population mean is greater than a specific value: Ha: µ>X
population mean in group A is greater than that of group B: Ha: µA>µB
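In R, the three forms of Ha map onto the alternative argument of t.test(). A short sketch on simulated data (the vectors and seed are hypothetical):

```r
# Assumption: hypothetical simulated data for illustration.
set.seed(1)
x <- rnorm(200, mean = 5, sd = 10)

two_sided <- t.test(x, mu = 5, alternative = "two.sided")  # Ha: mu != 5
left      <- t.test(x, mu = 5, alternative = "less")       # Ha: mu <  5
right     <- t.test(x, mu = 5, alternative = "greater")    # Ha: mu >  5

c(two_sided = two_sided$p.value,
  left      = left$p.value,
  right     = right$p.value)

# The same argument works for two samples: here Ha is mean(a) < mean(b).
a <- rnorm(100, mean = 0)
b <- rnorm(100, mean = 1)
two_sample_left <- t.test(a, b, alternative = "less")
two_sample_left$p.value
```

The t statistic is the same in all three one-sample tests; only the p-value changes. The two one-sided p-values add up to 1, and the two-sided p-value is twice the smaller of them.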
This is the data on the household composition of married and divorced respondents from the 2018 GSS. Specifically, it includes the total home population (hompop), the number of babies (babies), the number of preteens (preteen), the number of teens (teens), and the number of adults (adults) and marital status (divorced=1,married=2). You are interested in whether divorced respondents and married respondents live in different households and how they are different. Submit your R code along with your answers to the questions below.
Q1: 1. Calculate the mean and standard deviations of the five outcome variables (hompop, babies, preteen, teens, adults) (5 points)
table(hhdata$divorce) # Use the '$' sign to choose a variable within the dataset
# The variable divorce contains information on divorced (=1) and married (=2) participants.
# To calculate the mean and standard deviation of the five outcome variables, please use this code template:
mean(hhdata$variable) # replace variable with the name of the variable of interest: hompop, babies, preteen, teens, adults
sd(hhdata$variable) # replace variable with the name of the variable of interest: hompop, babies, preteen, teens, adults
Write down the output after Q1.
Q2: 2. Using subsetting as explained in the script, calculate the mean and standard deviations of the five outcome variables (hompop, babies, preteen, teens, adults) separately for divorced and married couples. (10 points)
What is subsetting? Selecting a part of the dataset based on some criterion (e.g. marital status).
Using subsetting, you can calculate any statistic you want for that group (e.g. mean, sd, variance):
## 
##   1   2 
## 116 112
## 
##   1   2 
## 317 396
You can also subset based on a continuous variable:
## 
##  1  2 
## 18 16
Use this template to finish Q2:
mean(hhdata$variable[hhdata$divorce==1]) # replace variable with the name of the variable of interest: hompop, babies, preteen, teens, adults
sd(hhdata$variable[hhdata$divorce==1]) # replace variable with the name of the variable of interest: hompop, babies, preteen, teens, adults
mean(hhdata$variable[hhdata$divorce==2]) # replace variable with the name of the variable of interest: hompop, babies, preteen, teens, adults
sd(hhdata$variable[hhdata$divorce==2]) # replace variable with the name of the variable of interest: hompop, babies, preteen, teens, adults
Q3: 3. For each of the five outcome variables, test whether the means for divorced and married couples are equal. (10 points)
For this question you will need to run a two-sample t-test.
Use this code as a template:
Example:
Let’s test whether the mean of the sex variable differs between divorced and married people:
\[H_0: µ_{gender\ divorced}=µ_{gender\ married}\]
\[H_a: µ_{gender\ divorced}≠µ_{gender\ married}\]
## 
##  Welch Two Sample t-test
## 
## data:  hhdata$sex[hhdata$divorce == 1] and hhdata$sex[hhdata$divorce == 2]
## t = -1.6865, df = 380.5, p-value = 0.09252
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.13898607  0.01064277
## sample estimates:
## mean of x mean of y 
##  1.491228  1.555400
t statistic: -1.69
p-value: 0.09
The p-value is higher than 0.05, so we cannot reject the null hypothesis (H0: µDivorced=µMarried).
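The "data:" line in the output shows the pattern of the call: two subsetted vectors passed to t.test(). A sketch of that pattern on a made-up data frame (toy, its values and the seed are hypothetical, not the homework's hhdata):

```r
# Assumption: toy is invented data; 1 = divorced, 2 = married as in hhdata.
set.seed(7)
toy <- data.frame(
  divorce = rep(c(1, 2), times = c(60, 90)),
  hompop  = c(rpois(60, 1.5) + 1, rpois(90, 2.5) + 1)
)

# Two-sample (Welch) t-test: divorced vs. married
res <- t.test(toy$hompop[toy$divorce == 1],
              toy$hompop[toy$divorce == 2])
res$statistic                         # t statistic
res$p.value                           # p-value

# Decision at the 0.05 level
if (res$p.value < 0.05) "reject H0" else "fail to reject H0"

# The formula interface gives the same test without manual subsetting.
res_formula <- t.test(hompop ~ divorce, data = toy)
all.equal(res$p.value, res_formula$p.value)
```

Either form can be repeated for each outcome variable by swapping hompop for the variable of interest.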