HW6 IS DUE SUNDAY (OCT. 29) AT 11:59 PM!
We do not accept the null hypothesis.
We reject or fail to reject the null hypothesis. The evidence against the null hypothesis is either weak or strong.
Sample distribution refers to the distribution of a particular characteristic or variable among the individuals or units selected from a population.
Sampling distribution refers to the distribution of a statistic (such as the mean, standard deviation, etc.) calculated from multiple random samples of the same size drawn from a population.
Do not confuse the sampling distribution with the sample distribution. The sampling distribution is the distribution of a sample statistic (e.g., the mean) across many samples, whereas the sample distribution is the distribution of the observed values within the one sample taken from the population.
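The distinction can be made concrete with a short simulation. This is only a sketch: the population is hypothetical (100,000 simulated people, 23% of whom answer YES), and the sample size of 500 and the 1,000 repetitions are arbitrary choices, not values from the class data.

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical population: 1 = YES, 0 = NO (assumed 23% YES)
population <- rbinom(100000, size = 1, prob = 0.23)

# Sample distribution: the values observed in ONE sample of 500 people
one_sample <- sample(population, 500)
table(one_sample)

# Sampling distribution: the sample proportion computed from each of
# 1,000 different random samples of the same size
sample_props <- replicate(1000, mean(sample(population, 500)))
summary(sample_props)
```

The table describes individuals within a single sample, while the summary describes how the sample proportion itself varies from sample to sample.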
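The goal of a one-sample proportion test: to see if the population proportion of a variable equals a hypothesized value, given the randomly sampled data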
Note: the variable should only have two response options
In the command prop.test(x, n, p = Y, alternative = "XXX"),
x should be the number of individuals in one of the response options
n should be the total number of observations
Y should be a proportion (between 0 and 1). It is the hypothesized population proportion in the null hypothesis.
XXX can be two.sided, less, or greater
The default is two.sided
If you don’t type the alternative option, R assumes you want a two-sided p-value (by default, we use two-sided p-values in this class)
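A minimal sketch of how the alternative option changes the test. The counts 308 and 1345 are borrowed from the Snapchat example in these slides; the one-sided calls are shown only for illustration, since the class uses the two-sided default.

```r
# 308 individuals in one response option out of 1345, null proportion 0.5
two_sided <- prop.test(308, 1345, p = 0.5)                           # default
less      <- prop.test(308, 1345, p = 0.5, alternative = "less")     # H_a: p < 0.5
greater   <- prop.test(308, 1345, p = 0.5, alternative = "greater")  # H_a: p > 0.5

# Each result records which alternative hypothesis was tested
two_sided$alternative
less$alternative
greater$alternative
```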
For example, how can we test whether the population proportion of Snapchat users equals 0.5 (50%)?
\[H_0:p=0.5\]
The population proportion of Snapchat users is 0.5
\[H_a:p\neq 0.5\]
The population proportion of Snapchat users is not 0.5
First, we need to know the number of people who use Snapchat (x in the command) and the total number of people in the sample (n in the command).
Let’s find the number of people who use Snapchat with table(snapchat):
## snapchat
##   NO  YES 
## 1037  308
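The class dataset is not included here, so the sketch below rebuilds a stand-in snapchat vector from the two counts printed above. The vector is an assumption made for illustration; only the counts come from the slides.

```r
# Stand-in for the class variable, rebuilt from the counts shown above
snapchat <- c(rep("NO", 1037), rep("YES", 308))

# Count how many observations fall in each response option
snap_counts <- table(snapchat)
snap_counts
```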
Then, we can run the proportion test with prop.test(308, 1345, p = 0.5):
## 
##  1-sample proportions test with continuity correction
## 
## data:  308 out of 1345, null probability 0.5
## X-squared = 394.04, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2069730 0.2525886
## sample estimates:
##         p 
## 0.2289963
p-value: < 2.2e-16
95% CI: 0.2070, 0.2526
prop. of Snapchat users: 0.2290
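The three quantities listed above can also be pulled out of the test result instead of being read off the printout. A small sketch, using the same counts as the output above; the object name result is arbitrary.

```r
# Store the test result so its parts can be accessed by name
result <- prop.test(308, 1345, p = 0.5)

result$p.value    # p-value of the test
result$conf.int   # 95% confidence interval for the population proportion
result$estimate   # sample proportion of Snapchat users
```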
Besides the table() function, you can get the number of observations in one response option with sum(snapchat=="YES"):
## [1] 308
The sum() function counts the number of observations in a variable X that meet a condition
If the variable X contains characters, then sum(X=="HELLO") gives you the number of observations in variable X equal to "HELLO"
If the variable X contains numbers, then sum(X==i) gives you the number of observations in variable X with the value i
## [1] 308
## snapchat
##   NO  YES 
## 1037  308
How do we get the total number of observations? Use length(snapchat):
## [1] 1345
The length() function counts the total number of observations in a vector (variable)
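Both counts can be reproduced on a stand-in vector. The snapchat vector below is rebuilt from the counts in the table() output and is an assumption standing in for the class data.

```r
# Stand-in for the class variable (1037 NO answers, 308 YES answers)
snapchat <- c(rep("NO", 1037), rep("YES", 308))

n_yes   <- sum(snapchat == "YES")  # TRUE counts as 1, so this counts the YES answers
n_total <- length(snapchat)        # total number of observations

n_yes
n_total
```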
Then, we can combine the sum() and length() functions together:
## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(snapchat == "YES") out of length(snapchat), null probability 0.5
## X-squared = 394.04, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2069730 0.2525886
## sample estimates:
##         p 
## 0.2289963
sum(snapchat=="YES") is the number of observations in the variable SNAPCHAT with the answer of YES
length(snapchat) is the total number of observations
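The call behind the output above can be sketched as follows. As before, the snapchat vector is a stand-in rebuilt from the printed counts, not the original data.

```r
# Stand-in for the class variable (1037 NO answers, 308 YES answers)
snapchat <- c(rep("NO", 1037), rep("YES", 308))

# x and n are computed inside the call instead of being typed as numbers
combined <- prop.test(sum(snapchat == "YES"), length(snapchat), p = 0.5)
combined
```

Computing the counts inside the call avoids copying numbers by hand, so the test stays correct if the data change.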
Does the population proportion of Facebook users equal 0.7 (70%) or not?
## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(facebook == "YES") out of length(facebook), null probability 0.7
## X-squared = 14.052, df = 1, p-value = 0.0001778
## alternative hypothesis: true p is not equal to 0.7
## 95 percent confidence interval:
##  0.7229217 0.7700715
## sample estimates:
##         p 
## 0.7472119
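A sketch of the call and the decision for the Facebook example. The facebook vector is a stand-in: 1005 YES answers out of 1345 matches the sample estimate above and the Facebook group size reported later in these slides. The 0.05 significance level is an assumption, chosen to match the 95% confidence interval.

```r
# Stand-in for the class variable (1005 YES answers, 340 NO answers)
facebook <- c(rep("YES", 1005), rep("NO", 340))

fb_test <- prop.test(sum(facebook == "YES"), length(facebook), p = 0.7)
fb_test

# TRUE means we reject H0: p = 0.7 at the assumed 0.05 level
fb_test$p.value < 0.05
```

Here the p-value (0.0001778) is below 0.05, so we reject the null hypothesis that the population proportion of Facebook users is 0.7.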
The goal of a two-sample proportion test:
to see if the population proportion of a variable equals the population proportion of another variable, given the randomly sampled data
This is equivalent to seeing if the population proportion of Group A equals the population proportion of Group B, given the randomly sampled data
Note: the response variable should have two categories in both groups
In the command prop.test(c(a, b), c(X, Y)),
a should be the number of individuals in one response category in the first group
b should be the number of individuals in the same response category in the second group
X should be the total number of people in the first group
Y should be the total number of people in the second group
For example, how can we test whether the population proportions of Snapchat users among Facebook users and non-users are equal?
\[H_0:p_a=p_b\]
The population proportion of Snapchat users among Facebook users equals the population proportion of Snapchat users among people who don’t use Facebook
\[H_a:p_a\neq p_b\]
The population proportion of Snapchat users among Facebook users does not equal the population proportion of Snapchat users among people who don’t use Facebook
First, we need to know the number of people who use Snapchat in both the Facebook user group and the Facebook non-user group (a and b in the command)
Let’s get the number of people who use Snapchat, given that they are Facebook users:
## 
##  NO YES 
## 728 277
Let’s get the number of people who use Snapchat, given that they are not Facebook users:
## 
##  NO YES 
## 309  31
## [1] 1005
## [1] 340
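The four outputs above can be reproduced by subsetting snapchat with a condition on facebook. In this sketch both vectors are stand-ins rebuilt from the printed cell counts, so the row order is an assumption; only the counts come from the slides.

```r
# Stand-in data rebuilt from the four cell counts printed above
facebook <- c(rep("YES", 1005), rep("NO", 340))
snapchat <- c(rep("NO", 728), rep("YES", 277),   # Facebook users
              rep("NO", 309), rep("YES", 31))    # Facebook non-users

table(snapchat[facebook == "YES"])    # Snapchat use among Facebook users
table(snapchat[facebook == "NO"])     # Snapchat use among non-users
length(snapchat[facebook == "YES"])   # size of the Facebook user group
length(snapchat[facebook == "NO"])    # size of the non-user group
```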
277: the number of people who use Snapchat, given that they are Facebook users
31: the number of people who use Snapchat, given that they are not Facebook users
1005: the total number of people in the Facebook user group
340: the total number of people in the Facebook non-user group
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(277, 31) out of c(1005, 340)
## X-squared = 47.913, df = 1, p-value = 4.455e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1412539 0.2276370
## sample estimates:
##     prop 1     prop 2 
## 0.27562189 0.09117647
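The output above corresponds to a call of the following form, with the four counts typed in directly. The object name two_sample is arbitrary.

```r
# c(a, b): Snapchat users in each group; c(X, Y): size of each group
two_sample <- prop.test(c(277, 31), c(1005, 340))
two_sample

two_sample$estimate   # sample proportions in the two groups
two_sample$conf.int   # 95% CI for the difference between the two proportions
```

Note that the confidence interval here is for the difference between the two proportions (group 1 minus group 2), not for either proportion on its own.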
You can also compute the four counts inside the command, but this is complicated to write, so you don’t need to use this method for the two-sample proportion test:
prop.test(c(sum(snapchat[facebook=="YES"]=="YES"),
            sum(snapchat[facebook=="NO"]=="YES")),
          c(length(snapchat[facebook=="YES"]),
            length(snapchat[facebook=="NO"])))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(sum(snapchat[facebook == "YES"] == "YES"), sum(snapchat[facebook == "NO"] == "YES")) out of c(length(snapchat[facebook == "YES"]), length(snapchat[facebook == "NO"]))
## X-squared = 47.913, df = 1, p-value = 4.455e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1412539 0.2276370
## sample estimates:
##     prop 1     prop 2 
## 0.27562189 0.09117647
There are two parts:
Part 1: Q1-Q4
Part 2: Q5-Q9
The dataset we are going to work with today looks like this:
| VTCONF_COM_W38 | F_RACETHN | F_IDEO | 
|---|---|---|
| Very confident | White | Liberal | 
| Not too confident | White | Moderate | 
| Somewhat confident | White | Very conservative | 
| Very confident | White | Moderate | 
| Somewhat confident | White | Liberal | 
| Somewhat confident | Other | Liberal | 
Part 2, Question 5:
Check out the subsetting technique in last week’s Lab slides
Hint for part a:
• You should create a new R object (a vector) called conf_white
• The line of code should look like this: conf_white <- CONFVAR[RACE=="White"]
• In a similar fashion, create the vectors conf_black and conf_hispanic
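A small sketch of the subsetting pattern, using only the six rows shown in the table above. The data frame name survey is hypothetical; in the assignment, replace it with however the dataset is loaded.

```r
# Hypothetical data frame holding the six rows shown in the table above;
# the name survey is an assumption, not the name used in the assignment
survey <- data.frame(
  VTCONF_COM_W38 = c("Very confident", "Not too confident", "Somewhat confident",
                     "Very confident", "Somewhat confident", "Somewhat confident"),
  F_RACETHN = c("White", "White", "White", "White", "White", "Other")
)

# Keep the confidence answers only for respondents whose race is "White"
conf_white <- survey$VTCONF_COM_W38[survey$F_RACETHN == "White"]
conf_white
length(conf_white)
```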
Part 2, Question 5:
Part d: use the table() command to find out the number of observations in each response option
• I have introduced the table() function in Lab 4
• Check out the details of this function in Lab 4 slides
Part 2, Question 6:
• prop.test() function: test for the proportion of people who answered “Very confident”
First, you need to find the number of people who answered “Very confident” (use the sum() function).
Second, you need to find the total number of observations (use the length() function).
After that, run the proportion test with prop.test(), either by plugging in the values you got from the code above or by writing the sum() and length() calls directly inside the command.
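One way these steps could look, as a sketch only. The conf_white vector below is made up (it is NOT the homework data), and p is left at R's default of 0.5 because the hypothesized value is not stated here; use the value the question asks for.

```r
# Hypothetical answers, NOT the homework data
conf_white <- c(rep("Very confident", 60), rep("Somewhat confident", 30),
                rep("Not too confident", 10))

x <- sum(conf_white == "Very confident")  # number who answered "Very confident"
n <- length(conf_white)                   # total number of observations

# One-sample proportion test; p defaults to 0.5 when it is not given
q6_test <- prop.test(x, n)
q6_test
```

The comparison is case-sensitive, so the text inside the quotes must match the response label in the data exactly.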
Part 2, Question 7:
You will be expected to write something like this:
“We are 95% confident that the true population proportion is between XXX and XXX.”
Part 2, Question 8:
• prop.test() function
• You need to do two-sample proportion tests
• Each time, you need to compare two of the following vectors: conf_white, conf_black, conf_hispanic
• conf_white vs. conf_black
• conf_white vs. conf_hispanic
• conf_black vs. conf_hispanic
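A sketch of one such comparison, following the two-sample pattern from earlier in these slides. Both vectors are made up for illustration (they are NOT the homework data), and the helper name very is arbitrary.

```r
# Hypothetical answers for two groups, NOT the homework data
conf_white <- c(rep("Very confident", 60), rep("Somewhat confident", 40))
conf_black <- c(rep("Very confident", 35), rep("Somewhat confident", 45))

very <- "Very confident"  # the response category being compared

# c(a, b): "Very confident" counts; c(X, Y): group sizes
q8_test <- prop.test(c(sum(conf_white == very), sum(conf_black == very)),
                     c(length(conf_white), length(conf_black)))
q8_test
```

The same call can be repeated for the other two pairs by swapping in the corresponding vectors.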