S371: Lab 10

Lab Instructor: Katya Baldina

2023-10-25

Announcement

HW6 IS DUE SUNDAY (OCT. 29) AT 11:59 PM!

Test II: Reflection

We never accept the null hypothesis.

We either reject or fail to reject the null hypothesis. The data provide weak or strong evidence against the null hypothesis.

Sample distribution refers to the distribution of a particular characteristic or variable among the individuals or units selected from a population.

Sampling distribution refers to the distribution of a statistic (such as the mean, standard deviation, etc.) calculated from multiple random samples of the same size drawn from a population.

Do not confuse the sampling distribution with the sample distribution. The sampling distribution is the distribution of a sample statistic (e.g. the mean) across many samples, whereas the sample distribution is simply the distribution of the values in one sample taken from the population.
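To make the distinction concrete, here is a small simulation sketch. The population proportion (0.3), the sample size, and the number of repeated samples are all assumed values chosen for illustration; they do not come from the lab data.

```r
# Hypothetical population in which 30% answer "YES" (assumed value)
set.seed(371)
p_true    <- 0.3
n         <- 100    # size of each sample
n_samples <- 5000   # number of repeated samples

# Sample distribution: the raw answers in ONE sample
one_sample <- sample(c("YES", "NO"), size = n, replace = TRUE,
                     prob = c(p_true, 1 - p_true))
table(one_sample)

# Sampling distribution: the sample proportion computed in MANY samples
sample_props <- rbinom(n_samples, size = n, prob = p_true) / n
mean(sample_props)  # should be close to p_true
sd(sample_props)    # should be close to sqrt(p_true * (1 - p_true) / n)
```

The first object holds individual answers; the second holds one statistic per sample, which is what a hypothesis test reasons about.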

One-sample proportion test

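The goal of a one-sample proportion test is to test whether the population proportion of one category equals a hypothesized value.

Note: the variable should have only two response options (for example, YES/NO)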

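prop.test(x, n, p = Y, alternative = "XXX")

In the command,

x is the number of observations in the category of interest

n is the total number of observations in the sample

Y is the hypothesized population proportion under the null hypothesis (0.5 if p is omitted)

"XXX" is the direction of the alternative hypothesis: "two.sided" (the default), "less", or "greater"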

One-sample proportion test: example

For example, how can we test whether the population proportion of Snapchat users equals 0.5 (50%)?

\[H_0:p=0.5\]

The population proportion of Snapchat users is 0.5

\[H_a:p\neq 0.5\]

The population proportion of Snapchat users is not 0.5

First, we need the number of people who use Snapchat (x in the command) and the total number of people in the sample (n in the command).

Let’s find the number of people who use Snapchat:

snapchat <- socmedia$SNAPCHAT
table(snapchat)
## snapchat
##   NO  YES 
## 1037  308

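Then we can run the proportion test. The total n is 1037 + 308 = 1345, and because p is not specified, prop.test() uses the default null value of 0.5: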

prop.test(308,1345)
## 
##  1-sample proportions test with continuity correction
## 
## data:  308 out of 1345, null probability 0.5
## X-squared = 394.04, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2069730 0.2525886
## sample estimates:
##         p 
## 0.2289963

p-value: < 2.2e-16

95% CI: (0.2070, 0.2526)

Proportion of Snapchat users in the sample: 0.2290
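Rather than copying these numbers from the printed output by hand, you can pull them out of the object that prop.test() returns. A minimal sketch, using the counts from the table() output above (the object name res is just an illustrative choice):

```r
# Counts taken from the table() output: 308 YES out of 1345
res <- prop.test(308, 1345)   # p defaults to 0.5

res$p.value              # the numeric p-value (the printout only shows "< 2.2e-16")
round(res$conf.int, 4)   # 95% confidence interval
round(res$estimate, 4)   # sample proportion
```

Using round() avoids accidentally truncating digits when reporting the results.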

One-sample proportion test: advanced way

Instead of reading the count from the table() output, you can compute it directly:

sum(snapchat=="YES")
## [1] 308

table(snapchat)
## snapchat
##   NO  YES 
## 1037  308

How to get the total number of observations?

length(snapchat)
## [1] 1345

The length() function returns the number of elements in a vector, which here is the total number of observations of the variable

Then, we can combine the sum() and length() functions together:

prop.test(sum(snapchat=="YES"), length(snapchat))
## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(snapchat == "YES") out of length(snapchat), null probability 0.5
## X-squared = 394.04, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2069730 0.2525886
## sample estimates:
##         p 
## 0.2289963

sum(snapchat=="YES") is the number of observations in the variable SNAPCHAT with the answer YES

length(snapchat) is the total number of observations
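One caveat worth checking: the sum() and length() approach assumes the variable has no missing values. The sketch below uses a made-up toy vector (not the socmedia data) to show what happens when an NA is present.

```r
# Hypothetical toy vector with one missing answer (not the socmedia data)
answers <- c("YES", "NO", NA, "YES", "NO", "NO")

sum(answers == "YES")                # NA, because NA == "YES" is NA
sum(answers == "YES", na.rm = TRUE)  # counts only the observed "YES" answers
length(answers)                      # counts the NA as an observation too
sum(!is.na(answers))                 # number of non-missing answers
```

If a variable does contain NAs, passing the last two quantities to prop.test() keeps the missing answers out of both the count and the total.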

Practice

Does the population proportion of Facebook users equal 0.7 (70%)?

facebook <- socmedia$FACEBOOK
prop.test(sum(facebook=="YES"), length(facebook), p = 0.7)
## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(facebook == "YES") out of length(facebook), null probability 0.7
## X-squared = 14.052, df = 1, p-value = 0.0001778
## alternative hypothesis: true p is not equal to 0.7
## 95 percent confidence interval:
##  0.7229217 0.7700715
## sample estimates:
##         p 
## 0.7472119
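To see where X-squared and the p-value come from, the sketch below recomputes them by hand for the Facebook example. It assumes x = 1005 Facebook users, which is the sample estimate 0.7472119 multiplied by n = 1345; the variable names are illustrative.

```r
x  <- 1005   # Facebook users in the sample (0.7472119 * 1345)
n  <- 1345   # total number of observations
p0 <- 0.7    # proportion under the null hypothesis

# Chi-squared statistic with continuity correction (the 0.5 term)
x_squared <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
p_value   <- pchisq(x_squared, df = 1, lower.tail = FALSE)

round(x_squared, 3)
signif(p_value, 4)

# Compare with what prop.test() reports
res <- prop.test(x, n, p = p0)
all.equal(unname(res$statistic), x_squared)
all.equal(unname(res$p.value), p_value)
```

Both comparisons should return TRUE, which confirms that the test is a chi-squared test with 1 degree of freedom on the gap between the observed count and the count expected under the null hypothesis.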

Two-sample proportion test

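The goal of a two-sample proportion test is to test whether the population proportion of one category is the same in two groups.

Note: the response variable should have two categories in both groups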

prop.test(c(a, b), c(X,Y))

where

a is the number of people in the response category of interest in the first group

b is the number of people in the response category of interest in the second group

X is the total number of people in the first group

Y is the total number of people in the second group

Two-sample proportion test: example

For example, how can we test whether the population proportion of Snapchat users is the same among Facebook users and non-users?

\[H_0:p_a=p_b\]

The population proportion of Snapchat users among Facebook users equals the population proportion of Snapchat users among people who don’t use Facebook

\[H_a:p_a\neq p_b\]

The population proportion of Snapchat users among Facebook users does not equal the population proportion of Snapchat users among people who don’t use Facebook

First, we need the number of people who use Snapchat in both the Facebook user group and the Facebook non-user group (a and b in the command).

Let’s get the number of people who use Snapchat among Facebook users:

table(snapchat[facebook=="YES"])
## 
##  NO YES 
## 728 277

Let’s get the number of people who use Snapchat among people who are not Facebook users:

table(snapchat[facebook=="NO"])
## 
##  NO YES 
## 309  31
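All four counts can also be read from a single two-way table. The sketch below rebuilds stand-in vectors with the same counts as the two tables above, because the socmedia data is not loaded here; with the real data you would call table(facebook, snapchat) directly.

```r
# Stand-in vectors with the same counts as the tables above
facebook <- rep(c("YES", "NO"), times = c(1005, 340))
snapchat <- c(rep(c("YES", "NO"), times = c(277, 728)),   # Facebook users
              rep(c("YES", "NO"), times = c(31, 309)))    # Facebook non-users

# Rows are facebook answers, columns are snapchat answers
tab <- table(facebook, snapchat)
tab

tab["YES", "YES"]  # Snapchat users among Facebook users
tab["NO", "YES"]   # Snapchat users among Facebook non-users
rowSums(tab)       # total number of people in each Facebook group
```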

Next, we need the total number of people in each group (X and Y in the command):

length(snapchat[facebook=="YES"])
## [1] 1005
length(snapchat[facebook=="NO"])
## [1] 340

277: the number of people who use Snapchat among Facebook users

31: the number of people who use Snapchat among Facebook non-users

1005: the total number of people in the Facebook user group

340: the total number of people in the Facebook non-user group

prop.test(c(277,31),c(1005,340))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(277, 31) out of c(1005, 340)
## X-squared = 47.913, df = 1, p-value = 4.455e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1412539 0.2276370
## sample estimates:
##     prop 1     prop 2 
## 0.27562189 0.09117647
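The confidence interval in this output is for the difference between the two proportions (group 1 minus group 2), not for either proportion on its own. A short sketch that makes this explicit, with illustrative object names:

```r
res <- prop.test(c(277, 31), c(1005, 340))

# Difference between the two sample proportions (group 1 minus group 2)
diff_hat <- unname(res$estimate[1] - res$estimate[2])
diff_hat

# The reported interval is built around this difference
ci <- res$conf.int
ci[1] < diff_hat && diff_hat < ci[2]

# The interval lies entirely above 0, which agrees with rejecting H0
ci[1] > 0
```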

Two-sample proportion test: advanced way

This version is harder to read and write, so you do not need to use it for the two-sample proportion test, but it gives the same result:

prop.test(c(sum(snapchat[facebook=="YES"]=="YES"),
            sum(snapchat[facebook=="NO"]=="YES")),
          c(length(snapchat[facebook=="YES"]),
            length(snapchat[facebook=="NO"])))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(sum(snapchat[facebook == "YES"] == "YES"), sum(snapchat[facebook == "NO"] == "YES")) out of c(length(snapchat[facebook == "YES"]), length(snapchat[facebook == "NO"]))
## X-squared = 47.913, df = 1, p-value = 4.455e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1412539 0.2276370
## sample estimates:
##     prop 1     prop 2 
## 0.27562189 0.09117647
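A possibly simpler alternative: prop.test() also accepts a two-column table or matrix whose first column holds the "successes" and whose second column holds the "failures", with one row per group. The sketch below uses stand-in vectors that reproduce the counts from this example, since the socmedia data is not loaded here.

```r
# Stand-in vectors with the same counts as the example
facebook <- rep(c("YES", "NO"), times = c(1005, 340))
snapchat <- c(rep(c("YES", "NO"), times = c(277, 728)),
              rep(c("YES", "NO"), times = c(31, 309)))

# Reorder so that rows are Facebook YES then NO, and the first
# column is the "success" category (Snapchat YES)
tab <- table(facebook, snapchat)[c("YES", "NO"), c("YES", "NO")]

res_tab <- prop.test(tab)
res_tab$estimate
res_tab$p.value
```

The reordering matters because table() sorts categories alphabetically, which would otherwise treat NO as the success category.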

HW6 Guide

There are two parts:

The dataset we are going to work with today looks like this:

load('pewdataw38.Rdata')

VTCONF_COM_W38      F_RACETHN  F_IDEO
Very confident      White      Liberal
Not too confident   White      Moderate
Somewhat confident  White      Very conservative
Very confident      White      Moderate
Somewhat confident  White      Liberal
Somewhat confident  Other      Liberal

Part 2, Question 5:

Review the subsetting technique from last week’s lab slides:

new_vector <- datasetname$variable_of_interest[datasetname$subsetting_var==value]

Hint for part a:

• You should create a new R object (a vector) called conf_white

• The line of code should look like this: conf_white <- CONFVAR[RACE=="White"]

conf_white <- pewdata$VTCONF_COM_W38[pewdata$F_RACETHN=='White']

In a similar fashion, create the vectors conf_black and conf_hispanic
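To check that the subsetting does what you expect, you can try it on a tiny data frame first. The sketch below builds a toy pewdata from only the six preview rows shown earlier; the real pewdata object comes from load('pewdataw38.Rdata') and is much larger.

```r
# Toy data frame holding only the six preview rows (not the full dataset)
pewdata <- data.frame(
  VTCONF_COM_W38 = c("Very confident", "Not too confident", "Somewhat confident",
                     "Very confident", "Somewhat confident", "Somewhat confident"),
  F_RACETHN = c("White", "White", "White", "White", "White", "Other"),
  F_IDEO = c("Liberal", "Moderate", "Very conservative",
             "Moderate", "Liberal", "Liberal")
)

# Keep the confidence answers only for rows where F_RACETHN is "White"
conf_white <- pewdata$VTCONF_COM_W38[pewdata$F_RACETHN == "White"]
length(conf_white)
table(conf_white)
```

The value inside the quotes must match the category label in the data exactly, including capitalization.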

Part d: use the table() command to find out the number of observations in each response option

• I have introduced the table() function in Lab 4

• Check out the details of this function in Lab 4 slides

Part 2, Question 6:

• prop.test() function: test for the proportion of people who answered “Very confident”

First, you need to find the number of people who answered “Very confident” (the value must match the label in the data exactly, including capitalization):

sum(vector=="value")

Second, you need to find the total number of observations:

length(vector)

After that, run the proportion test, either by plugging in the two values you obtained above or by simply writing:

prop.test(sum(vector=="value"), length(vector))

Part 2, Question 7:

You will be expected to write something like this:

“We are 95% confident that the true population proportion lies between XXX and XXX.”
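If it helps, the interval bounds can be taken straight from the test object and dropped into the sentence. The sketch below uses the Snapchat counts from the lab example rather than the homework data; the object names are illustrative.

```r
res <- prop.test(308, 1345)   # Snapchat counts from the lab example
ci  <- res$conf.int           # lower and upper bounds of the 95% CI

# %% prints a literal percent sign; %.3f rounds to three decimals
sentence <- sprintf(
  "We are 95%% confident that the true population proportion lies between %.3f and %.3f.",
  ci[1], ci[2]
)
sentence
```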

Part 2, Question 8:

• prop.test() function

prop.test(c(a, b), c(total_a, total_b))

• You need to do two-sample proportion tests

• Each time, you need to compare two of the following vectors: conf_white, conf_black, conf_hispanic

• conf_white vs. conf_black

• conf_white vs. conf_hispanic

• conf_black vs. conf_hispanic
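One way to organize the three comparisons is sketched below. The answer vectors are made-up placeholders with invented counts, NOT the Pew data; in the homework you would use your own conf_white, conf_black, and conf_hispanic. Running the three prop.test() calls one at a time is equally fine.

```r
# Made-up answer vectors for illustration only (NOT the Pew data)
groups <- list(
  conf_white    = rep(c("Very confident", "Somewhat confident"), times = c(120, 180)),
  conf_black    = rep(c("Very confident", "Somewhat confident"), times = c(60, 140)),
  conf_hispanic = rep(c("Very confident", "Somewhat confident"), times = c(50, 150))
)

# All pairs of group names: white/black, white/hispanic, black/hispanic
pairs <- combn(names(groups), 2, simplify = FALSE)

# Run one two-sample proportion test per pair
results <- lapply(pairs, function(pr) {
  a <- groups[[pr[1]]]
  b <- groups[[pr[2]]]
  prop.test(c(sum(a == "Very confident"), sum(b == "Very confident")),
            c(length(a), length(b)))
})
names(results) <- sapply(pairs, paste, collapse = " vs. ")

# One p-value per comparison
sapply(results, function(r) r$p.value)
```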