Test II: Reflection

Sample distribution refers to the distribution of a particular characteristic or variable among the individuals or units selected from a population.

Sampling distribution refers to the distribution of a statistic (such as the mean, standard deviation, etc.) calculated from multiple random samples of the same size drawn from a population.

Do not confuse the sampling distribution with the sample distribution. The sampling distribution is the distribution of a sample statistic (e.g. the mean) across many samples, whereas the sample distribution is the distribution of the observed values within one sample taken from the population.
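A minimal simulation sketch of the difference, assuming a made-up population (10,000 people, 23% answering YES) and an arbitrary sample size of 200; none of these numbers come from the lab data:

```r
set.seed(371)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 people, 23% of whom answer "YES"
population <- rep(c("YES", "NO"), times = c(2300, 7700))

# Sample distribution: how the variable is distributed within ONE sample
one_sample <- sample(population, size = 200)
table(one_sample)

# Sampling distribution: how a statistic (here the sample proportion)
# varies across MANY samples of the same size
sample_props <- replicate(1000, mean(sample(population, size = 200) == "YES"))
summary(sample_props)  # centered close to the population proportion 0.23
sd(sample_props)       # spread of the statistic from sample to sample
```

table(one_sample) describes individuals in a single sample, while summary(sample_props) describes 1,000 sample proportions.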

One-sample proportion test

The goal of a one-sample proportion test:

to see whether the population proportion of a variable equals a specified value, given randomly sampled data

Note: the variable should have only two response options

One-sample proportion test

prop.test(x, n, p=Y, alternative="XXX")

In the command,

x should be the number of individuals in one of the response options
n should be the total number of observations
Y should be a proportion (between 0 and 1). It is the hypothesized population proportion in the null hypothesis.
- The default is 0.5: if you don’t include the p=Y option, R assumes you want to test against 0.5
XXX can be "two.sided", "less", or "greater"
- The default is "two.sided"
- If you don’t type the alternative option, R assumes you want a two-sided p-value (by default, we use the two-sided p-value in this class)
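As a sketch of how the p and alternative options change the call, the counts 308 and 1345 are borrowed from the Snapchat example in this lab, while the null value 0.25 is a hypothetical choice made only for illustration:

```r
# Counts from the Snapchat example in this lab: 308 "YES" answers out of 1345
x <- 308
n <- 1345

# Hypothetical null value of 0.25, chosen only to illustrate the p= option
two_sided <- prop.test(x, n, p = 0.25)                        # H_a: p != 0.25
one_sided <- prop.test(x, n, p = 0.25, alternative = "less")  # H_a: p < 0.25

two_sided$alternative  # "two.sided" (the default)
one_sided$alternative  # "less"
two_sided$p.value
one_sided$p.value
```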

One-sample proportion test: example

For example, how can we test whether the population proportion of Snapchat users equals 0.5 (50%) or not?

\[H_0:p=0.5\]

The population proportion of Snapchat users is 0.5

\[H_a:p \neq 0.5\]

The population proportion of Snapchat users is not 0.5

One-sample proportion test: example

First, we need to know the number of people who use Snapchat (x in the command) and the total number of people in the sample (n in the command)

Let’s find the number of people who use Snapchat:

snapchat <- socmedia$SNAPCHAT
table(snapchat)

## snapchat
##   NO  YES 
## 1037  308

Then, we can run the proportion test by typing:

prop.test(308,1345)

## 
##  1-sample proportions test with continuity correction
## 
## data:  308 out of 1345, null probability 0.5
## X-squared = 394.04, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2069730 0.2525886
## sample estimates:
##         p 
## 0.2289963

p-value: < 2.2e-16

95% CI: 0.2070, 0.2526

prop. of SNAPCHAT users: 0.2290
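Instead of copying these numbers from the printed output by hand, they can be read from the object that prop.test() returns. A short sketch, where the name result is just an illustrative choice:

```r
result <- prop.test(308, 1345)   # same call as in the example above

result$p.value             # stored p-value (the printout only shows "< 2.2e-16")
round(result$conf.int, 4)  # 95% confidence interval, rounded to 4 decimals
round(result$estimate, 4)  # sample proportion, rounded to 4 decimals
```

Rounding with round() avoids accidentally truncating digits when reporting the interval.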

One-sample proportion test: advanced way

Besides the table() function, you can get the number of observations in a response option this way:

sum(snapchat=="YES")

## [1] 308

sum() applied to a condition counts the number of observations in a variable X that satisfy the condition
If the variable X contains characters, then sum(X=="HELLO") gives you the number of observations in variable X that equal "HELLO"
If the variable X contains numbers, then sum(X==i) gives you the number of observations in variable X with the value of i
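To see why this works, here is a small sketch on a hypothetical toy vector (not the lab data). The comparison produces TRUE/FALSE values, and sum() counts each TRUE as 1. The toy vector includes a missing value to show what happens in that case:

```r
# Hypothetical toy vector, not the lab data
answers <- c("YES", "NO", "YES", NA, "NO", "YES")

answers == "YES"                     # TRUE / FALSE / NA for each observation
sum(answers == "YES")                # NA, because one comparison is NA
sum(answers == "YES", na.rm = TRUE)  # 3: TRUE counts as 1, FALSE as 0, NA dropped
```

In the lab example the YES and NO counts add up to all 1345 observations, so no na.rm option is needed there.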

One-sample proportion test: advanced way

sum(snapchat=="YES")

## [1] 308

table(snapchat)

## snapchat
##   NO  YES 
## 1037  308

One-sample proportion test: advanced way

How to get the total number of observations?

length(snapchat)

## [1] 1345

The length() function counts the total number of observations (elements) in a vector (variable)
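One thing to keep in mind, sketched on a hypothetical toy vector: length() counts every element, including missing values, so it may differ from the number of people who actually answered.

```r
answers <- c("YES", "NO", "YES", NA, "NO", "YES")  # hypothetical toy vector

length(answers)       # 6: counts every element, including the NA
sum(!is.na(answers))  # 5: counts only the non-missing answers
```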

One-sample proportion test: advanced way

Then, we can combine the sum() and length() functions together:

prop.test(sum(snapchat=="YES"), length(snapchat))

## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(snapchat == "YES") out of length(snapchat), null probability 0.5
## X-squared = 394.04, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2069730 0.2525886
## sample estimates:
##         p 
## 0.2289963

sum(snapchat=="YES") is the number of observations in the variable SNAPCHAT with the answer YES

length(snapchat) is the total number of observations
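For those curious where X-squared = 394.04 comes from, the sketch below recomputes it by hand from the two counts. It assumes the usual chi-squared form with a 0.5 continuity correction, which is what the "with continuity correction" line in the output refers to; the variable names are illustrative.

```r
x  <- 308   # number of "YES" answers
n  <- 1345  # total number of observations
p0 <- 0.5   # hypothesized proportion under H_0

observed <- c(x, n - x)              # YES and NO counts
expected <- c(n * p0, n * (1 - p0))  # counts expected if H_0 were true

# Chi-squared statistic with the 0.5 continuity correction
x_squared <- sum((abs(observed - expected) - 0.5)^2 / expected)
x_squared                                      # about 394.04
pchisq(x_squared, df = 1, lower.tail = FALSE)  # two-sided p-value
```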

Practice

Does the population proportion of Facebook users equal 0.7 (70%) or not?

facebook <- socmedia$FACEBOOK
prop.test(sum(facebook=="YES"), length(facebook), p = 0.7)

## 
##  1-sample proportions test with continuity correction
## 
## data:  sum(facebook == "YES") out of length(facebook), null probability 0.7
## X-squared = 14.052, df = 1, p-value = 0.0001778
## alternative hypothesis: true p is not equal to 0.7
## 95 percent confidence interval:
##  0.7229217 0.7700715
## sample estimates:
##         p 
## 0.7472119

Two-sample proportion test

The goal of a two-sample proportion test:

to see whether the population proportion of a variable equals the population proportion of another variable, given randomly sampled data
This is equivalent to seeing whether the population proportion in Group A equals the population proportion in Group B, given randomly sampled data

Note: the variable should have two categories in both groups

Two-sample proportion test

prop.test(c(a, b), c(X,Y))

where

a should be the number of people in the response category of interest in the first group

b should be the number of people in the same response category in the second group

X should be the total number of people in the first group

Y should be the total number of people in the second group

Two-sample proportion test: example

For example, how can we test whether the population proportions of Snapchat users among Facebook users vs. non-users are equal?

\[H_0:p_a=p_b\]

The population proportion of Snapchat users among Facebook users equals the population proportion of Snapchat users among people who don’t use Facebook

\[H_a:p_a \neq p_b\]

The population proportion of Snapchat users among Facebook users does not equal the population proportion of Snapchat users among people who don’t use Facebook

Two-sample proportion test: example

First, we need to know the number of people who use Snapchat in both the Facebook user group and the Facebook non-user group (a and b in the command)

Let’s get the number of people who use Snapchat, given that they are Facebook users:

table(snapchat[facebook=="YES"])

## 
##  NO YES 
## 728 277

Let’s get the number of people who use Snapchat, given that they are not Facebook users:

table(snapchat[facebook=="NO"])

## 
##  NO YES 
## 309  31

Two-sample proportion test: example

length(snapchat[facebook=="YES"])

## [1] 1005

length(snapchat[facebook=="NO"])

## [1] 340

277: the number of people who use Snapchat, given that they are Facebook users

31: the number of people who use Snapchat, given that they are not Facebook users

1005: the total number of people in the Facebook user group

340: the total number of people in the Facebook non-user group

prop.test(c(277,31),c(1005,340))

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(277, 31) out of c(1005, 340)
## X-squared = 47.913, df = 1, p-value = 4.455e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1412539 0.2276370
## sample estimates:
##     prop 1     prop 2 
## 0.27562189 0.09117647
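When reading this output, note that the confidence interval is for the difference between the two population proportions (prop 1 minus prop 2), not for either proportion on its own. A short sketch, with illustrative object names:

```r
result <- prop.test(c(277, 31), c(1005, 340))

# Difference in sample proportions: prop 1 minus prop 2
diff_props <- unname(result$estimate[1] - result$estimate[2])
diff_props       # about 0.184

# The reported interval is a confidence interval for this difference
result$conf.int
```

Because the interval does not contain 0, it agrees with the small p-value: the two sample proportions differ by more than we would expect if the population proportions were equal.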

Two-sample proportion test: advanced way

The same test can be written in a single command, but the code is long and easy to mistype, so you don’t need to use this method for the two-sample proportion test:

prop.test(c(sum(snapchat[facebook=="YES"]=="YES"),
            sum(snapchat[facebook=="NO"]=="YES")),
          c(length(snapchat[facebook=="YES"]),
            length(snapchat[facebook=="NO"])))

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(sum(snapchat[facebook == "YES"] == "YES"), sum(snapchat[facebook == "NO"] == "YES")) out of c(length(snapchat[facebook == "YES"]), length(snapchat[facebook == "NO"]))
## X-squared = 47.913, df = 1, p-value = 4.455e-12
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1412539 0.2276370
## sample estimates:
##     prop 1     prop 2 
## 0.27562189 0.09117647
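A possibly simpler route, sketched below, is to hand prop.test() a two-way table: it accepts a table with two columns giving successes and failures for each group. Since the socmedia data is not loaded here, the sketch rebuilds vectors with the same counts as the lab output; the order of the observations is made up.

```r
# Rebuild vectors with the same counts as the lab output
# (the order of observations is made up)
facebook <- rep(c("YES", "NO"), times = c(1005, 340))
snapchat <- c(rep(c("YES", "NO"), times = c(277, 728)),  # Facebook users
              rep(c("YES", "NO"), times = c(31, 309)))   # Facebook non-users

# Rows = groups, columns = Snapchat answer, with "YES" (success) first
counts <- table(facebook, snapchat)[c("YES", "NO"), c("YES", "NO")]
counts

# prop.test() treats column 1 as successes and column 2 as failures
result <- prop.test(counts)
result
```

The column order matters: without reordering, table() would put NO first and the test would report the proportions of non-users of Snapchat instead.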

HW6 Guide

There are two parts:

Part 1: Q1-Q4
Part 2: Q5-Q9

HW6 Guide

The dataset we are going to work with today looks like this:

load('pewdataw38.Rdata')

VTCONF_COM_W38	F_RACETHN	F_IDEO
Very confident	White	Liberal
Not too confident	White	Moderate
Somewhat confident	White	Very conservative
Very confident	White	Moderate
Somewhat confident	White	Liberal
Somewhat confident	Other	Liberal

HW6 Guide

Part 2, Question 5:

Check out the subsetting technique from last week’s Lab slides

new_vector <- datasetname$variable_of_interest[datasetname$subsetting_var==value]

Hint for part a:

• You should create a new R object (a vector) called conf_white

• The line of code should look like this: conf_white <- CONFVAR[RACE=="White"]

conf_white <- pewdata$VTCONF_COM_W38[pewdata$F_RACETHN=='White']

In a similar fashion, create the vectors conf_black and conf_hispanic
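To see what the subsetting line does, here is a sketch on a toy data frame that copies only the six example rows shown earlier in this guide. The names pewdata_demo and conf_white_demo are hypothetical; the real pewdata object has many more rows.

```r
# Toy data frame copying the six example rows shown earlier in this guide
pewdata_demo <- data.frame(
  VTCONF_COM_W38 = c("Very confident", "Not too confident", "Somewhat confident",
                     "Very confident", "Somewhat confident", "Somewhat confident"),
  F_RACETHN = c("White", "White", "White", "White", "White", "Other")
)

# Keep the confidence answers only for rows where the race variable is "White"
conf_white_demo <- pewdata_demo$VTCONF_COM_W38[pewdata_demo$F_RACETHN == "White"]
conf_white_demo
length(conf_white_demo)  # 5 of the 6 demo rows are "White"
```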

HW6 Guide

Part 2, Question 5:

Part d: use the table() command to find out the number of observations in each response option

• I have introduced the table() function in Lab 4

• Check out the details of this function in Lab 4 slides

HW6 Guide

Part 2, Question 6:

• prop.test() function: test for the proportion of people who answered “Very confident” (type the value exactly as it appears in the data, since R is case-sensitive)

First, you need to find the number of people who answered “Very confident”:

sum(vector=="value")

Second, you need to find the total number of observations:

length(vector)

After that, run the proportion test, either by plugging in the values you got from the code above or by simply writing:

prop.test(sum(vector=="value"), length(vector))
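If you find yourself repeating this pattern, it can be wrapped in a small function. The helper below is a hypothetical sketch, not part of the lab materials; unlike length(), it counts only non-missing observations, which is an assumption about how missing answers should be handled. The demo vector is rebuilt from the Snapchat counts in this lab.

```r
# Hypothetical helper, not part of the lab materials
one_prop_test <- function(vector, value, p = 0.5) {
  x <- sum(vector == value, na.rm = TRUE)  # observations matching the value
  n <- sum(!is.na(vector))                 # non-missing observations
  prop.test(x, n, p = p)
}

# Demo vector rebuilt from the Snapchat counts in this lab (308 YES, 1037 NO)
snapchat_demo <- rep(c("YES", "NO"), times = c(308, 1037))
one_prop_test(snapchat_demo, "YES")

# Matching is case-sensitive: a differently capitalized value matches nothing
sum(snapchat_demo == "yes")  # 0
```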

HW6 Guide

Part 2, Question 7:

You will be expected to write something like this:

“We are 95% confident that the true population proportion lies between XXX and XXX.”

HW6 Guide

Part 2, Question 8:

• prop.test() function

prop.test(c(a, b), c(total_a, total_b))

• You need to do two-sample proportion tests

• Each time, you need to compare two of the following vectors: conf_white, conf_black, conf_hispanic

• conf_white vs. conf_black

• conf_white vs. conf_hispanic

• conf_black vs. conf_hispanic
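Since the same steps repeat for each pair, one option is a small helper that takes two vectors and the response value of interest. This is a hypothetical sketch, not part of the lab materials, demonstrated on vectors rebuilt from the Snapchat-by-Facebook counts in this lab rather than on the homework data.

```r
# Hypothetical helper, not part of the lab materials
two_prop_test <- function(group1, group2, value) {
  successes <- c(sum(group1 == value), sum(group2 == value))  # a and b
  totals    <- c(length(group1), length(group2))              # total_a and total_b
  prop.test(successes, totals)
}

# Demo vectors rebuilt from the Snapchat-by-Facebook counts in this lab
snap_fb_yes <- rep(c("YES", "NO"), times = c(277, 728))  # Facebook users
snap_fb_no  <- rep(c("YES", "NO"), times = c(31, 309))   # Facebook non-users
two_prop_test(snap_fb_yes, snap_fb_no, "YES")
```

The helper assumes neither vector contains missing values; with NAs present, both sum() and length() would need adjusting.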

S371: Lab 10
