S371: Lab 3

Lab Instructor: Katya Baldina

2023-09-06

Announcement

HW1 is due Sep.7 (TOMORROW)!

Lab Notes and R Codes are now here: https://katyalex.github.io/

You can also make an appointment with me here: https://calendar.app.google/SDGU3k2BU7jmmZsy6

Announcement

Your first test runs from Sep. 20, 1:00 PM to Sep. 21, 11:59 PM.

There will be a lab review session on Sep. 20, 11:30 AM-12:45 PM, AT BALLANTINE HALL 346

NO CLASS WITH PROF. SCHULTZ, BUT A LAB REVIEW SESSION WITH ME

R object (continued)

An R object is anything that you store in your current R working environment (note: all R objects will be gone once you close RStudio)

You can store almost anything in R as an object:

• A number

• Characters

• A vector (a list of numbers/a list of characters)

• Logicals (TRUE/FALSE)

• Data frame (kind of like an Excel spreadsheet with numbers stored in columns and rows)

• and many others…
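As a quick illustration of the list above, the sketch below creates one object of each kind and asks R what type it is with class(). All object names and values here are made up for the example.

```r
# Hypothetical example objects, one per kind listed above
a_number    <- 42
some_text   <- "sociology"
a_vector    <- c(25, 78, 56, 95)
a_logical   <- TRUE
a_dataframe <- data.frame(name = c("Ann", "Bob"), score = c(90, 85))

# class() reports what kind of object each one is
class(a_number)     # "numeric"
class(some_text)    # "character"
class(a_vector)     # "numeric"
class(a_logical)    # "logical"
class(a_dataframe)  # "data.frame"
```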

Vector

Vector = Variable

You can store a list of numbers (a vector/variable) as an R object

For example, we want to store the list of students’ test scores together in an R object called “testscores”:

testscores = c(25, 78, 56, 95)
testscores
## [1] 25 78 56 95

The c() function combines the elements listed within the parentheses into one vector. You can store different kinds of data using the c() function (numbers, characters, etc.)
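A minimal sketch of c() with characters instead of numbers. The student names are hypothetical. It also shows one thing worth knowing: a single vector holds only one kind of data, so if you mix numbers and characters, R turns everything into characters.

```r
# A character vector (hypothetical student names)
students <- c("Ann", "Bob", "Cara", "Dev")
class(students)   # "character"

# Mixing a number and a character: the number is stored as text
mixed <- c(25, "Ann")
class(mixed)      # "character"
mixed             # "25"  "Ann"
```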

Vector

You can see that a new object (testscores) shows up in the environment window

List of numbers: calculations

Now you can do any calculation with it. For example:

testscores*2
## [1]  50 156 112 190

You can multiply it by another vector:

attendance = c(0.8, 1, 0.9, 0.4)
testscores*attendance
## [1] 20.0 78.0 50.4 38.0

R multiplies the first number in testscores by the first number in attendance, the second number in testscores by the second number in attendance, and so on.
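To check the element-by-element rule, the sketch below writes out each product by hand and compares it with what R returns. The name by_hand is just for this illustration.

```r
testscores <- c(25, 78, 56, 95)
attendance <- c(0.8, 1, 0.9, 0.4)

# Element-by-element products, written out one at a time
by_hand <- c(25 * 0.8, 78 * 1, 56 * 0.9, 95 * 0.4)

# R's vector multiplication gives the same four numbers
all.equal(testscores * attendance, by_hand)   # TRUE

# Other arithmetic operators work the same way
testscores + attendance
testscores / 100
```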

quantile() function

Using the quantile() function, you can obtain any specific quantile of a numeric vector.

What is quantile?

Quantiles are cut points that divide a sorted sample into equal-sized groups. For this reason, quantiles are also called fractiles. The 25th percentile is called the lower quartile, the 50th percentile is called the median, and the 75th percentile is called the upper quartile. (Source)

quantile() function

When using quantile(vector, quantile), you first give the name of the vector and then the quantile (or quantiles) you want to know.

For quantile you can specify any value between 0 and 1:

quantile() function

Let’s look at the example:

listnum <- c(25, 78, 56, 95, 44, 58, 67, 36, 10, 75)
quantile(listnum, 0.44)
##   44% 
## 55.52
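Where does 55.52 come from? The sketch below reproduces it by hand, assuming R's default quantile method (type 7), which interpolates between neighboring sorted values. The helper names (sorted, h, lower, frac, by_hand) are only for this illustration, and the shortcut works here because the position falls strictly between two observations.

```r
listnum <- c(25, 78, 56, 95, 44, 58, 67, 36, 10, 75)
p <- 0.44

sorted <- sort(listnum)   # 10 25 36 44 56 58 67 75 78 95
n <- length(sorted)       # 10

# Position of the quantile within the sorted data (default method, type 7)
h <- (n - 1) * p + 1      # 4.96
lower <- floor(h)         # 4
frac <- h - lower         # 0.96

# Go 96% of the way from the 4th sorted value (44) to the 5th (56)
by_hand <- sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])
by_hand                   # 55.52

all.equal(by_hand, unname(quantile(listnum, p)))   # TRUE
```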

If you do not specify quantile, R will give you the minimum, 1st quartile, median, 3rd quartile, and the maximum by default

quantile(listnum)
##   0%  25%  50%  75% 100% 
##   10   38   57   73   95

If you specify quantile as 0.5, you will get the median of listnum:

quantile(listnum, 0.5)
## 50% 
##  57

You can also specify quantile as a list of numbers:

quantile(listnum, c(0,1))
##   0% 100% 
##   10   95

Note: whenever you need to specify multiple numbers simultaneously within a function, you need to use the c() function to do it

summary() function

You can also obtain a summary of the variable by using the summary() function.

For example:

summary(listnum)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    10.0    38.0    57.0    54.4    73.0    95.0

You can see that the result is almost identical to the output of quantile(); the only difference is that summary() also reports the mean.

quantile(listnum)
##   0%  25%  50%  75% 100% 
##   10   38   57   73   95
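The comparison can also be done in code. This is a small sketch: it drops the mean (the 4th value) from the summary() result and checks that what is left matches quantile(). The names s and q are only for this illustration.

```r
listnum <- c(25, 78, 56, 95, 44, 58, 67, 36, 10, 75)

s <- summary(listnum)
q <- quantile(listnum)

# Remove the mean (4th element of summary) and compare with quantile()
all.equal(as.numeric(s)[-4], as.numeric(q))   # TRUE

# The extra element in summary() is the mean
as.numeric(s)[4]                              # 54.4
mean(listnum)                                 # 54.4
```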

HW1 Guide (Due Sep. 7)

HW consists of two parts:

Part 1: Q1-5

Part 2 (R part): Q6

We will walk through Part 2, but feel free to ask me about Part 1 after the lab.

HW1

load("medalsHW1.Rdata")

The dataset medalsHW1.Rdata has four variables: Gold, Silver, Bronze, and Total.

Structure of the script

For this HW you need to fill in the blanks and write some lines of code yourself. First, read the directions very carefully.

######################################################
# YOUR NAME HERE
# SOC-S371
# Fall 2023
#
# Homework 1
######################################################

## READ THIS FIRST:
## Use this skeleton to complete the R portion of homework 1. Use the code
## from lecture as a guide.
## Read the comments and run the code line by line. Most lines are complete.
## Others need to have parts filled in.
##  - Replace "[ADD VARIABLE HERE]" with the appropriate variable name.
##  - Replace "## ADD CODE HERE ##" with the appropriate command(s).
## The help files for the relevant commands are included below.

Comments before code give you a hint of what the next line does:

# Print the length of the Gold medal data
length(Gold)

ls() function

Line 21 uses the ls() function, which lists all the R objects in your environment window.

For example:

ls()
## [1] "attendance" "Bronze"     "Gold"       "listnum"    "Silver"    
## [6] "testscores" "Total"

length() function

If you put a vector within the parentheses, this function outputs the number of observations in the vector.

For example:

testscores = c(25, 78, 56, 95)
length(testscores)
## [1] 4

This function counts both valid and invalid observations (i.e., missing cases).

If you put a data frame within the parentheses, this function gives you the number of variables (columns) in the data frame, not the number of observations.
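Both points about length() can be checked with a small sketch. The vector and data frame below are made up for the example; they are not the HW1 data.

```r
# A vector with one missing value (NA)
scores <- c(25, NA, 56, 95)
length(scores)         # 4: the missing case is counted
sum(!is.na(scores))    # 3: number of non-missing values

# A small hypothetical data frame: 3 observations (rows), 2 variables (columns)
df <- data.frame(gold = c(1, 0, 4), silver = c(2, 2, 1))
length(df)             # 2: number of variables
ncol(df)               # 2: same thing
nrow(df)               # 3: number of observations
```

So for a data frame, use nrow() when you want the number of observations.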

HW1

In the script file (HW 1.R), you need to change the commands on lines 37, 42, 46, and 51:

hist([ADD VARIABLE HERE], breaks=c(0,5,10,15,20,25,30,35,40),right=FALSE)
mean([ADD VARIABLE HERE])
sd([ADD VARIABLE HERE])
quantile([ADD VARIABLE HERE],c(0,.25,.5,.75,1))

Replace [ADD VARIABLE HERE] with the appropriate variable name.

For example, if I need to construct a histogram for the variable Gold, I will write:

hist(Gold, breaks=c(0,5,10,15,20,25,30,35,40),right=FALSE)

Read the instructions carefully and replace [ADD VARIABLE HERE] with the appropriate variable name on line 34 (hint: not Gold)

Histogram hist()

In RStudio, histograms (and other graphs) are displayed in the Plots window (default position: lower right-hand corner)

You can display the graph in a separate window by clicking Zoom.

You can right-click the zoomed image and copy it into your Word document. You can also click the “Export” button to do the same.
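To see what the breaks and right arguments in the HW1 hist() command do, you can ask hist() for the bin counts without drawing anything (plot = FALSE). This is only a sketch: the vector x below is made-up data, not the medal data.

```r
# Hypothetical counts, for illustration only
x <- c(0, 1, 4, 5, 5, 9, 10, 12, 39)
bins <- c(0, 5, 10, 15, 20, 25, 30, 35, 40)

# right = FALSE: each bin includes its left edge, so 5 falls in the 5-10 bin
left_closed <- hist(x, breaks = bins, right = FALSE, plot = FALSE)
left_closed$counts    # 3 3 2 0 0 0 0 1

# right = TRUE (the default): each bin includes its right edge, so 5 falls in the 0-5 bin
right_closed <- hist(x, breaks = bins, right = TRUE, plot = FALSE)
right_closed$counts   # 5 2 1 0 0 0 0 1
```

The data are the same in both calls; only the rule for values that sit exactly on a bin edge changes.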

mean() and sd() functions

You need to put a variable name within the parentheses.

mean(Gold)
## [1] 3.433333
sd(Gold)
## [1] 4.141325
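To see what these two functions compute, the sketch below works out the mean and standard deviation by hand for the testscores vector from earlier in the lab (the Gold data are not reproduced here). Note that sd() divides by n - 1, not n. The names mean_by_hand and sd_by_hand are only for this illustration.

```r
testscores <- c(25, 78, 56, 95)
n <- length(testscores)

# Mean: add everything up and divide by the number of observations
mean_by_hand <- sum(testscores) / n

# Standard deviation: square root of the sum of squared deviations over (n - 1)
sd_by_hand <- sqrt(sum((testscores - mean_by_hand)^2) / (n - 1))

mean_by_hand                                 # 63.5
all.equal(mean_by_hand, mean(testscores))    # TRUE
all.equal(sd_by_hand, sd(testscores))        # TRUE
```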

quantile() - refer to the previous slides!

HW1

R script

Remember to modify the commands in the R script file (HW 1.R); do not type them directly into the R console.

At the end of HW1 (Question 6, part e), you need to copy and paste the whole R script file, with valid commands inside!

Don’t copy the R code and output from the console window

Just copy the content in the R script window

Now let’s play with some data

#make this example reproducible
set.seed(1)
n=300
#This command simulates n=300 observations from a normal distribution
#with mean 50 and standard deviation 10.
mydata = rnorm(n, mean=50, sd=10)  
#view first 6 observations in sample
head(mydata)
## [1] 43.73546 51.83643 41.64371 65.95281 53.29508 41.79532
#As requested, we have 300 observations in our data:
length(mydata)
## [1] 300
#The sample mean is close to (but not exactly) 50:
mean(mydata)
## [1] 50.33584
#find standard deviation of sample
sd(mydata)
## [1] 9.636959
# Calculate the five-number summary (min, quartiles, max) of the variable
quantile(mydata,c(0,.25,.5,.75,1))
##       0%      25%      50%      75%     100% 
## 21.11079 44.10949 49.61563 56.72734 76.49167

Let’s visualize the data!

hist(mydata, breaks=10,right=FALSE)

?hist
hist(mydata, breaks=10,right=FALSE, xlab = "Test Scores", col = "yellow", border = "green")