HW1 is due Sep.7 (TOMORROW)!
Lab Notes and R Codes are now here: https://katyalex.github.io/
You can also make an appointment with me here: https://calendar.app.google/SDGU3k2BU7jmmZsy6
Your first test is open from Sep. 20, 1:00 PM to Sep. 21, 11:59 PM.
There will be a lab review session on Sep. 20, 11:30 AM-12:45 PM, AT BALLANTINE HALL 346
NO CLASS WITH PROF. SCHULTZ, BUT A LAB REVIEW SESSION WITH ME
R object: anything that you store in your current R working environment (note: R objects are gone once you close RStudio unless you save the workspace)
You can store almost anything in R as an object:
• A number
• Characters
• A vector (a list of numbers/a list of characters)
• Logicals (TRUE/FALSE)
• Data frame (kind of like an Excel spreadsheet with numbers stored in columns and rows)
• and many others…
Vector = Variable
You can store a list of numbers (a vector/variable) in an R object
For example, we want to store the list of students’ test scores together in an R object called “testscores”:
## [1] 25 78 56 95
The c() function combines the elements listed within the parentheses.
You can store different kinds of data using the c() function (numbers, characters, etc.).
You can see that a new object (testscores) shows up in the Environment window.
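The command that creates testscores is not shown above, so here is a minimal sketch of what it could look like. The four scores come from the output above; the student names, the passed vector, and the gradebook data frame are made up for illustration.

```r
# Store the four test scores from the output above in one object.
testscores <- c(25, 78, 56, 95)
testscores

# c() also combines characters (these names are hypothetical).
students <- c("Ana", "Ben", "Cy", "Dee")

# Logicals and data frames are stored the same way (both hypothetical).
passed <- c(FALSE, TRUE, TRUE, TRUE)
gradebook <- data.frame(students, testscores, passed)
gradebook
```

After running these lines, testscores, students, passed, and gradebook should all appear in the Environment window.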
Now you can do any calculation with it. For example:
## [1] 50 156 112 190
You can multiply it with another vector:
## [1] 20.0 78.0 50.4 38.0
R multiplies the first number in testscores by the first number in attendance, the second number in testscores by the second number in attendance, and so on.
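The two outputs above can be reproduced with a sketch like the following. The attendance values are an assumption: they are back-calculated from the printed result (for example, 20.0 / 25 = 0.8) and may not be the exact numbers used in the lab.

```r
testscores <- c(25, 78, 56, 95)

# Multiplying by a single number applies it to every element.
doubled <- testscores * 2
doubled

# ASSUMPTION: attendance values inferred from the output shown above.
attendance <- c(0.8, 1, 0.9, 0.4)

# Multiplying two vectors of the same length works element by element.
weighted <- testscores * attendance
weighted
```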
quantile() function
Using the quantile() function, you can obtain any specific quantile of an R object.
What is a quantile?
Quantiles are the cut points that divide a sorted sample into equal-sized groups. For this reason, quantiles are also called fractiles. The 25th percentile is called the lower quartile, the 50th percentile is called the median, and the 75th percentile is called the upper quartile. (Source)
quantile() function
Using quantile(vector, quantile), you first input the name of the vector and then the quantile (or quantiles) you want to know.
For quantile you can specify any value between 0 and 1:
0 is the 0th percentile (aka minimum)
1 is the 100th percentile (aka maximum)
0.5 is the 50th percentile (aka median)
0.25 is the 25th percentile (aka 1st quartile)
0.75 is the 75th percentile (aka 3rd quartile)
quantile() function
Let’s look at an example:
## 44%
## 55.52
If you do not specify quantile, R will give you the minimum, 1st quartile, median, 3rd quartile, and the maximum by default
## 0% 25% 50% 75% 100%
## 10 38 57 73 95
If you specify quantile as 0.5, you will get the median of listnum:
## 50%
## 57
You can also specify quantile as a list of numbers:
## 0% 100%
## 10 95
Note: whenever you need to specify multiple numbers simultaneously within a function, you need to use the c() function to do it.
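The three ways of calling quantile() described above can be sketched as follows. The vector scores is hypothetical and chosen only to keep the arithmetic easy; it is not the listnum vector from the lab, so the numbers differ from the outputs above.

```r
# Hypothetical data for illustration only (not the lab's listnum).
scores <- c(10, 20, 30, 40, 50)

# One quantile: the second argument is a single number between 0 and 1.
q_median <- quantile(scores, 0.5)

# Several quantiles at once: wrap the numbers in c().
q_range <- quantile(scores, c(0, 1))

# No second argument: minimum, 1st quartile, median, 3rd quartile, maximum.
q_default <- quantile(scores)

q_median   # 50%: 30
q_range    # 0%: 10, 100%: 50
q_default  # 10 20 30 40 50
```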
summary() function
You can also obtain a summary of the variable by using the summary() function.
For example:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.0 38.0 57.0 54.4 73.0 95.0
You can see that the result is almost identical to the output of the quantile() function; the only difference is that summary() also reports the mean.
## 0% 25% 50% 75% 100%
## 10 38 57 73 95
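To see the comparison side by side, here is a small sketch. The vector scores is hypothetical (not the lab's listnum) and is picked so that the mean and the median differ.

```r
# Hypothetical data for illustration only.
scores <- c(10, 20, 30, 40, 60)

s <- summary(scores)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
q <- quantile(scores)   # 0%, 25%, 50%, 75%, 100%

s
q

# The only extra piece of information in summary() is the mean.
s[["Mean"]]     # 32
s[["Median"]]   # 30, same as the 50% quantile
```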
HW consists of two parts:
Part 1: Q1-5
Part 2 (R part): Q6
We will walk through Part 2, but feel free to ask me about Part 1 after the lab.
For this HW you need to fill out the blanks and write some lines of code yourself. First, read the directions very carefully.
######################################################
# YOUR NAME HERE
# SOC-S371
# Fall 2023
#
# Homework 1
######################################################
## READ THIS FIRST:
## Use this skeleton to complete the R portion of homework 1. Use the code
## from lecture as a guide.
## Read the comments and run the code line by line. Most lines are complete.
## Others need to have parts filled in.
## - Replace "[ADD VARIABLE HERE]" with the appropriate variable name.
## - Replace "## ADD CODE HERE ##" with the appropriate command(s).
## The help files for the relevant commands are included below.
Comments before code give you a hint of what the next line does:
ls() function
Line 21: ls() function: This function lists all the R objects in your Environment window
For example:
## [1] "attendance" "Bronze" "Gold" "listnum" "Silver"
## [6] "testscores" "Total"
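A minimal sketch of ls() in action; the two objects created here are hypothetical and only serve to give ls() something to list.

```r
# Create two objects (hypothetical names and values).
x_example <- 1
y_example <- "a"

# ls() returns the names of the objects in the environment as characters.
objects_now <- ls()
objects_now
```

Any objects already in your environment will be listed too, in alphabetical order.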
length() function
If you put a vector within the parentheses, this function outputs the number of observations in the vector.
For example:
## [1] 4
This function counts both valid and invalid observations (i.e., missing cases)
If you put down an R object which is a data frame within the parentheses, this function gives you the number of variables in the data frame.
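Both behaviors of length() can be checked with a short sketch. The vector and the data frame below are hypothetical; NA is how R marks a missing case.

```r
# Hypothetical vector with one missing case.
scores_with_na <- c(25, NA, 56, 95)

length(scores_with_na)        # 4: the missing case is counted
sum(!is.na(scores_with_na))   # 3: valid observations only

# Hypothetical data frame with 2 variables and 3 rows.
df_example <- data.frame(id = 1:3, grade = c("A", "B", "C"))

length(df_example)   # 2: number of variables (columns)
nrow(df_example)     # 3: number of rows
```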
print() function
If you put a vector within the parentheses, this function outputs the actual observations stored inside the vector
## [1] 25 78 56 95
If you put down a number within the parentheses, this function outputs that number:
## [1] 4
If you put down characters in single or double quotes within the parentheses, this function outputs the characters:
## [1] "Hello World!"
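The three print() outputs above were most likely produced by commands along these lines (a sketch; the testscores values are taken from the earlier output).

```r
testscores <- c(25, 78, 56, 95)

print(testscores)        # prints the stored observations
print(4)                 # prints the number 4
print("Hello World!")    # double quotes
print('Hello World!')    # single quotes give the same result
```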
In the script file (HW 1.R), you need to change the commands on lines 37, 42, 46, and 51
hist([ADD VARIABLE HERE], breaks=c(0,5,10,15,20,25,30,35,40),right=FALSE)
mean([ADD VARIABLE HERE])
sd([ADD VARIABLE HERE])
quantile([ADD VARIABLE HERE],c(0,.25,.5,.75,1))
Replace [ADD VARIABLE HERE] with the appropriate variable name.
For example, if I need to construct histogram for the variable Gold, I will write:
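A sketch of that command follows. The values stored in Gold here are hypothetical stand-ins so the example runs on its own; in the homework, Gold is already created by the script.

```r
# HYPOTHETICAL values; the real Gold variable comes from the HW script.
Gold <- c(0, 2, 4, 5, 9, 10, 14, 22, 39)

# breaks sets the bin edges; right=FALSE makes each bin include its
# left edge and exclude its right edge, e.g. [0,5), [5,10), ...
gold_hist <- hist(Gold, breaks=c(0,5,10,15,20,25,30,35,40), right=FALSE)

# hist() also returns the bin counts, which is handy for checking the plot.
gold_hist$counts
```

With these made-up values, the value 5 falls in the second bin rather than the first because of right=FALSE.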
Read the instructions carefully and replace [ADD VARIABLE HERE] with the appropriate variable name on line 34 (hint: not Gold)
hist()
In R, histograms (and other graphs) are displayed in the Plots window (default position: lower right-hand corner)
You can display the graph in a separate window by clicking Zoom:
You can right-click the zoomed image, copy it, and paste it into your Word document. You can also click the “Export” button to do the same.
mean() and sd() functions
You need to put a numeric variable (vector) within the parentheses.
Other kinds of R objects, such as characters, will not give a meaningful result
mean() outputs the mean of the variable and sd() outputs its standard deviation:
## [1] 3.433333
## [1] 4.141325
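As a quick check of what the two functions compute, here is a sketch using the testscores vector from earlier in the lab (not the homework variable, so the numbers differ from the output above). The by-hand formula is added only to show that sd() is the sample standard deviation, which divides by n - 1.

```r
testscores <- c(25, 78, 56, 95)

avg <- mean(testscores)    # (25 + 78 + 56 + 95) / 4 = 63.5
spread <- sd(testscores)   # sample standard deviation

# The same standard deviation written out by hand.
n_obs <- length(testscores)
by_hand <- sqrt(sum((testscores - avg)^2) / (n_obs - 1))

avg
spread
by_hand
```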
quantile() - refer to the previous slides!
In the script file (HW 1.R), you also need to type three entire lines of code by yourself.
Replace ## ADD CODE HERE ## with the appropriate R commands
Remember to modify the commands in the R script file (HW 1.R), not do it directly in the R console.
At the end of HW1 (Question 6, part e), you need to copy and paste the whole R script file with valid commands inside!
Don’t copy the R code and output from the console window
Just copy the content in the R script window
#make this example reproducible
set.seed(1)
n=300
#This command simulates normally distributed data with sample size n=300,
#mean 50, and standard deviation 10.
mydata = rnorm(n, mean=50, sd=10)
#view first 6 observations in sample
head(mydata)
## [1] 43.73546 51.83643 41.64371 65.95281 53.29508 41.79532
#number of observations
length(mydata)
## [1] 300
#sample mean
mean(mydata)
## [1] 50.33584
#sample standard deviation
sd(mydata)
## [1] 9.636959
#minimum, quartiles, and maximum
quantile(mydata)
## 0% 25% 50% 75% 100%
## 21.11079 44.10949 49.61563 56.72734 76.49167