S371: Lab 11

Lab Instructor: Katya Baldina ()

2023-11-01

Announcement

HW7 IS DUE THURSDAY (Nov. 2) AT 11:59 PM!

Plots in R

Interchangeable terms:

Univariate graphs:

Histogram

hist(X)

You can add many options, including:

breaks : specify the break points between bars along the x-axis (or a single number giving the suggested number of bars)

xlim : specify the range of the x-axis

main : specify the main title

ylab : specify the y-axis label

xlab : specify the x-axis label

col : specify the color of the bars
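A quick way to see what each option does is to try them all on a small vector. The sketch below uses ten made-up follower counts (not the real insta data) and bin edges every 10 units. The values and labels are assumptions chosen for illustration.

```r
# Hypothetical follower counts (in millions) for ten accounts; not the real data.
followers <- c(5, 12, 18, 25, 33, 47, 51, 68, 72, 95)

# Every option from the list above, applied to one histogram.
h <- hist(followers,
          breaks = seq(0, 100, by = 10),  # bin edges every 10 units
          xlim = c(0, 100),               # range of the x-axis
          main = "Distribution of followers (toy data)",
          xlab = "Followers (millions)",
          ylab = "Number of accounts",
          col = "lightblue")

# hist() also returns the bin information (invisibly) while drawing the plot.
h$counts  # number of accounts in each bin
```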

Histogram

hist(as.numeric(insta$Followers), main = "Distribution of number of followers", 
     xlab = "Number of followers", 
     col = 'lightblue')
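Before plotting, it is worth checking how the column is stored. as.numeric() converts character values to numbers as expected, but when a column is stored as a factor, it returns the factor's internal level codes instead. We have not verified how Followers is stored in instagram2022.Rda; class(insta$Followers) will tell you. The sketch below uses a made-up factor to show the difference.

```r
# Hypothetical example: numbers that ended up stored as a factor.
f <- factor(c("250", "90", "410"))

as.numeric(f)                # level codes, NOT the values: 1 3 2
as.numeric(as.character(f))  # the actual values: 250 90 410
```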

Box plot

boxplot(as.numeric(insta$Followers),
        xlab = "Distribution of number of followers", 
        main="Box plot", 
        col = "pink")
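To see which numbers a box plot actually draws, use boxplot.stats(), which returns them. The box runs from the lower to the upper hinge (roughly the 25th and 75th percentiles), and the thick line inside it is the median. The whiskers reach the most extreme points within 1.5 times the box length, and anything beyond that is drawn as a separate point. The data here are hypothetical.

```r
# Hypothetical data with one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

boxplot(x, main = "Box plot (toy data)", col = "pink")

# The five numbers drawn: lower whisker, lower hinge, median,
# upper hinge, upper whisker. Points beyond the whiskers are in $out.
s <- boxplot.stats(x)
s$stats  # 1.0 3.0 5.5 8.0 9.0
s$out    # 100
```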

Bar chart

To make a bar chart, you should construct a one-way table first

For example, make a one-way table for the variable Region based on the “instagram2022.Rda” data file:

onewaytable1 <- table(insta$Region)
barplot(onewaytable1,
        xlab="", 
        ylab="Number of people", 
        main = "Instagram popular account\n distribution by region", 
        col = "slateblue1", 
        cex.names=0.9)
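Here are the same two steps on a small Region vector (six hypothetical accounts, not the real data), so you can see the table that barplot() receives:

```r
# Hypothetical regions for six accounts; not the real data.
region <- c("Asia", "Europe", "North America",
            "North America", "Europe", "North America")

# Step 1: one-way table of counts per region.
onewaytable <- table(region)
onewaytable  # Asia 1, Europe 2, North America 3

# Step 2: one bar per category in the table.
barplot(onewaytable,
        ylab = "Number of accounts",
        main = "Accounts by region (toy data)",
        col = "slateblue1")
```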

HW 7 Guide

There are two parts:

I will walk through question 3 with you today

If you have questions about Part 1, feel free to ask me after the end of today’s lab

The data frame: instagram2022.Rda

This data frame contains 50 popular Instagram accounts and a set of variables describing each account

Lines 20 and 21: These two lines create a custom R function called ctable(), which shows column proportions

ctable <- function(...,digits=2)
  round(addmargins(prop.table(table(...),margin=2),margin=1),digits)

For the purposes of this class, you don’t need to understand how it works

You just need to run these two lines before running any lines below

Note: the first variable in the ctable() function goes to the row in the output. The second variable in the ctable() function goes to the column in the output.

In case you want to know what lines 20 and 21 do:

Custom function: once these two lines have been run, you can use the R function ctable()

This line defines how the ctable() function works:

• It puts the two variables in the parentheses into a table() command, which counts the observations in each combination of categories. The table() result goes into prop.table() with margin = 2, which divides each count by its column total, so the output shows proportions by column. That result goes into addmargins() with margin = 1, which adds a "Sum" row containing each column's total. Finally, the addmargins() result goes into round() to round the output (to two decimal places by default, digits = 2).
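The nesting is easier to follow one step at a time. This sketch rebuilds the ctable() output step by step on two short variables and checks that the result matches ctable(). The industry and region vectors are hypothetical, not columns of insta.

```r
# Same definition as lines 20 and 21 of the script.
ctable <- function(..., digits = 2)
  round(addmargins(prop.table(table(...), margin = 2), margin = 1), digits)

# Hypothetical toy variables; not the real data.
industry <- c("Music", "Music", "Sports", "Film")
region   <- c("Asia", "Europe", "Europe", "Asia")

step1 <- table(industry, region)        # counts
step2 <- prop.table(step1, margin = 2)  # each count divided by its column total
step3 <- addmargins(step2, margin = 1)  # adds a "Sum" row (column totals)
step4 <- round(step3, 2)                # rounds to 2 decimal places

step4
ctable(industry, region)  # same numbers in one call
```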

Line 28: load() command

This line loads the data frame (instagram2022.Rda) into RStudio

• If you have already loaded this data frame through File > Open File, you can skip this line
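load() restores each object under the name it was saved with, so the file determines what the object is called. The script refers to the data as insta, which suggests (we are assuming) that the object saved in instagram2022.Rda is named insta. This sketch saves and reloads a tiny hypothetical data frame in a temporary file to show that behavior:

```r
# Hypothetical tiny data frame standing in for the real one.
insta <- data.frame(Account = c("a", "b"), Followers = c(100, 250))

f <- tempfile(fileext = ".Rda")
save(insta, file = f)  # write the object to an .Rda file
rm(insta)              # remove it from the workspace

loaded <- load(f)      # restores the object under its saved name
loaded                 # "insta"
insta                  # available again
```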

Line 31: print() function

If you put a data frame inside the parentheses, print() displays the observations (rows) stored in that data frame
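Here is a minimal example with a hypothetical two-row data frame. Typing the object's name alone at the console prints it the same way.

```r
# Hypothetical two-row data frame.
df <- data.frame(Account = c("a", "b"), Followers = c(100, 250))

print(df)  # header line, then one line per row
```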

Line 30: table() function

This line creates a bivariate (two-way) table of counts for two variables

You need to add a comment right above this line explaining what the function does.
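Here is a minimal sketch of what table() produces, using two short variables (hypothetical, not the real data). Each cell counts the observations with that combination of values, and the first variable forms the rows.

```r
# Hypothetical toy variables; not the real data.
industry <- c("Music", "Music", "Sports", "Film", "Music")
region   <- c("Asia", "Europe", "Europe", "Asia", "Asia")

tab <- table(industry, region)  # first variable -> rows, second -> columns
tab
tab["Music", "Asia"]  # number of Music accounts in Asia: 2
```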

Line 32: ctable() function

This is the custom R function that the professor created on lines 20 and 21

• You have to run lines 20 and 21 before running line 32

• The ctable() function gives you a bivariate table in column proportions

• Note: the first variable in the ctable() function goes to the row in the output. The second variable in the ctable() function goes to the column in the output.
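The proportions are computed within each column, so swapping the two arguments changes the question the table answers, not just its layout. Here is a sketch with two short variables (hypothetical, not the real data):

```r
# Same definition as lines 20 and 21 of the script.
ctable <- function(..., digits = 2)
  round(addmargins(prop.table(table(...), margin = 2), margin = 1), digits)

# Hypothetical toy variables; not the real data.
industry <- c("Music", "Music", "Sports", "Film", "Music")
region   <- c("Asia", "Europe", "Europe", "Asia", "Asia")

a <- ctable(industry, region)  # rows: industries (+ Sum), columns: regions
b <- ctable(region, industry)  # rows: regions (+ Sum), columns: industries
a
b
```

Here a shows the share of each industry within a region (each region's column sums to 1), while b shows the share of each region within an industry.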

For each of these functions, replace the "# Add comment here" line right above it with an explanation of what the function does:

# Add comment here
table(insta$Industry,insta$Region)
##                
##                 Asia Caribbean Europe North America South America
##   Entertainment    0         0      1             6             0
##   Fashion          0         0      0             3             0
##   Film             3         0      0             3             0
##   Govt             0         0      0             1             0
##   Media            0         0      0             2             0
##   Music            2         2      1            15             1
##   Sports           1         0      5             2             2
# Add comment here
ctable(insta$Industry,insta$Region)
##                
##                 Asia Caribbean Europe North America South America
##   Entertainment 0.00      0.00   0.14          0.19          0.00
##   Fashion       0.00      0.00   0.00          0.09          0.00
##   Film          0.50      0.00   0.00          0.09          0.00
##   Govt          0.00      0.00   0.00          0.03          0.00
##   Media         0.00      0.00   0.00          0.06          0.00
##   Music         0.33      1.00   0.14          0.47          0.33
##   Sports        0.17      0.00   0.71          0.06          0.67
##   Sum           1.00      1.00   1.00          1.00          1.00
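Each number in the ctable() output is the matching count from the table() output divided by its column total. For example, North America has 6 + 3 + 3 + 1 + 2 + 15 + 2 = 32 accounts, 15 of them in Music, so its Music entry is 15/32 ≈ 0.47. The sketch below re-enters the counts printed above by hand and recomputes the column proportions.

```r
# Counts copied from the table(insta$Industry, insta$Region) output above.
counts <- as.table(matrix(
  c(0, 0, 1,  6, 0,
    0, 0, 0,  3, 0,
    3, 0, 0,  3, 0,
    0, 0, 0,  1, 0,
    0, 0, 0,  2, 0,
    2, 2, 1, 15, 1,
    1, 0, 5,  2, 2),
  nrow = 7, byrow = TRUE,
  dimnames = list(c("Entertainment", "Fashion", "Film", "Govt",
                    "Media", "Music", "Sports"),
                  c("Asia", "Caribbean", "Europe",
                    "North America", "South America"))))

colSums(counts)  # column totals: 6, 2, 7, 32, 3 (50 accounts in all)
15 / 32          # Music share of North America: 0.46875, shown as 0.47
round(prop.table(counts, margin = 2), 2)  # matches the ctable() rows above
```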

Lines 46 and 49

These lines are incomplete

• You need to replace each [ADD VARIABLE HERE] with the correct variable

• Read the comment on line 45 to find out which variable belongs on line 46 in place of [ADD VARIABLE HERE]

# Conditional table of accounts by Female and Region
ctable([ADD VARIABLE HERE], insta$Region)

# Table of accounts by Corporate and Region
table([ADD VARIABLE HERE], [ADD VARIABLE HERE])

Line 49: You need to write the entire line of R code

# Conditional table of accounts by Corporate and Region
## ADD CODE HERE

b. What other variables are included in the ‘insta’ data frame? (5 points)

List the variables that are not mentioned in the R script for the homework
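names() lists every column of a data frame, and str() also shows each column's type and first few values. This sketch uses a small hypothetical data frame. For the homework you would run the same functions on insta.

```r
# Hypothetical data frame; the real insta has different and more columns.
insta_demo <- data.frame(Account   = c("a", "b"),
                         Followers = c(100, 250),
                         Region    = c("Asia", "Europe"))

names(insta_demo)  # "Account" "Followers" "Region"
str(insta_demo)    # column names, types, and first values
```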

c. What are the dimensions of each of the 4 two-way tables (excluding conditional tables)? (8 points)

# Add comment here
table(insta$Industry,insta$Region)
##                
##                 Asia Caribbean Europe North America South America
##   Entertainment    0         0      1             6             0
##   Fashion          0         0      0             3             0
##   Film             3         0      0             3             0
##   Govt             0         0      0             1             0
##   Media            0         0      0             2             0
##   Music            2         2      1            15             1
##   Sports           1         0      5             2             2

Hint: How many rows × columns does each table have?
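dim() returns a table's size as rows and columns, and nrow() and ncol() give each separately. Here is a sketch on a small table built from two short variables (hypothetical, not the real data):

```r
# Hypothetical toy variables; not the real data.
industry <- c("Music", "Music", "Sports", "Film", "Music")
region   <- c("Asia", "Europe", "Europe", "Asia", "Asia")

tab <- table(industry, region)
dim(tab)  # 3 2: 3 rows (Film, Music, Sports) x 2 columns (Asia, Europe)
nrow(tab)
ncol(tab)
```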

You will find the answers to these questions by looking at the output of your R script.

d. Which industry has the most corporate accounts? (2 points)

e. Which industry is most evenly split between corporate and non-corporate accounts? (2 points)

f. Which industry is most popular in North America? South America? Europe? (6 points)

g. Which region(s) are the most male dominated? Which region(s) are the most female dominated? (4 points)

h. Which region has the most corporate accounts? (2 points)

i. Which region has the highest percentage of corporate accounts? (2 points)

j. Which region has accounts from the most different industries in the top 50? (2 points)

k. Which region has the fewest? (2 points)
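Questions h and i ask different things. The most corporate accounts is a count (from table()), while the highest percentage is a column proportion (from ctable() or prop.table()). Questions j and k come down to counting, for each region, how many industries have at least one account. The sketch below uses made-up data where the answers to h and i differ. Regions A and B and the Yes/No coding are hypothetical, not the real variables.

```r
# Hypothetical toy data: 10 accounts in region A, 3 in region B.
region    <- c(rep("A", 10), rep("B", 3))
corporate <- c(rep("Yes", 4), rep("No", 6), rep("Yes", 2), "No")
industry  <- c(rep("Music", 5), rep("Film", 3), rep("Sports", 2),
               rep("Music", 3))

tab <- table(corporate, region)
tab                                    # counts: A has more corporate accounts (4 vs 2)
round(prop.table(tab, margin = 2), 2)  # shares: B has the higher percentage (0.67 vs 0.40)

# Number of distinct industries with at least one account in each region.
colSums(table(industry, region) > 0)   # A: 3, B: 1
```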