HW7 IS DUE THURSDAY (Nov. 2) 11.59 PM!
Interchangeable terms:
Plot
Graph
Chart
Univariate graphs:
For a quantitative variable, create a histogram (hist()) or a box plot (boxplot())
For a categorical variable, create a bar chart (barplot())
You can add many options
breaks : specify the range and breaks of the x-axis (or the number of breaks)
xlim : specify the range of x-axis
main : specify the main title
ylab : specify the y-axis label
xlab : specify the x-axis label
col : specify the color of the bar
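As a rough sketch of how these options combine in a single hist() call, the example below uses made-up follower counts (not the real instagram2022.Rda values). The vector name followers and the bin width of 100 are assumptions chosen only for illustration.

```r
# Made-up follower counts in millions (NOT the real instagram2022.Rda values)
followers <- c(45, 60, 72, 88, 95, 110, 130, 150, 210, 240, 310, 380, 470, 560)

# hist() returns the histogram object invisibly, so it can be stored and inspected
h <- hist(followers,
          breaks = seq(0, 600, by = 100),  # bin edges every 100 million
          xlim = c(0, 600),                # range of the x-axis
          main = "Distribution of number of followers (made-up data)",
          xlab = "Number of followers (millions)",
          ylab = "Number of accounts",
          col = "lightblue")

h$counts  # number of accounts falling in each bin
```

Passing a single number to breaks (for example breaks = 6) only suggests how many bins to use, and R may pick a nearby "pretty" value instead.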
hist(as.numeric(insta$Followers), main = "Distribution of number of followers", 
     xlab = "Number of followers", 
     col = 'lightblue')
boxplot(as.numeric(insta$Followers),
        xlab = "Distribution of number of followers", 
        main="Box plot", 
        col = "pink")
To make a bar chart, you should construct a one-way table first
For example, make a one-way table for the variable Region based on the “instagram2022.Rda” data file:
onewaytable1 <- table(insta$Region)
barplot(onewaytable1,
        xlab="", 
        ylab="Number of people", 
        main = "Instagram popular account\n distribution by region", 
        col = "slateblue1", 
        cex.names=0.9)
HW7 has two parts:
Part 1: Q1 and Q2
Part 2: Q3
I will walk through question 3 with you today
If you have questions about Part 1, feel free to ask me after the end of today’s lab
The data frame: instagram2022.Rda
This data set contains 50 popular Instagram accounts with the following variables:
Rank of account (based on number of followers)
User name
Owner
The profession
Country
Personal vs. corporate account
Industry
Region
Female
Lines 20 and 21: These two lines create a custom R function called ctable(), which shows column proportions
For the purpose of this class, you don’t need to understand how it is done
You just need to run these two lines before running any lines below
Note: the first variable in the ctable() function goes to the row in the output. The second variable in the ctable() function goes to the column in the output.
In case you want to know what lines 20 and 21 do:
Custom function: once these two lines have been run, you can use the R function ctable()
This line defines how the ctable() function works:
• It puts the two variables in the parentheses into a table() command. It then puts the table() command in a prop.table() command to express the output as proportions by column. Next, it puts the prop.table() command in an addmargins() command to add the column totals. Lastly, the addmargins() command is put into the round() command to round off the output.
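The nesting described above can be sketched as follows. This is a reconstruction from the description, not the professor's actual code: the name ctable_sketch, its argument names, and the rounding to 2 decimal places are assumptions, and the data are made up.

```r
# Hypothetical reconstruction of a column-proportion table function.
# table() -> prop.table(margin = 2) -> addmargins(margin = 1) -> round()
ctable_sketch <- function(row_var, col_var) {
  round(
    addmargins(
      prop.table(table(row_var, col_var), margin = 2),  # proportions within each column
      margin = 1                                        # add a "Sum" row of column totals
    ),
    2                                                   # assumed: round to 2 decimals
  )
}

# Made-up data: first argument goes to the rows, second to the columns
industry <- c("Music", "Music", "Film", "Sports", "Sports")
region   <- c("Asia", "Asia", "Asia", "Europe", "Europe")

out <- ctable_sketch(industry, region)
print(out)
```

Each column of the result adds up to 1, which is what the "Sum" row at the bottom reports.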
Line 28: load() command
This line loads the data frame (instagram2022.Rda) into RStudio
• If you have already loaded this data frame through File > Open File, you can skip this line
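To see what load() does without needing the homework file, here is a small self-contained sketch. The data frame insta_demo and the temporary file name are made up for illustration; in the homework you would load instagram2022.Rda instead.

```r
# Hypothetical toy data frame standing in for the real 'insta' data
insta_demo <- data.frame(User = c("a", "b", "c"),
                         Followers = c(300, 200, 100))

demo_file <- file.path(tempdir(), "insta_demo.Rda")
save(insta_demo, file = demo_file)  # write the object to an .Rda file
rm(insta_demo)                      # remove it from the workspace

# load() restores the object under the name it was saved with
# and returns that name as a character vector
loaded_names <- load(demo_file)
loaded_names
print(insta_demo)
```

Note that load() does not need an assignment with <- to bring the data frame back, because the object's name is stored inside the .Rda file.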
Line 31: print() function
If you put a data frame inside the parentheses, this function prints the actual observations stored in the data frame
Line 30: table() function
This line creates a bivariate table
You need to add an explanation of what this function does right above this function.
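A minimal sketch of a bivariate (two-way) table, using made-up vectors rather than the real insta data frame. The vector names industry and region are assumptions for illustration only.

```r
# Made-up data: one entry per account
industry <- c("Music", "Music", "Sports", "Film", "Sports", "Music")
region   <- c("Asia", "North America", "Europe", "Asia", "Europe", "North America")

# First variable -> rows, second variable -> columns; cells are counts
two_way <- table(industry, region)
print(two_way)
```

Each cell counts how many accounts share that combination of row and column categories, so the cells add up to the total number of accounts.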
Line 32: ctable() function
This is the custom R function that the professor created on lines 20 and 21
• You have to run lines 20 and 21 first before running line 32
• The ctable() function gives you a bivariate table in column proportions
• Note: the first variable in the ctable() function goes to the row in the output. The second variable in the ctable() function goes to the column in the output.
You need to add an explanation of what this function does right above this function.
##                
##                 Asia Caribbean Europe North America South America
##   Entertainment    0         0      1             6             0
##   Fashion          0         0      0             3             0
##   Film             3         0      0             3             0
##   Govt             0         0      0             1             0
##   Media            0         0      0             2             0
##   Music            2         2      1            15             1
##   Sports           1         0      5             2             2
##                
##                 Asia Caribbean Europe North America South America
##   Entertainment 0.00      0.00   0.14          0.19          0.00
##   Fashion       0.00      0.00   0.00          0.09          0.00
##   Film          0.50      0.00   0.00          0.09          0.00
##   Govt          0.00      0.00   0.00          0.03          0.00
##   Media         0.00      0.00   0.00          0.06          0.00
##   Music         0.33      1.00   0.14          0.47          0.33
##   Sports        0.17      0.00   0.71          0.06          0.67
##   Sum           1.00      1.00   1.00          1.00          1.00
Lines 46 and 49
These lines are incomplete
• You need to delete [ADD VARIABLE HERE]
• Read the comment on line 45. Then you know which variable should be put on line 46 to replace [ADD VARIABLE HERE]
Line 49: You need to write the entire line of R code
b. What other variables are included in the ‘insta’ data frame? (5 points)
List variables that are not mentioned in R script for the homework
c. What are the dimensions of each of the 4 two-way tables (excluding conditional tables)? (8 points)
##                
##                 Asia Caribbean Europe North America South America
##   Entertainment    0         0      1             6             0
##   Fashion          0         0      0             3             0
##   Film             3         0      0             3             0
##   Govt             0         0      0             1             0
##   Media            0         0      0             2             0
##   Music            2         2      1            15             1
##   Sports           1         0      5             2             2
Hint: How many rows × columns are in the table?
You will find the answers to these questions by looking at your R script.
d. Which industry has the most corporate accounts? (2 points)
e. Which industry is most evenly split between corporate and non-corporate accounts? (2 points)
f. Which industry is most popular in North America? South America? Europe? (6 points)
g. Which region(s) is the most male dominated? Which region(s) is the most female dominated? (4 points)
h. Which region has the most corporate accounts? (2 points)
i. Which region has the highest percentage of corporate accounts? (2 points)
j. Which region has the most different industry accounts in the top 50? (2 points)
k. Which region has the fewest? (2 points)
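For questions b and c, two base R functions may be helpful: names() lists the variables in a data frame, and dim() reports the number of rows and columns of a table. The sketch below uses a made-up data frame called demo, not the real insta data, so its output is not the homework answer.

```r
# Hypothetical toy data frame (NOT the real 'insta' data)
demo <- data.frame(
  Industry = c("Music", "Music", "Sports", "Film"),
  Region   = c("Asia", "Europe", "Europe", "North America"),
  Female   = c(1, 0, 0, 1)
)

names(demo)  # variable names stored in the data frame

two_way <- table(demo$Industry, demo$Region)
dim(two_way) # number of rows, then number of columns
```

Here the table has one row per distinct industry and one column per distinct region, so dim() gives rows first and columns second.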