6.2 ggplot_grader()

Description

Grades Private (or Public) Questions involving ggplot objects from the ggplot2 package (Wickham et al. 2024).

This function provides more detailed feedback than the autograder traditionally gives. Rather than evaluating full plot correctness, ggplot_grader() checks for key plot attributes, such as a proper title or x-y aesthetics, and returns targeted feedback. As a result, the student’s plot can differ from the solution in any detail that is not explicitly checked.

See Plots for more information.

Usage

ggplot_grader(var_name, 
              var, 
              var_test, 
              status = paste0("Question ", quest_num, " (Private)"),
              prev_check = FALSE, 
              title_check = FALSE, 
              axis_check = FALSE, 
              aesthetic_check = FALSE, 
              aesthetic_flexible_check = FALSE, 
              axis_name_check = FALSE, 
              data_check = FALSE, 
              facet_wrap_check = FALSE, 
              geom_check = FALSE, 
              geom_check_name = c("GeomPoint"), 
              color_fill_check = FALSE, 
              color_fill_check_name = c("colour"), 
              quest_num = 0, 
              quest_pt = 0, 
              quest_prev = 1, 
              quest_prev_status = NULL)

Arguments

Element | Description | Default
var_name | character, the expected name of var | -
var | expected ggplot object, the student’s ggplot | -
var_test | other ggplot object, the answer key’s ggplot to be compared with var | -
status | character, the question’s displayed part, number, and type (Private/Public) | paste0("Question ", quest_num, " (Private)")
prev_check | logical indicating if the Prerequisite Check should be triggered; requires quest_prev value and can optionally be combined with quest_prev_status | FALSE
title_check | logical indicating if an existence check for a graph title should be triggered | FALSE
axis_check | logical indicating if an existence check for the x-y axis labels should be triggered | FALSE
aesthetic_check | logical indicating if a strict check for the correct x-y aesthetic mappings should be triggered; alternative to aesthetic_flexible_check; see "When to use aesthetic_check or aesthetic_flexible_check?" | FALSE
aesthetic_flexible_check | logical indicating if a flexible check for the correct x-y aesthetic mappings should be triggered; alternative to aesthetic_check; see "When to use aesthetic_check or aesthetic_flexible_check?" | FALSE
axis_name_check | logical indicating if a check for the correct x-y axis labels should be triggered | FALSE
data_check | logical indicating if a check for the correct data associated with the x-y aesthetics should be triggered; requires aesthetic_check | FALSE
facet_wrap_check | logical indicating if a check for the existence of ggplot2::facet_wrap() should be triggered | FALSE
geom_check | logical indicating if an existence check for ggplot layer types (e.g., ggplot2::geom_point()) should be triggered; requires geom_check_name | FALSE
geom_check_name | character vector of the various ggplot layer types to check for in geom_check; Example: c("GeomPoint", "GeomSmooth") | c("GeomPoint")
color_fill_check | logical indicating if an existence check for the "color" and/or "fill" arguments should be triggered; requires color_fill_check_name | FALSE
color_fill_check_name | character vector of the "color" and/or "fill" arguments to check for in color_fill_check; Example: c("colour", "fill") | c("colour")
quest_num | numeric, the question’s number | 0
quest_pt | numeric, the question’s maximum score | 0
quest_prev | numeric, the prerequisite question’s number | 1
quest_prev_status | character, the prerequisite question’s part and number | NULL

Value

The test.results[#, ] vector, containing the question’s status, amount of points awarded, total point value, and feedback message.

See What is test.results? for more details.

Details

Designed for grading Private (or Public) Questions involving ggplot objects only (i.e., objects with a ggplot class attribute).

The function flexibly grades ggplot objects by allowing the user to specify which ggplot attributes to check for.

Note that var is the expected student’s ggplot object and may not necessarily exist or be a ggplot object (e.g., the student decided to use R’s base plot() instead).

  • This is not a problem because ggplot_grader() first tests for a ggplot class attribute and then performs the Name Check. If var does not exist or is not a ggplot object, the subsequent Checks will not be triggered.
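
The existence and class test described above can be illustrated with a minimal sketch. The helper is_gradable_ggplot() and the environment demo_env are hypothetical names used only for illustration; this is not the autograder’s own code, only one way such a guard could be written.

```r
library(ggplot2)

# Hypothetical helper (not part of the autograder): TRUE only when an object
# called `name` exists in `env` and carries the "ggplot" class.
is_gradable_ggplot <- function(name, env = globalenv()) {
  exists(name, envir = env, inherits = FALSE) &&
    inherits(get(name, envir = env), "ggplot")
}

# A stand-in for a student's workspace
demo_env <- new.env()
demo_env$gg_obj <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
demo_env$not_gg <- 1:10  # e.g., something that is not a ggplot object

is_gradable_ggplot("gg_obj", demo_env)      # a ggplot object
is_gradable_ggplot("not_gg", demo_env)      # exists, but is not a ggplot
is_gradable_ggplot("missing_obj", demo_env) # does not exist at all
```

Only the first call returns TRUE, which corresponds to the case where the remaining Checks would go ahead.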

As in private_grader(), quest_prev_status is optional when using prev_check and quest_prev, but it helps identify the prerequisite question in the feedback message when its location would otherwise be unclear.

Note that color_fill_check_name takes the spelling "colour", not "color": ggplot2 stores the aesthetic under the name colour regardless of which spelling was used to create the plot.
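
The spelling can be confirmed directly in ggplot2. The short sketch below uses the built-in mtcars data as a stand-in; the object names mapped and fixed are illustrative only.

```r
library(ggplot2)

# The American spelling is accepted in the call...
mapped <- aes(x = wt, y = mpg, color = factor(cyl))

# ...but the stored aesthetic name is the British one.
names(mapped)

# The same renaming applies to a fixed (non-mapped) colour on a layer.
fixed <- geom_point(color = "red")
names(fixed$aes_params)
```

In both cases the stored name is colour, which is why that spelling is the one to pass to color_fill_check_name.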

When to use aesthetic_check or aesthetic_flexible_check?

aesthetic_check tests if the x-y aesthetic mappings of var are exactly the same as in var_test, while aesthetic_flexible_check tests if the x-y aesthetic mappings of var_test are contained in var.

Consider the following motivating example:

#student's ggplot object
var <- data |> ggplot(aes(date, death * 100)) 

#answer key's ggplot object
var_test <- data |> mutate(death = death * 100) |> 
            ggplot(aes(date, death)) 

Under aesthetic_check, the student’s var would not be considered correct, because its x-y aesthetic mappings (x = date, y = death * 100) are not the same as in var_test (x = date, y = death), despite the ggplots being virtually identical.

However, under aesthetic_flexible_check, var is now considered correct, since it contains var_test’s aesthetics.
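
The difference between the two checks can be sketched as follows. This is an assumption about how such a comparison could work (matching the text of the mappings), not the autograder’s actual implementation; the toy data and the helper xy_labels() are hypothetical.

```r
library(ggplot2)

# Toy data standing in for `data` in the example above
toy <- data.frame(date = 1:3, death = c(0.01, 0.02, 0.03))

student <- ggplot(toy, aes(date, death * 100))
key     <- ggplot(transform(toy, death = death * 100), aes(date, death))

# Hypothetical helper: the x-y aesthetic mappings as text
xy_labels <- function(p) {
  c(x = rlang::as_label(p$mapping$x),
    y = rlang::as_label(p$mapping$y))
}

# Strict comparison: the mappings must be identical
strict_match <- identical(xy_labels(student), xy_labels(key))

# Flexible comparison: each answer-key mapping must appear within
# the corresponding student mapping
flexible_match <- all(mapply(grepl, xy_labels(key), xy_labels(student),
                             MoreArgs = list(fixed = TRUE)))

strict_match    # FALSE: "death * 100" is not identical to "death"
flexible_match  # TRUE:  "death" is contained in "death * 100"
```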

aesthetic_flexible_check is more flexible than aesthetic_check, since it allows for a wider range of acceptable x-y aesthetics. However, because the student’s aesthetics can then vary slightly from the answer key, the data_check can’t be performed without an error occurring (see the footnote at the end of this section).

Consequently, data_check is automatically turned off when aesthetic_flexible_check is set to TRUE.
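
The footnote to this section suggests that something like ggplot_build might get around this limitation. As a rough, untested-in-the-autograder sketch of that idea, the built x-y values of the first layer can be compared with ggplot2’s layer_data(), which does not depend on how the aesthetics were written. The toy data and the helper built_xy() are hypothetical, and this is not part of ggplot_grader().

```r
library(ggplot2)

# Toy data; the answer key uses a different row order on purpose
toy      <- data.frame(date = c(3, 1, 2), death = c(0.03, 0.01, 0.02))
key_data <- transform(toy[order(toy$date), ], death = death * 100)

student <- ggplot(toy, aes(date, death * 100)) + geom_point()
key     <- ggplot(key_data, aes(date, death)) + geom_point()

# Hypothetical helper: built x-y values of the first layer, sorted so that
# row order does not matter
built_xy <- function(p) {
  xy <- layer_data(p, 1)[, c("x", "y")]
  xy <- xy[order(xy$x, xy$y), ]
  rownames(xy) <- NULL
  xy
}

same_built_data <- isTRUE(all.equal(built_xy(student), built_xy(key)))
same_built_data
```

Here the two plots yield the same built data even though their aesthetics differ. A real implementation would still need to decide which layer to compare and how to handle statistical transformations.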

Recommendation: If it is reasonable that a student could use similar, but not identical, x-y aesthetics to the solution (like above), use aesthetic_flexible_check, keeping in mind that data_check will be turned off. Otherwise, if the aesthetics can reasonably be expected to be the same as the solution and/or the prompt is explicit about which aesthetics to choose, use aesthetic_check. Example 4 shows aesthetic_check combined with data_check.

  • Generally, opting for aesthetic_flexible_check is the safer and better choice (at least until this issue is resolved).

Warning:

aesthetic_flexible_check and aesthetic_check are mutually exclusive checks.

Do not use both arguments together – that is, don’t set both to TRUE.

Example 1

Consider the following question from Homework 10.

Part 1: COVID-19 Data

  4. (Public Question) Next, create a bar plot, using ggplot2, that plots the percentage of people who died from COVID-19 based on their vaccination status and age group. The plot should have four bars in two groups. One group of bars should be for those unvaccinated and the second group of bars should be for those vaccinated. In each group, you should have two bars representing the percentage who died of COVID-19, one for those 50 years or older and one for those younger than 50. Save your bar plot as covid_plot2.

The answer should look something like:

covid_plot2 <- covid_data |> 
  summarize(death = sum(outcome == "death")/n(),
            survived = sum(outcome == "survived")/n(),
            .by = c(vaccine_status, age_group)) |>
    ggplot(aes(x = vaccine_status, y = death, fill = age_group)) +
    geom_bar(stat = "identity", position = "dodge") +
    ylab("percentage") 

Assume this question is worth 20 points and that we want to check if the ggplot has:

  • a geom_bar layer,
  • a fill argument,
  • and x-y axis labels.

Then, the autograder code for this question should look like:

#Comparing the student's ggplot `covid_plot2` with the answer key's `covid_plot2_test` 
#Note: the student can use `geom_col()` instead of `geom_bar()`, since it has the same ggplot attribute `GeomBar`
test.results[4, ] <- ggplot_grader("covid_plot2", covid_plot2, covid_plot2_test, 
                                   status = "Part 1 Question 4 (Public)", axis_check = TRUE, 
                                   geom_check = TRUE, geom_check_name = c("GeomBar"), 
                                   color_fill_check = TRUE, color_fill_check_name = c("fill"),
                                   quest_num = 4, quest_pt = 20)
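
To see where names such as GeomBar and fill come from, the attributes can be inspected on a plot directly. The sketch below uses made-up toy data, and the helpers has_geom() and has_aes() are hypothetical; the autograder’s internal checks are not shown in this section and may differ.

```r
library(ggplot2)

# Made-up toy data, for illustration only
toy <- data.frame(status = c("vax", "vax", "unvax", "unvax"),
                  age    = c("50+", "<50", "50+", "<50"),
                  death  = c(0.02, 0.01, 0.10, 0.03))

with_bar <- ggplot(toy, aes(status, death, fill = age)) +
  geom_bar(stat = "identity", position = "dodge")
with_col <- ggplot(toy, aes(status, death, fill = age)) +
  geom_col(position = "dodge")

# Hypothetical helper: does any layer's geom inherit from `geom_name`?
has_geom <- function(p, geom_name) {
  any(vapply(p$layers, function(l) inherits(l$geom, geom_name), logical(1)))
}

# Hypothetical helper: is `aes_name` mapped in the plot or in any layer?
has_aes <- function(p, aes_name) {
  layer_names <- unlist(lapply(p$layers, function(l) names(l$mapping)))
  aes_name %in% c(names(p$mapping), layer_names)
}

has_geom(with_bar, "GeomBar")  # geom_bar() layer
has_geom(with_col, "GeomBar")  # geom_col() layer inherits from GeomBar
has_aes(with_bar, "fill")      # fill mapped in ggplot(aes(...))
```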

Example 2

Consider the following question from Homework 12.

Part 1: Coding Assignment

  1. In the final coding question, you will recreate Figures 1 and 2, which represent the median home price in the data for each state and the median home price to county per capita income ratio for each state.

    1. (Private Question) Using priceIncomeData, create a bar plot called priceByIncomePlot that shows the median home price to county per capita income ratio for each state and the US as a whole. The x-axis should be the state, and the y-axis should be the median home price to county per capita income ratio. The bars should be colored based on if the state median home price to county per capita income ratio is above or below the US as a whole.

The answer should look something like:

priceByIncomePlot <- priceIncomeData |>
  ggplot(aes(x = reorder(state, price_by_income), y = price_by_income)) +
  geom_bar(stat = "identity", aes(fill = above_below_us_price_by_income)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + #rotate x-axis values for visibility 
  labs(title = 'Median Home Price to Average Income Ratio', x = 'State', y = 'Median Price to Average Income Ratio')

Assume this question is worth 15 points and is the eighth question of the assignment.

We want to check if the ggplot has:

  • a title,
  • the correct x-y axis labels,
  • a geom_bar layer,
  • a fill argument,
  • and a correctly created priceIncomeData from a previous question (the sixth question).

Then, the autograder code for this question should look like:

#Comparing the student's ggplot `priceByIncomePlot` with the answer key's `priceByIncomePlot_test` 
#Note: `axis_name_check` can also be used on its own, without `axis_check`, since x-y axis labels that are correct must also exist
test.results[8, ] <- ggplot_grader("priceByIncomePlot", priceByIncomePlot, priceByIncomePlot_test, 
                                   status = "Question 4c (Private)", title_check = TRUE, 
                                   axis_check = TRUE, axis_name_check = TRUE, 
                                   geom_check = TRUE, geom_check_name = c("GeomBar"), 
                                   color_fill_check = TRUE, color_fill_check_name = c("fill"), 
                                   prev_check = TRUE, quest_prev = 6, quest_prev_status = "Question 4a",
                                   quest_num = 8, quest_pt = 15) 
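
The title and axis-label attributes checked here are stored on the plot object itself. As a hedged illustration (the helpers has_title() and same_xy_labels() are hypothetical and not the autograder’s code), labels set through labs() can be read back and compared like this:

```r
library(ggplot2)

labelled <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = "Fuel economy by weight", x = "Weight", y = "Miles per gallon")

unlabelled <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Hypothetical helper: existence check for a title
has_title <- function(p) !is.null(p$labels$title)

# Hypothetical helper: do the x-y axis labels match the answer key's?
same_xy_labels <- function(p, key) {
  identical(p$labels$x, key$labels$x) && identical(p$labels$y, key$labels$y)
}

has_title(labelled)                   # title was set with labs()
has_title(unlabelled)                 # no title was set
same_xy_labels(unlabelled, labelled)  # labels differ from the "answer key"
```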

Example 3

Consider the following question from Homework 13.

Part 1: Coding Assignment

  1. In this question, you will work with the gdpData.csv file to analyze national Olympic performance and GDP.

    1. (Public Question) Next, using olympic_gdp_data and ggplot, create a group of scatter plots that plot the percentage of the GDP and percentage of the medals won by each country in the 2010 through 2016 Olympic games. The x-axis should contain the percentage of the GDP and the y-axis should contain the percentage of the medals won. The plots should be faceted by the year and season of the Olympic games, such that one plot is for the 2010 winter Olympics, another is for the 2012 summer Olympics, etc. Include a line with slope equal to 1 to denote the values for which the percent of GDP and percent of medals won is equal. You should also label the points on your graph with the NOC codes for the countries that have at least 5% of the GDP or medals won during each Olympic games. Save your plot as olympic_gdp_perc_plot.

The answer should look something like:

olympic_gdp_perc_plot <- olympic_gdp_data |> 
    filter(year >= 2010) |> 
    filter(!is.na(gdp_per_capita), medalsPerc > 0.001) |> 
    ggplot(aes(x = gdpPerc, y = medalsPerc)) + 
    geom_point() +
    geom_text(aes(label = if_else(medalsPerc > 0.05 | gdpPerc > 0.05, noc, ''), 
                  hjust = 0, vjust = 0)) +
    geom_abline(intercept = 0, slope = 1, col = 'red') +
    facet_wrap(year~season)

Assume that this question is worth 5 points and is the ninth question of the assignment.

We want to check if the ggplot has:

  • x-y axis labels,
  • the flexibly correct x-y aesthetics,
  • geom_point, geom_abline, and geom_text layers,
  • a facet_wrap() function,
  • and a correctly created olympic_gdp_data from a previous question (the eighth question).

Then, the autograder code for this question should look like:

#Comparing the student's ggplot `olympic_gdp_perc_plot` with the answer key's `olympic_gdp_perc_plot_test` 
#Note: `geom_abline()` can be checked through the `GeomAbline` attribute and `geom_text` through the `GeomText` attribute
test.results[9, ] <- ggplot_grader("olympic_gdp_perc_plot", olympic_gdp_perc_plot, 
                                   olympic_gdp_perc_plot_test, status = "Question 2d (Public)",
                                   axis_check = TRUE, aesthetic_flexible_check = TRUE, 
                                   geom_check = TRUE, geom_check_name = c("GeomPoint", "GeomAbline", "GeomText"),  
                                   quest_num = 9, quest_pt = 5, facet_wrap_check = TRUE,
                                   prev_check = TRUE, quest_prev = 8, quest_prev_status = "Question 2c (Public)")
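
Whether a plot uses facet_wrap() can be read from the class of its facet specification. The sketch below uses mtcars as stand-in data and only illustrates the attribute involved; it is not the autograder’s own check.

```r
library(ggplot2)

faceted <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + facet_wrap(~cyl)
plain   <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# facet_wrap() stores a FacetWrap object on the plot; an unfaceted plot
# carries a FacetNull object instead
inherits(faceted$facet, "FacetWrap")
inherits(plain$facet, "FacetWrap")
class(plain$facet)[1]
```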

Example 4

Consider the hypothetical homework question:

Part 1: Coding Assignment

  2. (Private Question) Using basketball_data from Question 1, create a scatter plot of points versus assists with ggplot. The x-axis should be pt and the y-axis should be ast. Save the ggplot as basketball_scatter.

The answer should look something like:

basketball_scatter <- basketball_data |> 
  ggplot(aes(x = pt, y = ast)) +
  geom_point()

Assume this question is worth 5 points and that we want to check if the ggplot has:

  • the correct x-y axis labels,
  • the correct x-y aesthetics,
  • the correct x-y data,
  • a geom_point layer,
  • and a correctly created basketball_data from a previous question (the first question).

Then, the autograder code for this question should look like:

#Comparing the student's plot `basketball_scatter` with the answer key's `basketball_scatter_test` 
test.results[2, ] <- ggplot_grader("basketball_scatter", basketball_scatter, basketball_scatter_test,
                                    status = "Part 1 Question 2 (Private)", axis_name_check = TRUE, 
                                    aesthetic_check = TRUE, data_check = TRUE,
                                    geom_check = TRUE, geom_check_name = c("GeomPoint"), 
                                    quest_num = 2, quest_pt = 5, prev_check = TRUE, quest_prev = 1)
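
When the aesthetics are identical, as they are here, the data behind them can be compared by evaluating each mapping against the plot’s data. The sketch below is one possible way to do this while ignoring row order; the toy values and the helper mapped_xy() are hypothetical, and the autograder’s data_check may work differently.

```r
library(ggplot2)

# Made-up values for the hypothetical basketball_data
basketball_data <- data.frame(pt = c(10, 25, 18), ast = c(4, 7, 2))

key     <- ggplot(basketball_data, aes(x = pt, y = ast)) + geom_point()
# Same data in a different row order
student <- ggplot(basketball_data[c(3, 1, 2), ], aes(x = pt, y = ast)) +
  geom_point()
# Different data: one row dropped
wrong   <- ggplot(basketball_data[-1, ], aes(x = pt, y = ast)) + geom_point()

# Hypothetical helper: evaluate the x-y mappings on the plot's data and sort
mapped_xy <- function(p) {
  xy <- data.frame(x = rlang::eval_tidy(p$mapping$x, data = p$data),
                   y = rlang::eval_tidy(p$mapping$y, data = p$data))
  xy <- xy[order(xy$x, xy$y), ]
  rownames(xy) <- NULL
  xy
}

isTRUE(all.equal(mapped_xy(student), mapped_xy(key)))  # same data
isTRUE(all.equal(mapped_xy(wrong), mapped_xy(key)))    # different data
```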

Acknowledgements

Author: Riley Berman

Contributors: Alex Zhao, Jack Keefer, Shreya Sinha, Michal Snopek

References

Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, Dewey Dunnington, and Teun van den Brand. 2024. ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://ggplot2.tidyverse.org.

Footnote: We haven’t found a clean way to fix this. A fix might entail comparing only the final data used in the ggplot, without reference to the x-y aesthetics. The problem we ran into is that comparing the student’s and answer key’s ggplot data, while allowing for flexible row and column ordering, requires exactly the same x-y aesthetics. Perhaps something like ggplot_build can resolve this.