The Great British Bakeoff

TidyTuesday 2022wk43
TidyTuesday
RStats
Author: Amit Lad

Published: October 31, 2022

Introduction

This week’s dataset is all about The Great British Bake Off (GBBO), a British television baking competition in which a group of amateur bakers compete against each other over a series of rounds, attempting to impress the judges with their baking skills. Contestants are eliminated as the rounds progress, and the winner is chosen from those who reach the final.

Despite the show’s popularity, I have never watched it, so I had no idea what to expect from the data. My approach was to ask some GBBO fans for their hypotheses and then build visualisations to test them.

Hypotheses

The hypotheses that I will test are:

  • the average age of contestants will have gone down over the series

  • a baker’s technical challenge ranking will have no significant correlation with whether they are named Star Baker that week

  • viewership will drop when Sue, Mel and Mary leave

Findings

library(tidyverse)
library(tidytuesdayR)
library(here)

# The data were downloaded once with tidytuesdayR and cached locally:
# tt_data <- tt_load(2022, week = 43)
# write_rds(x = tt_data, file = here("posts", "TT_2022_wk43_BakeOff", "tt_data.rds"))
tt_data <- read_rds(here("posts", "TT_2022_wk43_BakeOff", "tt_data.rds"))
challenges <- tt_data$challenges
bakers <- tt_data$bakers
ratings <- tt_data$ratings
episodes <- tt_data$episodes

Has the average age gone down over the series?

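# Summary statistics of contestant age for each series
baker_ages <- 
  bakers %>%
  group_by(series) %>%
  summarise(
    median_age = median(age), 
    mean_age = mean(age), 
    # series_winner is 1 for the winner and 0 otherwise, so this picks out the winner's age
    winner_age = sum(series_winner * age),
    .groups = "drop"
    )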

ggplot(data = baker_ages, mapping = aes(x = factor(series))) +
  theme(
    panel.background = element_blank(),
    panel.grid.major.y = element_line(colour = "grey95"),
    panel.grid.minor.y = element_line(colour = "grey97"),
    panel.grid.major.x = element_blank(),
    axis.ticks = element_blank()
    ) +
  scale_x_discrete(labels = function(x) paste0("S", x)) +
  geom_violin(data = bakers, mapping = aes(y = age, fill = factor(series)), alpha = 0.3, show.legend = FALSE) +
  geom_point(mapping = aes(x = series, y = winner_age, colour = "winner age colour"), shape = 19, size = 3) +
  geom_point(mapping = aes(x = series, y = median_age, colour = "median age colour"), shape = 18, size = 2) +
  geom_point(mapping = aes(x = series, y = mean_age, colour = "mean age colour"), shape = 18, size = 2) +
  xlab("") +
  ylab("Age") +
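  scale_colour_manual(
    name = "", 
    guide = "legend",
    values = c("winner age colour" = "orange", "median age colour" = "black", "mean age colour" = "grey"), 
    # Set the breaks explicitly so each label is matched to the right colour
    breaks = c("winner age colour", "median age colour", "mean age colour"),
    labels = c("Winner's age", "Median age", "Mean age")
    ) +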
  annotate("text", x = 2.5, y = 69, size = 3, label = "Age range of \nparticipants grew \nbetween S1 and S8") +
  annotate("text", x = 9.5, y = 65, size = 3, label = "Much younger \nparticipants in \nS9 and S10") +
  labs(
    title = "GBBO winners tend to be from the younger contestants",
    subtitle = "Violin plots of the age distribution of participants each season \nwith summary statistics overlaid",
    caption = "#TidyTuesday 2022 wk43 | Data: {bakeoff} package | Viz: Amit Lad"
    )
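The violin plot lets us eyeball the first hypothesis. As a rough numerical complement, which is my own addition rather than part of the original analysis, here is a minimal sketch that fits a simple linear regression of contestant age on series number. It assumes bakers has numeric age and series columns, as used above, and the helper name age_trend is hypothetical. A negative slope with a small p-value would be consistent with ages falling over time, although a straight line will not capture the jump to younger contestants in S9 and S10.

```r
# Hypothetical helper (not from the original post): regress contestant age on
# series number and return the slope (years of age per series) and its p-value.
# Assumes `bakers` has numeric `age` and `series` columns.
age_trend <- function(bakers) {
  # lm() drops rows with a missing age under the default na.action (na.omit)
  fit <- lm(age ~ series, data = bakers)
  coefs <- summary(fit)$coefficients
  c(
    slope = coefs["series", "Estimate"],
    p_value = coefs["series", "Pr(>|t|)"]
  )
}

# Usage with the data loaded above:
# age_trend(bakers)
```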

Do Star Bakers have higher technical rankings?

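# Technical challenge ranking of each week's Star Baker, rescaled to a percentile
# (100 = ranked first, 0 = ranked last among the bakers in that episode)
star_baker_vs_technical_ranking <-
  challenges %>%
  filter(result == "STAR BAKER") %>%
  left_join(episodes, by = c("episode", "series")) %>%
  select(series, episode, baker, result, technical, bakers_appeared) %>%
  # Drop any rows without a technical ranking
  filter(!is.na(technical)) %>%
  mutate(technical_ranking_percentile = 100 * (bakers_appeared - technical) / (bakers_appeared - 1))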

ggplot(data = star_baker_vs_technical_ranking, mapping = aes(x = technical_ranking_percentile)) +
  theme(
    panel.background = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank()
    ) +
  geom_density(aes(technical_ranking_percentile, colour = factor(series))) +
  scale_color_discrete(name = "", labels = function(x) paste("Season", x)) +
  geom_density(aes(technical_ranking_percentile), linetype = "dotted", linewidth = 1) +
  labs(
    title = "Overall, GBBO Star Bakers tended to rank higher on the technical challenge",
    subtitle = "Density plots of the technical challenge rankings of star bakers.  \nThe dotted black line is for all seasons combined.  \nThere were no star bakers in Season 1.",
    caption = "#TidyTuesday 2022 wk43 | Data: {bakeoff} package | Viz: Amit Lad"
    ) +
  annotate("text", x = 25, y = 0.015, size = 3, label = "In S8 and S9, \nmany Star Bakers \ndid not rank highly on the \ntechnical challenge.  ") +
  annotate("text", x = 80, y = 0.033, size = 3, label = "In S2 and S3, \nmost Star Bakers \nranked highly on the \ntechnical challenge.  ")
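The density plots suggest that Star Bakers cluster towards the top of the technical rankings, but the second hypothesis is about significance, which a plot alone cannot settle. Here is a minimal sketch of one way to check it, my own addition rather than the original analysis: if technical rank were unrelated to winning Star Baker, a Star Baker's percentile (as defined in the mutate() step above) would be distributed symmetrically around 50, so a one-sided Wilcoxon signed-rank test against 50 gives a rough test. The helper names technical_percentile and test_star_baker_technical are hypothetical, and treating every week as an independent observation is an assumption.

```r
# Hypothetical helper: the same rescaling as the mutate() step above,
# mapping technical rank 1 (best) to 100 and last place to 0
technical_percentile <- function(technical, bakers_appeared) {
  100 * (bakers_appeared - technical) / (bakers_appeared - 1)
}

# Hypothetical helper: one-sided Wilcoxon signed-rank test of whether
# Star Bakers' technical percentiles sit above the midpoint (50)
test_star_baker_technical <- function(percentiles) {
  percentiles <- percentiles[!is.na(percentiles)]
  # exact = FALSE uses the normal approximation, since tied ranks are common
  wilcox.test(percentiles, mu = 50, alternative = "greater", exact = FALSE)
}

# With 11 bakers, first, sixth and last place map to 100, 50 and 0
technical_percentile(c(1, 6, 11), bakers_appeared = 11)

# Usage with the data built above:
# test_star_baker_technical(star_baker_vs_technical_ranking$technical_ranking_percentile)
```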

Does viewership drop when Sue, Mel and Mary leave?

viewership <- 
  ratings %>% 
  group_by(series) %>% 
  mutate(ave_views = mean(viewers_7day)) %>% 
  ungroup() %>% 
  mutate(series = factor(series))

ggplot(viewership, aes(x = episode_count, y = viewers_7day)) +
  theme(
    panel.background = element_blank(),
    panel.grid.major.y = element_line(colour = "grey95"),
    panel.grid.minor.y = element_line(colour = "grey97"),
    panel.grid.major.x = element_blank(),
    axis.ticks = element_blank(),
    axis.text.x = element_blank()
    ) +
  # Vertical segments from each episode's viewership to its season average
  geom_segment(aes(xend = episode_count, yend = ave_views, colour = series)) +
  scale_color_discrete(name = "", labels = function(x) paste("Season", x)) +
  # One horizontal line per season at the season average (the colour mapping also sets the grouping)
  geom_line(aes(y = ave_views, colour = series)) +
  xlab("") +
  ylab("Viewers (millions)") +
  scale_y_continuous(limits = c(2, 16), breaks = seq(0, 16, by = 2)) +
  labs(
    title = "GBBO viewership steadily increased for the first 7 seasons, \nand then significantly reduced after Sue, Mel and Mary left",
    subtitle = "Horizontal lines show the average viewership for each season. \nVertical lines show viewership for individual episodes in relation to season average.",
    caption = "#TidyTuesday 2022 wk43 | Data: {bakeoff} package | Viz: Amit Lad"
    ) +
  annotate("text", x = 75, y = 7, size = 3, label = "Sue, Mel and Mary left GBBO after S7 \nand the show moved from BBC to C4.")
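The chart makes the drop after S7 clear. To put a number on it, here is a minimal sketch, my own addition rather than part of the original post, that compares 7-day viewers per episode in S1 to S7 with S8 onwards using a one-sided Welch two-sample t-test. It assumes ratings has numeric series and viewers_7day columns (viewers in millions, as in the chart), and the helper name compare_viewership is hypothetical. Episodes within a season are not independent, so the p-value should be read as indicative only.

```r
# Hypothetical helper: compare per-episode 7-day viewers before and after
# the move from BBC to C4 (assumed here to start with series 8)
compare_viewership <- function(ratings, last_bbc_series = 7) {
  # Split episodes into the BBC era and the C4 era
  bbc <- ratings$viewers_7day[ratings$series <= last_bbc_series]
  c4 <- ratings$viewers_7day[ratings$series > last_bbc_series]
  # One-sided Welch t-test (t.test's default var.equal = FALSE):
  # were BBC-era episodes watched by more people?
  test <- t.test(bbc, c4, alternative = "greater")
  list(
    means = c(bbc = mean(bbc, na.rm = TRUE), c4 = mean(c4, na.rm = TRUE)),
    p_value = test$p.value
  )
}

# Usage with the data loaded above (ratings still has a numeric series column):
# compare_viewership(ratings)
```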

Showcase of other people’s visualisation work

@cararthompson

@zakvarty

@PauBaudry

@danoehm

@andiyudha

@JalkanenTero

Citation

BibTeX citation:
@online{lad2022,
  author = {Amit Lad},
  title = {The {Great} {British} {Bakeoff}},
  date = {2022-10-31},
  url = {https://amitlad.com/posts/TT_2022_wk43_BakeOff/post.html},
  langid = {en}
}
For attribution, please cite this work as:
Amit Lad. 2022. “The Great British Bakeoff.” October 31, 2022. https://amitlad.com/posts/TT_2022_wk43_BakeOff/post.html.