The Great British Bakeoff
2022-10-31
Introduction
This week’s data set is all about the Great British Bake Off (GBBO). GBBO is a British television baking competition in which a group of amateur bakers compete against each other over a series of rounds, attempting to impress the judges with their baking skills. Contestants are eliminated as the rounds progress, and the winner is selected from the contestants who reach the final.
Despite the show’s popularity, it’s not something that I have ever watched, and I had no idea what to expect from the data. My approach was to ask some GBBO fans for their hypotheses, and then build visualisations to test these hypotheses.
Hypotheses
The hypotheses that I will test are:
- the average age of contestants will have gone down over the series
- the technical ranking will have no significant correlation with who wins Star Baker in a given week
- viewership will drop once Sue, Mel and Mary leave
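Findings

    library(tidyverse)
    library(tidytuesdayR)
    library(here)

    # First run only: download the data and cache it locally.
    # tt_data <- tt_load(2022, week = 43)
    # write_rds(tt_data, here("posts", "TT_2022_wk43_BakeOff", "tt_data.rds"))
    tt_data <- read_rds(here("posts", "TT_2022_wk43_BakeOff", "tt_data.rds"))

    challenges <- tt_data$challenges
    bakers     <- tt_data$bakers
    ratings    <- tt_data$ratings
    episodes   <- tt_data$episodes

Has the average age gone down over the series?

    # Summary statistics of contestant age for each series.
    baker_ages <- bakers %>%
      group_by(series) %>%
      summarise(
        median_age = median(age),
        mean_age   = mean(age),
        # Relies on series_winner being a 0/1 flag, so the sum is the winner's age.
        winner_age = sum(series_winner * age),
        .groups = "drop"
      )

    ggplot(data = baker_ages, mapping = aes(x = factor(series))) +
      theme(
        panel.background   = element_blank(),
        panel.grid.major.y = element_line(colour = "grey95"),
        panel.grid.minor.y = element_line(colour = "grey97"),
        panel.grid.major.x = element_blank(),
        axis.ticks         = element_blank()
      ) +
      scale_x_discrete(labels = function(x) paste("S", x)) +
      # Age distribution of all contestants in each series.
      geom_violin(
        data = bakers,
        mapping = aes(y = age, fill = factor(series)),
        alpha = 0.3, show.legend = FALSE
      ) +
      # Summary points inherit x = factor(series) from ggplot().
      geom_point(aes(y = winner_age, colour = "winner age colour"), shape = 19, size = 3) +
      geom_point(aes(y = median_age, colour = "median age colour"), shape = 18, size = 2) +
      geom_point(aes(y = mean_age, colour = "mean age colour"), shape = 18, size = 2) +
      xlab("") +
      ylab("Age") +
      scale_colour_manual(
        name = "", guide = "legend",
        values = c(
          "winner age colour" = "orange",
          "median age colour" = "black",
          "mean age colour"   = "grey"
        ),
        # Explicit breaks keep each label paired with the right colour;
        # without them the keys are sorted alphabetically and the labels shift.
        breaks = c("winner age colour", "median age colour", "mean age colour"),
        labels = c("Winner's age", "Median age", "Mean age")
      ) +
      annotate("text", x = 2.5, y = 69, size = 3,
               label = "Age range of \nparticipants grew \nbetween S1 and S8") +
      annotate("text", x = 9.5, y = 65, size = 3,
               label = "Much younger \nparticipants in \nS9 and S10") +
      labs(
        title = "GBBO winners tend to be from the younger contestants",
        subtitle = "Violin plots of the age distribution of participants each season \nwith summary statistics overlaid",
        caption = "#TidyTuesday 2022 wk43 | Data: {bakeoff} package | Viz: Amit Lad"
      )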
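The violin plot shows the spread of ages, but the first hypothesis is about the trend in the average. One rough way to put a number on that trend is sketched below in base R, assuming a data frame with `series` and `age` columns like `bakers`. The `toy_bakers` values are invented for illustration and are not the GBBO data, and a straight-line fit is only one of several reasonable checks.

```r
# Toy data: invented ages, NOT the real GBBO contestants.
toy_bakers <- data.frame(
  series = rep(1:4, each = 3),
  age    = c(45, 38, 52, 40, 36, 47, 33, 41, 29, 30, 27, 35)
)

# Mean age per series (base R equivalent of group_by() + summarise()).
mean_age_by_series <- aggregate(age ~ series, data = toy_bakers, FUN = mean)

# Slope of a straight-line fit through the per-series means:
# a negative slope means the mean age falls as the series number rises.
age_trend <- lm(age ~ series, data = mean_age_by_series)
age_slope <- unname(coef(age_trend)["series"])

print(mean_age_by_series)
print(age_slope)
```

With the real data, `toy_bakers` would be swapped for `bakers`. With only ten series, the slope should be read as a rough indication rather than a formal test.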
Do Star Bakers have higher technical rankings?
    # One row per Star Baker award, with the winner's technical rank rescaled
    # to 0-100 (100 = ranked first, 0 = ranked last among the bakers who appeared).
    star_baker_vs_technical_ranking <- challenges %>%
      filter(result == "STAR BAKER") %>%
      left_join(episodes, by = c("episode", "series")) %>%
      select(series, episode, baker, result, technical, bakers_appeared) %>%
      mutate(
        technical_ranking_percentile =
          100 * (bakers_appeared - technical) / (bakers_appeared - 1)
      )

    ggplot(data = star_baker_vs_technical_ranking,
           mapping = aes(x = technical_ranking_percentile)) +
      theme(
        panel.background   = element_blank(),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.text.y        = element_blank(),
        axis.ticks         = element_blank(),
        axis.title         = element_blank()
      ) +
      # One density curve per series.
      geom_density(aes(colour = factor(series))) +
      # Build the legend labels from the series numbers present in the data.
      scale_color_discrete(name = "", labels = function(x) paste("Season", x)) +
      # Dotted curve for all series combined (linewidth needs ggplot2 >= 3.4.0;
      # use size = 1 on older versions).
      geom_density(linetype = "dotted", linewidth = 1) +
      labs(
        title = "Overall, GBBO Star Bakers tended to rank higher on the technical challenge",
        subtitle = "Density plots of the technical challenge rankings of star bakers. \nThe dotted black line is for all seasons combined. \nThere were no star bakers in Season 1.",
        caption = "#TidyTuesday 2022 wk43 | Data: {bakeoff} package | Viz: Amit Lad"
      ) +
      annotate("text", x = 25, y = 0.015, size = 3,
               label = "In S8 and S9, \nmany Star Bakers \ndid not rank highly on the \ntechnical challenge. ") +
      annotate("text", x = 80, y = 0.033, size = 3,
               label = "In S2 and S3, \nmost Star Bakers \nranked highly on the \ntechnical challenge. ")
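The rescaling in `technical_ranking_percentile` is easier to follow with a worked example. The helper below, `technical_percentile()`, is a hypothetical name introduced here for illustration; it wraps the same formula, 100 * (bakers_appeared - technical) / (bakers_appeared - 1), and applies it to a made-up episode with ten bakers.

```r
# Hypothetical helper wrapping the formula used in the plot code:
# rank 1 (best) maps to 100 and the last rank maps to 0.
technical_percentile <- function(technical, bakers_appeared) {
  100 * (bakers_appeared - technical) / (bakers_appeared - 1)
}

# Made-up episode with 10 bakers: best, middling and worst technical ranks.
example_scores <- technical_percentile(c(1, 5, 10), bakers_appeared = 10)
print(example_scores)
```

Dividing by `bakers_appeared - 1` rather than `bakers_appeared` is what makes the scale reach both 0 and 100, so episodes with different numbers of bakers can be compared on the same axis.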
Does viewership drop when Sue, Mel and Mary leave?
    # Attach each series' mean 7-day viewership to every episode row.
    viewership <- ratings %>%
      group_by(series) %>%
      mutate(ave_views = mean(viewers_7day)) %>%
      ungroup() %>%
      mutate(series = factor(series))

    ggplot(viewership, aes(x = episode_count, y = viewers_7day)) +
      theme(
        panel.background   = element_blank(),
        panel.grid.major.y = element_line(colour = "grey95"),
        panel.grid.minor.y = element_line(colour = "grey97"),
        panel.grid.major.x = element_blank(),
        axis.ticks         = element_blank(),
        axis.text.x        = element_blank()
      ) +
      # Vertical segments: each episode's viewers relative to its series average.
      geom_segment(aes(xend = episode_count, yend = ave_views, colour = series)) +
      scale_color_discrete(name = "", labels = function(x) paste("Season", x)) +
      # Horizontal lines: the series average. Mapping colour to the series factor
      # groups the line by series, so one layer replaces ten filtered ones.
      geom_line(aes(y = ave_views, colour = series)) +
      xlab("") +
      ylab("Viewers (millions)") +
      scale_y_continuous(limits = c(2, 16), breaks = seq(0, 16, by = 2)) +
      labs(
        title = "GBBO viewership steadily increased for the first 7 seasons, \nand then dropped substantially after Sue, Mel and Mary left",
        subtitle = "Horizontal lines show the average viewership for each season. \nVertical lines show viewership for individual episodes in relation to season average.",
        caption = "#TidyTuesday 2022 wk43 | Data: {bakeoff} package | Viz: Amit Lad"
      ) +
      annotate("text", x = 75, y = 7, size = 3,
               label = "Sue, Mel and Mary left GBBO after S7 \nand the show moved from BBC to C4.")
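The size of the drop can also be summarised as a single number by comparing the mean viewership before and after the change of presenters. The sketch below uses base R and assumes a data frame with `series` and `viewers_7day` columns like `ratings`. The `toy_ratings` figures are invented for illustration and are not the real viewing figures, and the `era` labels are names introduced here.

```r
# Toy data: invented viewing figures (millions), NOT the real ratings.
toy_ratings <- data.frame(
  series       = c(6, 6, 7, 7, 8, 8, 9, 9),
  viewers_7day = c(12.0, 13.0, 13.5, 14.5, 9.0, 9.5, 9.2, 9.8)
)

# Label each episode by era: up to series 7, or series 8 onwards.
toy_ratings$era <- ifelse(toy_ratings$series <= 7, "S1-S7", "S8+")

# Mean 7-day viewers in each era, and the difference between the two.
era_means <- tapply(toy_ratings$viewers_7day, toy_ratings$era, mean)
drop_in_viewers <- unname(era_means["S1-S7"] - era_means["S8+"])

print(era_means)
print(drop_in_viewers)
```

Because the presenters left at the same time as the show moved from BBC to C4, a comparison like this cannot separate the two effects.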
Showcase of other people's visualisation work
@cararthompson: https://twitter.com/cararthompson/status/1585577814249439232
@zakvarty: https://twitter.com/zakvarty/status/1584901965661802496
@PauBaudry: https://twitter.com/PauBaudry/status/1585932001500356608?s=20&t=wntTXemEbMRq-G3yTqLiFw
@danoehm: https://twitter.com/danoehm/status/1585599863462703104
@andiyudha: https://twitter.com/andiyudha/status/1586034948011876352
@JalkanenTero: https://twitter.com/JalkanenTero/status/1585151265486802946