class: center, middle, inverse, title-slide <style type="text/css"> .remark-slide table{ border: none } .remark-slide-table { } tr:first-child { border-top: none; } tr:last-child { border-bottom: none; } </style> <style type="text/css"> /* THIS IS A CSS CHUNK - THIS IS A COMMENT */ /* Size of font in code echo. E.g. 10px or 50% */ .remark-code { font-size: 70%; } /* Size of font in text */ .medium-text { font-size: 75%; } /* Size of font in tables */ .small-table table { font-size: 6px; } .medium-table table { font-size: 8px; } .medium-large-table table { font-size: 10px; } </style> # Introduction to R for Applied Epidemiology ### {ggplot2} - Scales, themes, and labels contact@appliedepi.org --- # Grammar of Graphics Let's return to that sequence that builds a ggplot. -- The order of layers will usually look like this: 1) **"Open" the plot** -- 2) **"Map" data columns** -- 3) **Add (`+`) “geom” layers** -- 4) **Modify "scales"**, such as a color scale or y-axis breaks -- 5) **Add "theme" plot design elements** such as axis labels, title, caption, fonts, text sizes, background themes, or axes rotation ??? Remember that although the commands may be long, it is infinitely easier to edit and recycle than in Excel --- class: inverse, center, middle ## Scales in {ggplot2} <img src="../../images/ggplot_scales_themes/scales.png" width="50%" /> --- # Scales - overview Scale commands replace defaults for how aesthetics are displayed such as: * *Which* colors or shapes to display * *How* date or proportions are written in axis labels * The min/max and frequency of axes breaks -- <h4>Generic formula: <span style="color:darkgreen">scale</span>_<span style="color:cornflowerblue">aesthetic</span>_<span style="color:deeppink">method</span>()</h4> -- <h4>1) <span style="color:darkgreen">scale</span>_ : this prefix never changes.</h4> -- <h4>2) <span style="color:cornflowerblue">aesthetic</span>: _fill_ or _color_ or _x_ or _y_ etc.</h4> -- <h4>3) <span style="color:deeppink">method</span>: _continuous() or _discrete() or _manual() or _date() etc.</h4> -- <h4><span style="color:darkgreen">scale</span>_<span style="color:cornflowerblue">color</span>_<span style="color:deeppink">continuous</span>()</h4> <h4><span style="color:darkgreen">scale</span>_<span style="color:cornflowerblue">x</span>_<span style="color:deeppink">date</span>()</h4> <h4><span style="color:darkgreen">scale</span>_<span style="color:cornflowerblue">x</span>_<span style="color:deeppink">manual</span>()</h4> --- # Scales examples Some examples of scale commands: You want to adjust |Scale command --------------------|------------------- continuous y-axis |`scale_y_continuous()` date x-axis |`scale_x_date()` categorical x-axis |`scale_x_discrete()` fill, continuous |`scale_fill_continuous()` fill, continuous |`scale_fill_gradient()` color, manual assign |`scale_color_manual()` --- # Scales - default .pull-left[ ```r ggplot( data = surv, mapping = aes( x = district, fill = sex)) + geom_bar() ``` Above, the fill of a bar plot uses the **default colors and axis breaks**: ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-26-1.png" width="504" /> ] ??? Ignore the overlapping x-axis labels, for code simplicity we are not adjusting these in this slide. --- # Scales - adjusted fill .pull-left[ ```r ggplot( data = surv, mapping = aes( x = district, fill = sex)) + geom_bar() + *scale_fill_manual( * values = c( * "male" = "violetred", * "female" = "aquamarine")) ``` Within `scale_fill_manual()` we provide **manual** assignments within a vector `c()`. .footnote[Use *na.value = "grey"* for missing values ] ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-28-1.png" width="504" /> ] ??? Discuss the na.value= arguments in most scale commands, and the difference between having NA values in the data and having an explicit missing value such as "Unknown". --- # Scales - colorbrewer .pull-left[ ```r ggplot( data = surv, mapping = aes( x = district, fill = sex)) + geom_bar() + *scale_fill_brewer( * type = "qual", * na.value = "grey50") ``` Scales - colorbrewer .footnote[Use *na.value = "grey"* for missing values ] ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-30-1.png" width="504" /> ] ??? Briefly mention the `na.value=` arguments in most scale commands, and the difference between having NA values in the data and having an explicit missing value such as "Unknown". --- # Scales - adjusted y-axis .pull-left[ ```r ggplot( data = surv, mapping = aes( x = district, fill = sex)) + geom_bar() + scale_fill_manual( values = c( "male" = "violetred", "female" = "aquamarine")) + *scale_y_continuous( * breaks = seq(from = 0, * to = 250, * by = 10)) ``` In `scale_y_continuous()` we adjust the y-axis breaks using `seq()` to define a numeric sequence. ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-32-1.png" width="504" /> ] --- # Scales - start axes at 0 .pull-left[ ```r ggplot( data = surv, mapping = aes( x = district, fill = sex)) + geom_bar() + scale_fill_manual( values = c( "male" = "violetred", "female" = "aquamarine")) + scale_y_continuous( breaks = seq(from = 0, to = 250, by = 10), * expand = c(0, 0)) + *scale_x_discrete( * expand = c(0, 0)) ``` In `scale_x_` or `scale_y_` commands, use `expand = c(0,0)` to remove excess space around the plot. ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-34-1.png" width="504" /> ] --- # Scales - date axis labels .pull-left[ ```r ggplot( data = surv, mapping = aes(x = date_onset)) + geom_histogram() ``` The default scale for date axes will vary by the range of your data. ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-36-1.png" width="504" /> ] --- # Scales - date label breaks .pull-left[ ```r ggplot( data = surv, mapping = aes(x = date_onset)) + geom_histogram() + *scale_x_date( * date_breaks = "2 months") #2 months ``` Adjust axis labels with `scale_x_date()`. Use `date_breaks=` values like "1 week", "2 weeks", or "3 months". These adjust axis break *labels*, not histogram bins! ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-41-1.png" width="504" /> ] ??? For tips on geom_histogram() bins, see Epi R Handbook epicurves page --- # Scales - date axis labels .pull-left[ ```r ggplot( data = surv, mapping = aes(x = date_onset)) + geom_histogram() + scale_x_date( date_breaks = "2 months", * date_labels = "%d %b\n%Y") ``` Specify date label format to `date_labels=` using ["strptime" syntax](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/strptime) `"%d %b %Y"` for DD MM YYYY. .footnote[See Epi R Handbook [Epicurves](https://epirhandbook.com/en/new_pages/epicurves.html) and [Strings](https://epirhandbook.com/en/new_pages/characters_strings.html) chapters ] ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-43-1.png" width="504" /> ] ??? /n is a newline --- # Scales - auto-efficient labels .pull-left[ ```r ggplot( data = surv, mapping = aes(x = date_onset)) + geom_histogram() + scale_x_date( * date_breaks = "2 weeks", #2 weeks * labels = label_date_short() ) ``` Or, simply assign `labels=` to `label_date_short()` from [{scales}](https://scales.r-lib.org/) The year is not repeated on each label. ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-48-1.png" width="504" /> ] --- class: inverse, center, middle ## Labels in {ggplot2} <img src="../../images/ggplot_scales_themes/labels.png" width="50%" /> --- # Plot labels .pull-left[ ```r ggplot( data = surv, mapping = aes( x = date_onset, fill = hospital)) + geom_histogram() + scale_x_date( date_breaks = "2 weeks", labels = label_date_short() )+ scale_fill_brewer(type = "qual", na.value = "grey50") ``` ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-51-1.png" width="504" /> ] --- # Plot labels .pull-left[ ```r ggplot( data = surv, mapping = aes( x = date_onset, fill = hospital)) + geom_histogram() + scale_x_date( date_breaks = "2 weeks", labels = label_date_short() )+ scale_fill_brewer(type = "qual", na.value = "grey50")+ *labs( * title = "Epidemic curve of Ebola outbreak", * subtitle = "Confirmed cases, 2014", * x = "Date", * y = "Number of cases", * caption = "Fictional Ebola data", * fill = "Hospital" ) + ``` Use `labs()` as above. Note: edit legend title via the aesthetic that created the legend (e.g. `fill=`). ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-56-1.png" width="504" /> ] --- # Dynamic labels Embed code within `str_glue("text here {CODE HERE} text here")` The code will update with the data. ```r str_glue("Data as of {Sys.Date()}") ``` ``` ## Data as of 2025-05-21 ``` ``` ## Datos a partir de 2025-05-21 ``` -- ```r str_glue("{fmt_count(surv, is.na(date_onset))} cases missing onset and not shown") ``` ``` ## 33 (5.2%) cases missing onset and not shown ``` .footnote[See the [Epi R Handbook Strings chapter](https://epirhandbook.com/en/new_pages/characters_strings.html#dynamic-strings) and the {stringr} package. ] ??? Explain that in str_glue, anything within curly brackets it will run as R code. --- class: inverse, center, middle ## Themes in {ggplot2} <img src="../../images/ggplot_scales_themes/themes.png" width="50%" /> --- # Themes Themes are **non-data** design features, for example: * Plot canvas background * Text size, color, and orientation * Position of legend -- The power of {ggplot2} - you can make extremely small adjustments ... if you want to. --- # Themes .pull-left[ ["Complete themes"](https://ggplot2.tidyverse.org/reference/ggtheme.html) are easy to add. ```r # Try one of these... theme_minimal() theme_light() theme_bw() theme_gray() theme_dark() theme_void() theme_classic() ``` Try the argument `base_size = 16` to quickly increase text size. ] .pull-right[ <img src="intro05-2_files/figure-html/unnamed-chunk-73-1.png" width="504" /> ] --- # Micro-adjustments Make micro-adjustments within `theme()`. Supply arguments for each small change you want to make, for example: Plot component | Theme argument -----------------------------------------------------------------------------------------------|--------------- legend position |`legend.position = ` legend direction (horizontal or vertical) | `legend.direction = ` -- These can be *very very* specific: Plot component | Theme argument -----------------------------------------------------------------------------------------------|--------------- Length of axis ticks on the X-axis |`axis.ticks.length.x = ` There are literally hundreds of options. ??? Think of all the minute adjustments that you make when creating a plot in Excel. How many times do you have to re-do all those steps? All of those steps can be coded, which makes the plot adjustable. --- # Themes .pull-left[ Make micro-adjustments within `theme()`. ```r ggplot( data = surv, mapping = aes( x = age_years, y = ht_cm, color = sex), alpha = 0.3) + geom_point() + labs( title = "Height by age", x = "Age (years)", y = "Height (cm)", color = "sex")+ theme_minimal(base_size = 16) + *theme( * legend.position = "bottom", * plot.title = element_text( * color = "red", * face = "bold"), * axis.title.y = element_text(angle = 90)) ``` ] .pull-right[ The syntax takes practice - see [this list](https://ggplot2.tidyverse.org/reference/theme.html) of feature-specific arguments. <img src="intro05-2_files/figure-html/unnamed-chunk-78-1.png" width="504" /> ] ??? Talk about these theme() arguments and how they consist of two parts, just like `mapping = aes()`. Explain that nobody has these all memorized, but the common ones are easy to remember once you use them enough. Remember to add them AFTER any complete themes. --- # Themes - `element_text()` If changing *text*, the micro-adjustments often need to be made *within* `element_text()` ```r theme( legend.position = "bottom", * plot.title = element_text( * color = "red", * face = "bold"), * axis.title.y = element_text(angle = 90)) ``` This is a similar syntax to `mapping = aes()` ??? Talk about these theme() arguments and how they consist of two parts, just like `mapping = aes()`. Explain that nobody has these all memorized, but the common ones are easy to remember once you use them enough. --- class: inverse, center, middle ## Exercise! Go to the course website Open the exercise for Module 5 part 2, and login Follow the instructions to continue coding in your "ebola" R project Let an instructor know if you are unsure what to do <img src="../../images/breakout/teamwork3.png" width="50%" />