class: center, middle, inverse, title-slide .title[ # Advanced statistics in R ] .subtitle[ ## Regression - univariate and stratified ] .author[ ### ] .date[ ###
contact@appliedepi.org
]

---

<style type="text/css">
.remark-slide table{
  border: none
}
.remark-slide-table {

}
tr:first-child {
  border-top: none;
}
tr:last-child {
  border-bottom: none;
}
.tiny .remark-code { /*Change made here*/
  font-size: 50% !important;
}
.teenytiny .remark-code { /*Change made here*/
  font-size: 25% !important;
}
.giant .remark-code { /*Change made here*/
  font-size: 150% !important;
}
.smaller .remark-code { /*Change made here*/
  font-size: 75% !important;
}
.huge .remark-code { /*Change made here*/
  font-size: 250% !important;
}
</style>

# Regression

.pull-left[
Regression analysis is one of the most useful tools in our toolbox. It allows us to establish statistical relationships between an outcome and an exposure, or exposures, *in our dataset*.

This allows us to do a number of different things, such as...
]

.pull-right[
<img src="../../images/regression/regression_figure.png" width="100%" />
]

---

# Regression

- Testing a theory or association.

<img src="../../images/regression/testing_test_tube.jpg" width="50%" />

---

# Regression

- Testing a theory or association.
- Predicting what could happen in a new dataset if the relationships remain the same.

<img src="../../images/regression/testing_test_tube.jpg" width="50%" />

---

# Regression

- Testing a theory or association.
- Predicting what could happen in a new dataset if the relationships remain the same.
- Controlling for confounding and effect modification.
  - Are the relationships between the outcome and the predictor true?
  - Or are they an artifact of another variable we previously did not adjust for?

<img src="../../images/regression/testing_test_tube.jpg" width="50%" />

---

# gtsummary

There are many ways to carry out regressions in R, but here we will use the package **gtsummary**, as it allows us to quickly and efficiently analyse data and produce publication-ready tables.
<img src="../../images/regression/gt_logo.png" width="30%" />

---

# gtsummary syntax

The **gtsummary** regression functions take many arguments (see `?tbl_uvregression` for examples of *univariate* regression), but we are primarily concerned with only four of them.

---

# gtsummary syntax

* `method = `
  * The type of regression we want to run, set to `glm` for our purposes

---

# gtsummary syntax

* `method = `
  * The type of regression we want to run, set to `glm` for our purposes
* `y = `
  * The dependent (outcome) variable we want to model

---

# gtsummary syntax

* `method = `
  * The type of regression we want to run, set to `glm` for our purposes
* `y = `
  * The dependent (outcome) variable we want to model
* `method.args = `
  * The type of glm we want to run. For a logistic regression it would be `method.args = list(family = binomial)`

---

# gtsummary syntax

* `method = `
  * The type of regression we want to run, set to `glm` for our purposes
* `y = `
  * The dependent (outcome) variable we want to model
* `method.args = `
  * The type of glm we want to run. For a logistic regression it would be `method.args = list(family = binomial)`
* `exponentiate = `
  * Whether or not we want to exponentiate the result to produce odds ratios rather than log odds (only useful for logistic regression)

---

# gtsummary syntax

.pull-left[

``` r
linelist %>%
```

]

---

# gtsummary syntax

.pull-left[

``` r
linelist %>%
  select(age, gender, temp, cough, outcome_death)
```

]

---

# gtsummary syntax

.pull-left[

``` r
linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  drop_na()
```

]

---

# gtsummary syntax

.pull-left[

``` r
linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  drop_na() %>%
  tbl_uvregression(
    method = ,
    y = ,
    method.args = ,
    exponentiate =
  )
```

]

---

# Univariate regression

.pull-left[

``` r
linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  drop_na() %>%
  tbl_uvregression(
    method = glm,
    y = outcome_death,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
```

]

.pull-right[
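| Characteristic | N | OR<sup>1</sup> | 95% CI<sup>1</sup> | p-value |
|:---|---:|---:|---:|---:|
| age | 291 | 0.99 | 0.97, 1.01 | 0.3 |
| gender | 291 | | | |
| &nbsp;&nbsp;female | | — | — | |
| &nbsp;&nbsp;male | | 0.81 | 0.50, 1.29 | 0.4 |
| temp | 291 | 1.08 | 0.83, 1.39 | 0.6 |
| cough | 291 | | | |
| &nbsp;&nbsp;no | | — | — | |
| &nbsp;&nbsp;yes | | 1.92 | 1.01, 3.68 | 0.046 |

<sup>1</sup> OR = Odds Ratio, CI = Confidence Interval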
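
As a rough sketch of what one row of this table represents, the same kind of estimate can be reproduced in base R: fit a logistic `glm()` with a single exposure, then exponentiate the coefficient. The data here are simulated (the `sim` data frame and its effect size are assumptions, not the course `linelist`), and the interval shown is a Wald interval from `confint.default()`, which may differ slightly from the interval **gtsummary** reports.

```r
# Simulated data for illustration only (NOT the course linelist)
set.seed(42)
n <- 300
sim <- data.frame(
  cough = factor(sample(c("no", "yes"), n, replace = TRUE),
                 levels = c("no", "yes"))
)

# Assumed true relationship: higher log-odds of death with a cough
log_odds <- -1 + 0.7 * (sim$cough == "yes")
sim$outcome_death <- rbinom(n, size = 1, prob = plogis(log_odds))

# One exposure, one model: this is the "univariate" part
fit <- glm(outcome_death ~ cough, data = sim, family = binomial)

log_or <- coef(fit)["coughyes"]                   # estimate on the log-odds scale
or     <- exp(log_or)                             # the step exponentiate = TRUE performs
ci     <- exp(confint.default(fit)["coughyes", ]) # Wald 95% CI, exponentiated

round(c(OR = unname(or), ci), 2)
```

The reference level (`no`) has no estimate of its own, which is why it appears as a dash in the table.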
]

---

# Stratified regression

Here we define stratified regression as the process of carrying out separate regression analyses on **different "groups" of data**.

We do this because we think there may be plausible reasons why there might be **different relationships** between the dependent and independent variables for **different groups**.

---

# Groups we might want to stratify by

Can you think of any groups you might want to separate in your analysis?

---

# Groups we might want to stratify by

<font size="46">Age</font>

---

# Groups we might want to stratify by

Age

<font size="46">Sex</font>

---

# Groups we might want to stratify by

Age

Sex

<font size="46">Race/ethnicity</font>

---

# Groups we might want to stratify by

Age

Sex

Race/ethnicity

<font size="46">Geographic area</font>

---

# Stratified regression

* `filter()` our dataset to the group we want
  - `gender == "male"` and `gender == "female"`
* We then _remove_ `gender` after we filter
  - After filtering, `gender` takes a single value in each subset, so it can no longer serve as a predictor

---

# Stratified regression

.small[

``` r
male_regression <- linelist %>%
  select(age, gender, temp, cough, outcome_death)
```

]

---

# Stratified regression

.small[

``` r
male_regression <- linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  filter(gender == "male")
```

]

---

# Stratified regression

.small[

``` r
male_regression <- linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  filter(gender == "male") %>%
  select(-gender)
```

]

---

# Stratified regression

.small[

``` r
male_regression <- linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  filter(gender == "male") %>%
  select(-gender) %>%
  drop_na()
```

]

---

# Stratified regression

.small[

``` r
male_regression <- linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  filter(gender == "male") %>%
  select(-gender) %>%
  drop_na() %>%
  tbl_uvregression(
    method = glm,
    y = outcome_death,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
```

]

---

# Stratified regression

.small[

``` r
female_regression <- linelist %>%
  select(age, gender, temp, cough, outcome_death) %>%
  filter(gender == "female") %>%
  select(-gender) %>%
  drop_na() %>%
  tbl_uvregression(
    method = glm,
    y = outcome_death,
    method.args = list(family = binomial),
    exponentiate = TRUE
  )
```

]

---

# Comparing outputs

We can then display these tables side by side using `tbl_merge()` to see whether the exposure associations differ between the groups.

``` r
tbl_merge(tbls = list(male_regression, female_regression))
```
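
| Characteristic | Table 1: N | Table 1: OR<sup>1</sup> | Table 1: 95% CI<sup>1</sup> | Table 1: p-value | Table 2: N | Table 2: OR<sup>1</sup> | Table 2: 95% CI<sup>1</sup> | Table 2: p-value |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| age | 158 | 1.00 | 0.97, 1.02 | 0.8 | 133 | 0.97 | 0.93, 1.01 | 0.14 |
| cough | 158 | | | | 133 | | | |
| &nbsp;&nbsp;no | | — | — | | | — | — | |
| &nbsp;&nbsp;yes | | 2.07 | 0.90, 4.93 | 0.090 | | 1.67 | 0.61, 4.61 | 0.3 |
| temp | 158 | 1.24 | 0.88, 1.76 | 0.2 | 133 | 0.89 | 0.58, 1.32 | 0.6 |

<sup>1</sup> OR = Odds Ratio, CI = Confidence Interval
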
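
To make the idea of "the same model, fitted once per group" concrete, here is a minimal base R sketch that splits a dataset by `gender` and fits a logistic regression of death on age in each subset. The `sim` data frame is simulated and its coefficients are assumptions chosen for illustration; it is not the course `linelist`, so the numbers will not match the table above.

```r
# Simulated data for illustration only (NOT the course linelist)
set.seed(1)
n <- 400
sim <- data.frame(
  gender = sample(c("male", "female"), n, replace = TRUE),
  age    = round(runif(n, min = 1, max = 80))
)
sim$outcome_death <- rbinom(n, size = 1, prob = plogis(-0.5 - 0.01 * sim$age))

# One data frame per gender: the equivalent of filter(gender == ...)
strata <- split(sim, sim$gender)

# Fit the same univariate model within each stratum and return the odds ratio
or_by_gender <- sapply(strata, function(d) {
  fit <- glm(outcome_death ~ age, data = d, family = binomial)
  exp(coef(fit)[["age"]])
})

round(or_by_gender, 2)
```

Each element of `or_by_gender` is estimated only from the rows in that stratum, which is why the `N` column differs between the two halves of a merged table.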

---

# Comparing outputs

We can also set the `tab_spanner = ` argument to give each table a meaningful title.

``` r
tbl_merge(tbls = list(male_regression, female_regression),
          tab_spanner = c("Male only", "Female only"))
```
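
| Characteristic | Male only: N | Male only: OR<sup>1</sup> | Male only: 95% CI<sup>1</sup> | Male only: p-value | Female only: N | Female only: OR<sup>1</sup> | Female only: 95% CI<sup>1</sup> | Female only: p-value |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| age | 158 | 1.00 | 0.97, 1.02 | 0.8 | 133 | 0.97 | 0.93, 1.01 | 0.14 |
| cough | 158 | | | | 133 | | | |
| &nbsp;&nbsp;no | | — | — | | | — | — | |
| &nbsp;&nbsp;yes | | 2.07 | 0.90, 4.93 | 0.090 | | 1.67 | 0.61, 4.61 | 0.3 |
| temp | 158 | 1.24 | 0.88, 1.76 | 0.2 | 133 | 0.89 | 0.58, 1.32 | 0.6 |

<sup>1</sup> OR = Odds Ratio, CI = Confidence Interval
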
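
For anyone who wants to try the whole workflow without the course data, the sketch below uses the `trial` example dataset that ships with **gtsummary**. Stratifying by treatment arm (`trt`) is purely an assumption for demonstration, and `uv_by_arm()` is a hypothetical helper name, not part of any package.

```r
library(dplyr)
library(tidyr)
library(gtsummary)

# Hypothetical helper: univariate logistic regressions within one treatment arm
uv_by_arm <- function(arm) {
  trial %>%
    filter(trt == arm) %>%              # keep one stratum
    select(age, grade, response) %>%    # stratifying variable is dropped here
    drop_na() %>%
    tbl_uvregression(
      method = glm,
      y = response,
      method.args = list(family = binomial),
      exponentiate = TRUE
    )
}

# Merge the two stratum-specific tables and label each one
merged <- tbl_merge(
  tbls = list(uv_by_arm("Drug A"), uv_by_arm("Drug B")),
  tab_spanner = c("Drug A only", "Drug B only")
)

merged
```

Wrapping the pipeline in a function avoids copying the same code once per group, which becomes more useful as the number of strata grows.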

---

# Ready or not, we are going to give it a try!

That's it for univariate and stratified regression.

As you can see, we only need a few commands to carry out a regression analysis on our dataset!

---

# Ready or not, we are going to give it a try!

Any questions?

**Resources**

- Course website (initial setup and slides access): [https://courses.appliedepi.org/statsr/](https://courses.appliedepi.org/statsr/)
- [Epi R Handbook](https://epirhandbook.com/en/): 50 chapters of best-practice code examples, available online and offline
- [Applied Epi Community](https://community.appliedepi.org/): a great place to ask questions and get help