Linear models: one-way designs

Topics in Scientific and Statistical Computing

Robin Elahi

Outline

  • Intro to linear models (board)

  • Visualizing ANOVA (board)

  • R examples (slides)

Examples from QK2E chapter 6

  • Box 6.1 (regression; continuous predictor)
    • Discussion
  • Box 6.7 (ANOVA; categorical predictor)
    • Discussion

Box 6.1

Christensen et al. (1996) studied the relationships between coarse woody debris (CWD) and shoreline vegetation and lake development in a sample of 16 lakes in North America. The main variables of interest:

  • density of cabins (no. km−1)

  • density of riparian trees (trees km−1)

  • basal area of riparian trees (m2 km−1)

  • density of coarse woody debris (no. km−1)

  • basal area of coarse woody debris (m2 km−1)

Visualize data

Linear model

m1 <- lm(cwdbasal ~ ripdens, data = d)
summary(m1)

Call:
lm(formula = cwdbasal ~ ripdens, data = d)

Residuals:
   Min     1Q Median     3Q    Max 
-38.62 -22.41 -13.33  26.16  61.35 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -77.09908   30.60801  -2.519 0.024552 *  
ripdens       0.11552    0.02343   4.930 0.000222 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 36.32 on 14 degrees of freedom
Multiple R-squared:  0.6345,    Adjusted R-squared:  0.6084 
F-statistic:  24.3 on 1 and 14 DF,  p-value: 0.0002216

Plot predicted values vs residuals

Discussion

  • Thoughts on the reading (6.1; continuous predictor)

  • This material is foundational. You’ve probably done a linear regression before. If you were teaching an undergrad, what is a key point you would emphasize?

Box 6.7

Keough and Raimondi (1995) set up an experiment to examine the response of serpulid (polychaete worms) larvae to four types of biofilms on hard substrata in shallow marine waters. The four treatments were:

  • F: field substrata (with a net, to exclude other invertebrates)
  • NL: netted substrata developed in the lab
  • UL: un-netted substrata developed in the lab
  • SL: sterile substrata in the lab, without a net

Visualize data

Linear model

m1 <- lm(lserp ~ film, data = d)
summary(m1)

Call:
lm(formula = lserp ~ film, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22929 -0.06500  0.01843  0.07054  0.19557 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.11343    0.04411  47.912  < 2e-16 ***
filmNL       0.06843    0.06238   1.097  0.28356    
filmSL      -0.17943    0.06238  -2.876  0.00831 ** 
filmUL       0.01886    0.06238   0.302  0.76504    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1167 on 24 degrees of freedom
Multiple R-squared:  0.4292,    Adjusted R-squared:  0.3578 
F-statistic: 6.015 on 3 and 24 DF,  p-value: 0.003314

Set your reference level

d <- d %>% mutate(film_ = fct_relevel(film, "SL", "UL", "NL", "F"))
glimpse(d$film); glimpse(d$film_)
 Factor w/ 4 levels "F","NL","SL",..: 3 3 3 3 3 3 3 4 4 4 ...
 Factor w/ 4 levels "SL","UL","NL",..: 1 1 1 1 1 1 1 2 2 2 ...

Linear model, take 2

m2 <- lm(lserp ~ film_, data = d)
summary(m2)

Call:
lm(formula = lserp ~ film_, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.22929 -0.06500  0.01843  0.07054  0.19557 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.93400    0.04411  43.844  < 2e-16 ***
film_UL      0.19829    0.06238   3.179 0.004045 ** 
film_NL      0.24786    0.06238   3.973 0.000564 ***
film_F       0.17943    0.06238   2.876 0.008310 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1167 on 24 degrees of freedom
Multiple R-squared:  0.4292,    Adjusted R-squared:  0.3578 
F-statistic: 6.015 on 3 and 24 DF,  p-value: 0.003314

Null model

m0 <- lm(lserp ~ 1, data = d)
summary(m0)

Call:
lm(formula = lserp ~ 1, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.31239 -0.11064  0.03261  0.10961  0.21861 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.09039    0.02752   75.95   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1456 on 27 degrees of freedom

Comparing linear models

anova(m0, m1)
Analysis of Variance Table

Model 1: lserp ~ 1
Model 2: lserp ~ film
  Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
1     27 0.57265                                
2     24 0.32688  3   0.24577 6.0149 0.003314 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(m1)
Analysis of Variance Table

Response: lserp
          Df  Sum Sq  Mean Sq F value   Pr(>F)   
film       3 0.24577 0.081924  6.0149 0.003314 **
Residuals 24 0.32688 0.013620                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Planned contrasts

Define a linear combination of the p means that represent the comparison of interest

\[ c_1 \bar{y}_1 + ... + c_i \bar{y}_i + c_p \bar{y}_p \]

where \(c_i\) represent contrast coefficients and sum to zero, and \(\bar{y}_i\) are group means.

To omit a group, give it a coefficient of zero.

Very cool: you can do contrasts for polynomial trends with group means! (see Box 6.4)

UL vs NL

# Default
contrasts(d$film)
   NL SL UL
F   0  0  0
NL  1  0  0
SL  0  1  0
UL  0  0  1
# Change it
contrasts(d$film) <- c(0, 1, 0, -1)
contrasts(d$film)
   [,1]       [,2]       [,3]
F     0 -0.5000000 -0.7071068
NL    1 -0.1666667  0.4714045
SL    0  0.8333333 -0.2357023
UL   -1 -0.1666667  0.4714045

UL vs NL

# Refit the model with new contrasts
serpulid.aov <- aov(lserp ~ film, data = d)
summary(serpulid.aov)
            Df Sum Sq Mean Sq F value  Pr(>F)   
film         3 0.2458 0.08192   6.015 0.00331 **
Residuals   24 0.3269 0.01362                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the aov results have not changed.

UL vs NL

tidy(summary.lm(serpulid.aov))
# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   2.09      0.0221    94.8   2.07e-32
2 film1         0.0248    0.0312     0.795 4.35e- 1
3 film2        -0.164     0.0441    -3.72  1.07e- 3
4 film3         0.0834    0.0441     1.89  7.07e- 2

To evaluate the contrast, look at the statistic and p-value for film1

F vs average (NL & UL)

contrasts(d$film) <- c(2,-1,0,-1)
contrasts(d$film)
   [,1]       [,2]        [,3]
F     2 -0.2867757  0.03306064
NL   -1 -0.3677574 -0.66939360
SL    0  0.8603272 -0.09918192
UL   -1 -0.2057940  0.73551488
serpulid.aov <- aov(lserp ~ film, data = d)
tidy(summary.lm(serpulid.aov))
# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   2.09      0.0221    94.8   2.07e-32
2 film1        -0.0145    0.0180    -0.808 4.27e- 1
3 film2        -0.183     0.0441    -4.16  3.53e- 4
4 film3        -0.0141    0.0441    -0.321 7.51e- 1

SL vs average (F & NL & UL)

contrasts(d$film) <- c(-1,-1,3,-1)
contrasts(d$film)
   [,1]        [,2]       [,3]
F    -1 -0.67052910 -0.4658942
NL   -1  0.73874075 -0.3477481
SL    3  0.00000000  0.0000000
UL   -1 -0.06821164  0.8136423
serpulid.aov <- aov(lserp ~ film, data = d)
tidy(summary.lm(serpulid.aov))
# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  2.09       0.0221    94.8   2.07e-32
2 film1       -0.0521     0.0127    -4.09  4.15e- 4
3 film2        0.0493     0.0441     1.12  2.75e- 1
4 film3       -0.00845    0.0441    -0.192 8.50e- 1

Discussion

  • Thoughts on the reading (6.2; categorical predictor)

  • This material is foundational. You’ve probably done a one-way ANOVA before. If you were teaching an undergrad, what is a key point you would emphasize?