Section Goals


Introduction to Analysis of Variance - ANOVA

In environmental science there are many times when we are working with categorical predictor variables and continuous / numerical response variables. For example, let’s consider a fertilizer addition experiment, where the growth of some plant species is measured under a fertilizer versus no-fertilizer treatment. Here, the predictor variable (fertilizer addition or not) is a categorical variable.

If our categorical predictor variable only has 2 levels (e.g., fertilizer vs. no fertilizer), then we may be able to examine differences between treatments using a \(t\) test. However, if we are dealing with a categorical variable with 3 or more levels, then we need to move to ANOVA.

Fixed versus Random Effects

Our categorical predictor variables can represent two different kinds of factors in our model, fixed vs random factors. In this course, we will only deal with fixed factors, but you should be aware of the differences between the two.

Fixed factors are ones we have manipulated, or that we expect a priori to have an effect on the response. From Logan, p 254 - “conclusions about the effects of a fixed factor are restricted to the specific treatment levels investigated …”

Random factors are, as per Logan, p. 254, “randomly chosen from all the possible levels of populations and are used as random representatives of the populations”. E.g., density of plants could be random. Or position of camera traps on a hillside.

Null hypotheses

Our \(H_0\)s are different, depending on whether we are working with fixed versus random factors.

Fixed effects model:

\[ H_0: \mu_1 = \mu_2 = ... = \mu_i = \mu \]

Random effects model:

\[ H_0: \sigma^2_1 = \sigma^2_2 = ... = \sigma^2_i = \sigma^2 \]

One-way ANOVA

We will be using the a data set on aldrin concentration values at three different depths in the Wolf River. This case study is further described in the ANOVA lecture.

aldrin <- read.csv("https://www.dropbox.com/s/sfstj4p8beajir5/aldrin.csv?dl=1")
summary(aldrin)
##      aldrin         depth          
##  Min.   :3.100   Length:30         
##  1st Qu.:4.300   Class :character  
##  Median :4.900   Mode  :character  
##  Mean   :5.097                     
##  3rd Qu.:5.625                     
##  Max.   :8.800

Let’s make a box plot of these data

library(ggplot2)

ggplot(data = aldrin, aes(x = depth, y = aldrin)) +
  geom_boxplot()

Just looking at the box plot, it does appear as though there are differences in the levels of aldrin at the different depths.


Challenge - Compare concentrations of aldrin at different depths with t test

There are three possible pair-wise comparisons we can do here:

  • bottom - middle
  • bottom - surface
  • middle - surface

Demonstration of ANOVA in R

Carry out an ANOVA

ANOVA is a kind of linear model (we’ll talk more about this soon), so we use the lm function in R first.

aldrin_lm <- lm(data = aldrin, aldrin~depth)

Next, we use a special summary command called anova to show the results of our linear model.

anova(aldrin_lm)
## Analysis of Variance Table
## 
## Response: aldrin
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## depth      2 16.961  8.4803  6.1338 0.006367 **
## Residuals 27 37.329  1.3826                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You could do something similar with the aov function.

aldrin_aov <- aov(data = aldrin, aldrin~depth)
summary(aldrin_aov)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## depth        2  16.96   8.480   6.134 0.00637 **
## Residuals   27  37.33   1.383                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Testing model assumptions

We can use a set of data visualization tools to test for some of the assumptions of ANOVA (i.e., normality of residuals, independences of observations, equal variances).

plot(aldrin_lm)

In the lecture, I mention the possibility of doing multiple comparisons to test for which groups are different from each other. In practice, we would normally use a method like Tukey’s Honset Significant Differences test, which I implement below. Note that I use the aov command here, instead of the anova(lm()) syntax.

TukeyHSD(aov(data = aldrin, aldrin~depth))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = aldrin ~ depth, data = aldrin)
## 
## $depth
##                   diff       lwr       upr     p adj
## middepth-bottom  -0.99 -2.293785  0.313785 0.1630654
## surface-bottom   -1.84 -3.143785 -0.536215 0.0045163
## surface-middepth -0.85 -2.153785  0.453785 0.2561389

Non-parametric ANOVA

As with many of the other methods we go over in this course, there are also non-parametric ANOVA approaches. The most common is the Kruskal-Wallace test, which uses rank based statistics. We wont be covering it here, but you can find information on this test by looking at the help file for the kruskal.test function or checking out these websites as a starting point - http://www.sthda.com/english/wiki/kruskal-wallis-test-in-r or https://rcompanion.org/rcompanion/d_06.html.

Multi-way (Factorial) ANOVA

What if we want to look at two, or more, factors?

See visuals as per Logan, Fig 12.1 and 12.2.

Logan - Figure 12.1
Logan - Figure 12.1
Logan - Figure 12.2
Logan - Figure 12.2

For formula, see Logan p. 314 and for calculation of F-ratios see Table 12.1.

Two-way (two-factor) fixed effects model (Model I)

From Logan (Example 12A):

Quinn (1988) manipulated the density of adults limpets within enclosures (8, 15, 30 and 45 individuals per enclosure) during two seasons (winter-spring and summer-autumn) so as to investigate the effects of adult density and season on egg mass production by intertidal limpets. Three replicate enclosures per density/season combination were used, and both density and season were considered fixed factors (from Box 9.4 of Quinn and Keough (2002)).

limpets <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter12/Data/quinn.csv")

head(limpets)
##   DENSITY SEASON  EGGS
## 1       8 spring 2.875
## 2       8 spring 2.625
## 3       8 spring 1.750
## 4       8 summer 2.125
## 5       8 summer 1.500
## 6       8 summer 1.875
summary(limpets)
##     DENSITY         SEASON               EGGS       
##  Min.   : 8.00   Length:24          Min.   :0.3560  
##  1st Qu.:13.25   Class :character   1st Qu.:0.9165  
##  Median :22.50   Mode  :character   Median :1.4330  
##  Mean   :24.50                      Mean   :1.4717  
##  3rd Qu.:33.75                      3rd Qu.:1.9227  
##  Max.   :45.00                      Max.   :2.8750
limpets$DENSITY <- factor(limpets$DENSITY)

Let’s get some visuals.

ggplot(data = limpets) +
  geom_boxplot(aes(x = DENSITY, y = EGGS)) +
  theme_bw()

ggplot(data = limpets) +
  geom_boxplot(aes(x = SEASON, y = EGGS)) +
  theme_bw()

ggplot(data = limpets) +
  geom_boxplot(aes(x = DENSITY, y = EGGS, fill = SEASON)) +
  theme_bw()

And let’s run the model.

limpets_lm <- lm(data = limpets, formula = EGGS ~ SEASON * DENSITY)
summary(limpets_lm)
## 
## Call:
## lm(formula = EGGS ~ SEASON * DENSITY, data = limpets)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6667 -0.2612 -0.0610  0.2292  0.6647 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.41667    0.24642   9.807  3.6e-08 ***
## SEASONsummer           -0.58333    0.34849  -1.674  0.11358    
## DENSITY15              -0.23933    0.34849  -0.687  0.50206    
## DENSITY30              -0.85133    0.34849  -2.443  0.02655 *  
## DENSITY45              -1.21700    0.34849  -3.492  0.00301 ** 
## SEASONsummer:DENSITY15 -0.41633    0.49284  -0.845  0.41069    
## SEASONsummer:DENSITY30 -0.17067    0.49284  -0.346  0.73363    
## SEASONsummer:DENSITY45 -0.02367    0.49284  -0.048  0.96229    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4268 on 16 degrees of freedom
## Multiple R-squared:  0.749,  Adjusted R-squared:  0.6392 
## F-statistic: 6.822 on 7 and 16 DF,  p-value: 0.000745
anova(limpets_lm)
## Analysis of Variance Table
## 
## Response: EGGS
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## SEASON          1 3.2502  3.2502 17.8419 0.0006453 ***
## DENSITY         3 5.2841  1.7614  9.6691 0.0007041 ***
## SEASON:DENSITY  3 0.1647  0.0549  0.3014 0.8239545    
## Residuals      16 2.9146  0.1822                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Plot the effects of Season and Density using the allEffects function in the effects package. You may need to use install.packages to install this package.

library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(limpets_lm))

Two-way (two-factor) fixed effects model with interactions

From Logan (Example 12B):

In a similar experiment to that illustrated in Example 12A, Quinn (1988) also manipulated the density of larger adults limpets further down the shoreline within enclosures (6, 12 and 24 individuals per enclosure) during the two seasons (winter-spring and summer-autumn) so as to investigate their effects on egg mass production. Again, three replicate enclosures per density/season combination were used, and both density and season were considered fixed factors (from Box 9.4 of Quinn and Keough (2002)).

limpets2 <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter12/Data/quinn1.csv")

head(limpets2)
##   DENSITY SEASON  EGGS
## 1       6 spring 1.167
## 2       6 spring 0.500
## 3       6 spring 1.667
## 4       6 summer 4.000
## 5       6 summer 3.830
## 6       6 summer 3.830
summary(limpets2)
##     DENSITY      SEASON               EGGS       
##  Min.   : 6   Length:18          Min.   :0.5000  
##  1st Qu.: 6   Class :character   1st Qu.:0.8748  
##  Median :12   Mode  :character   Median :1.6485  
##  Mean   :14                      Mean   :1.9484  
##  3rd Qu.:24                      3rd Qu.:2.7075  
##  Max.   :24                      Max.   :4.0000
limpets2$DENSITY <- factor(limpets2$DENSITY)
summary(limpets2)
##  DENSITY    SEASON               EGGS       
##  6 :6    Length:18          Min.   :0.5000  
##  12:6    Class :character   1st Qu.:0.8748  
##  24:6    Mode  :character   Median :1.6485  
##                             Mean   :1.9484  
##                             3rd Qu.:2.7075  
##                             Max.   :4.0000
ggplot(data = limpets2) +
  geom_boxplot(aes(x = DENSITY, y = EGGS)) +
  theme_bw()

ggplot(data = limpets2) +
  geom_boxplot(aes(x = SEASON, y = EGGS)) +
  theme_bw()

ggplot(data = limpets2) +
  geom_boxplot(aes(x = DENSITY, y = EGGS, fill = SEASON)) +
  theme_bw()

And let’s run the model.

limpets2_lm <- lm(data = limpets2, formula = EGGS ~ SEASON * DENSITY)
summary(limpets2_lm)
## 
## Call:
## lm(formula = EGGS ~ SEASON * DENSITY, data = limpets2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61133 -0.16167 -0.04217  0.09892  0.55567 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             1.1113333  0.2183934   5.089 0.000267 ***
## SEASONsummer            2.7753333  0.3088549   8.986 1.12e-06 ***
## DENSITY12              -0.0003333  0.3088549  -0.001 0.999157    
## DENSITY24              -0.4166667  0.3088549  -1.349 0.202221    
## SEASONsummer:DENSITY12 -0.9996667  0.4367868  -2.289 0.041029 *  
## SEASONsummer:DENSITY24 -1.4700000  0.4367868  -3.365 0.005617 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3783 on 12 degrees of freedom
## Multiple R-squared:  0.9301, Adjusted R-squared:  0.9009 
## F-statistic: 31.93 on 5 and 12 DF,  p-value: 1.561e-06
anova(limpets2_lm)
## Analysis of Variance Table
## 
## Response: EGGS
##                Df  Sum Sq Mean Sq F value    Pr(>F)    
## SEASON          1 17.1483 17.1483 119.845 1.336e-07 ***
## DENSITY         2  4.0019  2.0010  13.984 0.0007325 ***
## SEASON:DENSITY  2  1.6907  0.8454   5.908 0.0163632 *  
## Residuals      12  1.7170  0.1431                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Plot the effects of Season and Density using the allEffects function in the effects package. You may need to use install.packages to install this package.

library(effects)

plot(allEffects(limpets2_lm))