t.test
function to a data setlm
, anova
, and
aov
functions to carry out an ANOVAIn environmental science there are many times when we are working with categorical predictor variables and continuous / numerical response variables. For example, let’s consider a fertilizer addition experiment, where the growth of some plant species is measured under a fertilizer versus no-fertilizer treatment. Here, the predictor variable (fertilizer addition or not) is a categorical variable.
If our categorical predictor variable only has 2 levels (e.g., fertilizer vs. no fertilizer), then we may be able to examine differences between treatments using a \(t\) test. However, if we are dealing with a categorical variable with 3 or more levels, then we need to move to ANOVA.
Our categorical predictor variables can represent two different kinds of factors in our model, fixed vs random factors. In this course, we will only deal with fixed factors, but you should be aware of the differences between the two.
Fixed factors are ones we have manipulated, or that we expect a priori to have an effect on the response. From Logan, p 254 - “conclusions about the effects of a fixed factor are restricted to the specific treatment levels investigated …”
Random factors are, as per Logan, p. 254, “randomly chosen from all the possible levels of populations and are used as random representatives of the populations”. E.g., density of plants could be random. Or position of camera traps on a hillside.
Our \(H_0\)s are different, depending on whether we are working with fixed versus random factors.
Fixed effects model:
\[ H_0: \mu_1 = \mu_2 = ... = \mu_i = \mu \]
Random effects model:
\[ H_0: \sigma^2_1 = \sigma^2_2 = ... = \sigma^2_i = \sigma^2 \]
We will be using the a data set on aldrin concentration values at three different depths in the Wolf River. This case study is further described in the ANOVA lecture.
aldrin <- read.csv("https://www.dropbox.com/s/sfstj4p8beajir5/aldrin.csv?dl=1")
summary(aldrin)
## aldrin depth
## Min. :3.100 Length:30
## 1st Qu.:4.300 Class :character
## Median :4.900 Mode :character
## Mean :5.097
## 3rd Qu.:5.625
## Max. :8.800
Let’s make a box plot of these data
library(ggplot2)
ggplot(data = aldrin, aes(x = depth, y = aldrin)) +
geom_boxplot()
Just looking at the box plot, it does appear as though there are differences in the levels of aldrin at the different depths.
There are three possible pair-wise comparisons we can do here:
ANOVA is a kind of linear model (we’ll talk more about this soon), so
we use the lm
function in R first.
aldrin_lm <- lm(data = aldrin, aldrin~depth)
Next, we use a special summary command called anova
to
show the results of our linear model.
anova(aldrin_lm)
## Analysis of Variance Table
##
## Response: aldrin
## Df Sum Sq Mean Sq F value Pr(>F)
## depth 2 16.961 8.4803 6.1338 0.006367 **
## Residuals 27 37.329 1.3826
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
You could do something similar with the aov
function.
aldrin_aov <- aov(data = aldrin, aldrin~depth)
summary(aldrin_aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## depth 2 16.96 8.480 6.134 0.00637 **
## Residuals 27 37.33 1.383
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can use a set of data visualization tools to test for some of the assumptions of ANOVA (i.e., normality of residuals, independences of observations, equal variances).
plot(aldrin_lm)
In the lecture, I mention the possibility of doing multiple
comparisons to test for which groups are different from each other. In
practice, we would normally use a method like Tukey’s Honset Significant
Differences test, which I implement below. Note that I use the
aov
command here, instead of the anova(lm())
syntax.
TukeyHSD(aov(data = aldrin, aldrin~depth))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = aldrin ~ depth, data = aldrin)
##
## $depth
## diff lwr upr p adj
## middepth-bottom -0.99 -2.293785 0.313785 0.1630654
## surface-bottom -1.84 -3.143785 -0.536215 0.0045163
## surface-middepth -0.85 -2.153785 0.453785 0.2561389
As with many of the other methods we go over in this course, there
are also non-parametric ANOVA approaches. The most common is the
Kruskal-Wallace test, which uses rank based statistics.
We wont be covering it here, but you can find information on this test
by looking at the help file for the kruskal.test
function
or checking out these websites as a starting point - http://www.sthda.com/english/wiki/kruskal-wallis-test-in-r
or https://rcompanion.org/rcompanion/d_06.html.
What if we want to look at two, or more, factors?
See visuals as per Logan, Fig 12.1 and 12.2.
For formula, see Logan p. 314 and for calculation of F-ratios see Table 12.1.
From Logan (Example 12A):
Quinn (1988) manipulated the density of adults limpets within enclosures (8, 15, 30 and 45 individuals per enclosure) during two seasons (winter-spring and summer-autumn) so as to investigate the effects of adult density and season on egg mass production by intertidal limpets. Three replicate enclosures per density/season combination were used, and both density and season were considered fixed factors (from Box 9.4 of Quinn and Keough (2002)).
limpets <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter12/Data/quinn.csv")
head(limpets)
## DENSITY SEASON EGGS
## 1 8 spring 2.875
## 2 8 spring 2.625
## 3 8 spring 1.750
## 4 8 summer 2.125
## 5 8 summer 1.500
## 6 8 summer 1.875
summary(limpets)
## DENSITY SEASON EGGS
## Min. : 8.00 Length:24 Min. :0.3560
## 1st Qu.:13.25 Class :character 1st Qu.:0.9165
## Median :22.50 Mode :character Median :1.4330
## Mean :24.50 Mean :1.4717
## 3rd Qu.:33.75 3rd Qu.:1.9227
## Max. :45.00 Max. :2.8750
limpets$DENSITY <- factor(limpets$DENSITY)
Let’s get some visuals.
ggplot(data = limpets) +
geom_boxplot(aes(x = DENSITY, y = EGGS)) +
theme_bw()
ggplot(data = limpets) +
geom_boxplot(aes(x = SEASON, y = EGGS)) +
theme_bw()
ggplot(data = limpets) +
geom_boxplot(aes(x = DENSITY, y = EGGS, fill = SEASON)) +
theme_bw()
And let’s run the model.
limpets_lm <- lm(data = limpets, formula = EGGS ~ SEASON * DENSITY)
summary(limpets_lm)
##
## Call:
## lm(formula = EGGS ~ SEASON * DENSITY, data = limpets)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6667 -0.2612 -0.0610 0.2292 0.6647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.41667 0.24642 9.807 3.6e-08 ***
## SEASONsummer -0.58333 0.34849 -1.674 0.11358
## DENSITY15 -0.23933 0.34849 -0.687 0.50206
## DENSITY30 -0.85133 0.34849 -2.443 0.02655 *
## DENSITY45 -1.21700 0.34849 -3.492 0.00301 **
## SEASONsummer:DENSITY15 -0.41633 0.49284 -0.845 0.41069
## SEASONsummer:DENSITY30 -0.17067 0.49284 -0.346 0.73363
## SEASONsummer:DENSITY45 -0.02367 0.49284 -0.048 0.96229
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4268 on 16 degrees of freedom
## Multiple R-squared: 0.749, Adjusted R-squared: 0.6392
## F-statistic: 6.822 on 7 and 16 DF, p-value: 0.000745
anova(limpets_lm)
## Analysis of Variance Table
##
## Response: EGGS
## Df Sum Sq Mean Sq F value Pr(>F)
## SEASON 1 3.2502 3.2502 17.8419 0.0006453 ***
## DENSITY 3 5.2841 1.7614 9.6691 0.0007041 ***
## SEASON:DENSITY 3 0.1647 0.0549 0.3014 0.8239545
## Residuals 16 2.9146 0.1822
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot the effects of Season and Density using the
allEffects
function in the effects
package.
You may need to use install.packages
to install this
package.
library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
plot(allEffects(limpets_lm))
From Logan (Example 12B):
In a similar experiment to that illustrated in Example 12A, Quinn (1988) also manipulated the density of larger adults limpets further down the shoreline within enclosures (6, 12 and 24 individuals per enclosure) during the two seasons (winter-spring and summer-autumn) so as to investigate their effects on egg mass production. Again, three replicate enclosures per density/season combination were used, and both density and season were considered fixed factors (from Box 9.4 of Quinn and Keough (2002)).
limpets2 <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter12/Data/quinn1.csv")
head(limpets2)
## DENSITY SEASON EGGS
## 1 6 spring 1.167
## 2 6 spring 0.500
## 3 6 spring 1.667
## 4 6 summer 4.000
## 5 6 summer 3.830
## 6 6 summer 3.830
summary(limpets2)
## DENSITY SEASON EGGS
## Min. : 6 Length:18 Min. :0.5000
## 1st Qu.: 6 Class :character 1st Qu.:0.8748
## Median :12 Mode :character Median :1.6485
## Mean :14 Mean :1.9484
## 3rd Qu.:24 3rd Qu.:2.7075
## Max. :24 Max. :4.0000
limpets2$DENSITY <- factor(limpets2$DENSITY)
summary(limpets2)
## DENSITY SEASON EGGS
## 6 :6 Length:18 Min. :0.5000
## 12:6 Class :character 1st Qu.:0.8748
## 24:6 Mode :character Median :1.6485
## Mean :1.9484
## 3rd Qu.:2.7075
## Max. :4.0000
ggplot(data = limpets2) +
geom_boxplot(aes(x = DENSITY, y = EGGS)) +
theme_bw()
ggplot(data = limpets2) +
geom_boxplot(aes(x = SEASON, y = EGGS)) +
theme_bw()
ggplot(data = limpets2) +
geom_boxplot(aes(x = DENSITY, y = EGGS, fill = SEASON)) +
theme_bw()
And let’s run the model.
limpets2_lm <- lm(data = limpets2, formula = EGGS ~ SEASON * DENSITY)
summary(limpets2_lm)
##
## Call:
## lm(formula = EGGS ~ SEASON * DENSITY, data = limpets2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.61133 -0.16167 -0.04217 0.09892 0.55567
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1113333 0.2183934 5.089 0.000267 ***
## SEASONsummer 2.7753333 0.3088549 8.986 1.12e-06 ***
## DENSITY12 -0.0003333 0.3088549 -0.001 0.999157
## DENSITY24 -0.4166667 0.3088549 -1.349 0.202221
## SEASONsummer:DENSITY12 -0.9996667 0.4367868 -2.289 0.041029 *
## SEASONsummer:DENSITY24 -1.4700000 0.4367868 -3.365 0.005617 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3783 on 12 degrees of freedom
## Multiple R-squared: 0.9301, Adjusted R-squared: 0.9009
## F-statistic: 31.93 on 5 and 12 DF, p-value: 1.561e-06
anova(limpets2_lm)
## Analysis of Variance Table
##
## Response: EGGS
## Df Sum Sq Mean Sq F value Pr(>F)
## SEASON 1 17.1483 17.1483 119.845 1.336e-07 ***
## DENSITY 2 4.0019 2.0010 13.984 0.0007325 ***
## SEASON:DENSITY 2 1.6907 0.8454 5.908 0.0163632 *
## Residuals 12 1.7170 0.1431
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Plot the effects of Season and Density using the
allEffects
function in the effects
package.
You may need to use install.packages
to install this
package.
library(effects)
plot(allEffects(limpets2_lm))