Let’s work with the limpet data set from Quinn (1988), presented in both Quinn and Keough 2009 and Logan 2010.
# Read in the file
limpets <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter12/Data/quinn.csv")
# Make DENSITY a factor
limpets$DENSITY <- as.factor(limpets$DENSITY)
# Show a summary of the data.frame
summary(limpets)
## DENSITY SEASON EGGS
## 8 :6 spring:12 Min. :0.3560
## 15:6 summer:12 1st Qu.:0.9165
## 30:6 Median :1.4330
## 45:6 Mean :1.4717
## 3rd Qu.:1.9227
## Max. :2.8750
Next, let’s calculate the group means for the different treatments.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
limpets_grp_means <-
limpets %>%
group_by(SEASON, DENSITY) %>%
summarise(GROUP_Mean = mean(EGGS))
limpets_grp_means
## # A tibble: 8 x 3
## # Groups: SEASON [2]
## SEASON DENSITY GROUP_Mean
## <fct> <fct> <dbl>
## 1 spring 8 2.42
## 2 spring 15 2.18
## 3 spring 30 1.57
## 4 spring 45 1.20
## 5 summer 8 1.83
## 6 summer 15 1.18
## 7 summer 30 0.811
## 8 summer 45 0.593
Use spread
function in the tidyr
package to make this into a nice table
library(tidyr)
spread(limpets_grp_means, SEASON, GROUP_Mean)
## # A tibble: 4 x 3
## DENSITY spring summer
## <fct> <dbl> <dbl>
## 1 8 2.42 1.83
## 2 15 2.18 1.18
## 3 30 1.57 0.811
## 4 45 1.20 0.593
Make a visualization of these data too.
library(ggplot2)
ggplot(data = limpets) +
geom_boxplot(aes(x = DENSITY, y = EGGS, fill = SEASON))
Let’s now run an ANOVA analysis.
limpets_lm <- lm(data = limpets, formula = EGGS ~ SEASON + DENSITY)
anova(limpets_lm)
## Analysis of Variance Table
##
## Response: EGGS
## Df Sum Sq Mean Sq F value Pr(>F)
## SEASON 1 3.2502 3.2502 20.054 0.0002576 ***
## DENSITY 3 5.2841 1.7614 10.868 0.0002226 ***
## Residuals 19 3.0793 0.1621
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We are confident there is an effect of both DENSITY and SEAON on EGGS, based on these results.
Now, we can use the regression coefficients from this model to estimate the group means.
summary(limpets_lm)
##
## Call:
## lm(formula = EGGS ~ SEASON + DENSITY, data = limpets)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74300 -0.24588 -0.03333 0.22913 0.67367
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.4930 0.1838 13.567 3.17e-11 ***
## SEASONsummer -0.7360 0.1644 -4.478 0.000258 ***
## DENSITY15 -0.4475 0.2324 -1.925 0.069291 .
## DENSITY30 -0.9367 0.2324 -4.030 0.000716 ***
## DENSITY45 -1.2288 0.2324 -5.287 4.21e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4026 on 19 degrees of freedom
## Multiple R-squared: 0.7349, Adjusted R-squared: 0.679
## F-statistic: 13.16 on 4 and 19 DF, p-value: 2.662e-05
Let’s think of these as regression coefficients (which they are!) and use the following symbols:
Coef. symbol | Coef. name |
---|---|
b0 | (Intercept) |
b1 | SEASONsummer |
b2 | DENSITY15 |
b3 | DENSITY30 |
b4 | DENSITY45 |
The table below translates how these coefficients can be used to calcualte estimates for each group mean value.
NOTE - I’m emphasizing that these are estimates of the group means. In this case, our model assumes that there is no interaction between the predictor variables DENSITY and SEASON.
SEASON - Spring | SEASON - Summer | |
---|---|---|
DENSITY - 8 | b0 | b0 + b1 |
DENSITY - 15 | b0 + b2 | b0 + b1 + b2 |
DENSITY - 30 | b0 + b3 | b0 + b1 + b3 |
DENSITY - 45 | b0 + b4 | b0 + b1 + b4 |
Let’s now use the results of our lm
call to estimate the group means. I’m going to do this in the Rmd file by getting the model coefficients, then using the in-line R code execution in the table.
limpets_coef <- round(coef(limpets_lm), digits = 3)
SEASON - Spring | SEASON - Summer | |
---|---|---|
DENSITY - 8 | 2.493 | 1.757 |
DENSITY - 15 | 2.045 | 1.309 |
DENSITY - 30 | 1.556 | 0.82 |
DENSITY - 45 | 1.264 | 0.528 |
If we compare these estimates to the actual values we see in the table above, we should see that they are pretty close.
Now, let’s see what happens if we include and interaction between DENSITY and SEASON.
limpets_lm_int <- lm(data = limpets, EGGS ~ SEASON * DENSITY)
anova(limpets_lm_int)
## Analysis of Variance Table
##
## Response: EGGS
## Df Sum Sq Mean Sq F value Pr(>F)
## SEASON 1 3.2502 3.2502 17.8419 0.0006453 ***
## DENSITY 3 5.2841 1.7614 9.6691 0.0007041 ***
## SEASON:DENSITY 3 0.1647 0.0549 0.3014 0.8239545
## Residuals 16 2.9146 0.1822
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And now let’s look at the regression coefficients.
summary(limpets_lm_int)
##
## Call:
## lm(formula = EGGS ~ SEASON * DENSITY, data = limpets)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6667 -0.2612 -0.0610 0.2292 0.6647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.41667 0.24642 9.807 3.6e-08 ***
## SEASONsummer -0.58333 0.34849 -1.674 0.11358
## DENSITY15 -0.23933 0.34849 -0.687 0.50206
## DENSITY30 -0.85133 0.34849 -2.443 0.02655 *
## DENSITY45 -1.21700 0.34849 -3.492 0.00301 **
## SEASONsummer:DENSITY15 -0.41633 0.49284 -0.845 0.41069
## SEASONsummer:DENSITY30 -0.17067 0.49284 -0.346 0.73363
## SEASONsummer:DENSITY45 -0.02367 0.49284 -0.048 0.96229
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4268 on 16 degrees of freedom
## Multiple R-squared: 0.749, Adjusted R-squared: 0.6392
## F-statistic: 6.822 on 7 and 16 DF, p-value: 0.000745
As above, let’s think of these as regression coefficients (which they are!) and use the following symbols:
Coef. symbol | Coef. name |
---|---|
b0 | (Intercept) |
b1 | SEASONsummer |
b2 | DENSITY15 |
b3 | DENSITY30 |
b4 | DENSITY45 |
b5 | SEASONsummer:DENSITY15 |
b6 | SEASONsummer:DENSITY30 |
b7 | SEASONsummer:DENSITY45 |
The table below translates how these coefficients can be used to calcualte estimates for each group mean value. Pay particular attention to the fact that we now have three more coefficients to work with, representing the interaction effects.
SEASON - Spring | SEASON - Summer | |
---|---|---|
DENSITY - 8 | b0 | b0 + b1 |
DENSITY - 15 | b0 + b2 | b0 + b1 + b2 + b5 |
DENSITY - 30 | b0 + b3 | b0 + b1 + b3 + b6 |
DENSITY - 45 | b0 + b4 | b0 + b1 + b4 + b7 |
Now using this table, let’s use the results of our lm
call to estimate the group means.
limpets_int_coef <- round(coef(limpets_lm_int), digits = 3)
SEASON - Spring | SEASON - Summer | |
---|---|---|
DENSITY - 8 | 2.417 | 1.834 |
DENSITY - 15 | 2.178 | 1.179 |
DENSITY - 30 | 1.566 | 0.812 |
DENSITY - 45 | 1.2 | 0.593 |