Let’s work with the limpet data set from Quinn (1988), presented in both Quinn and Keough 2009 and Logan 2010.

# Read in the file
limpets <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter12/Data/quinn.csv")
# Make DENSITY a factor
limpets$DENSITY <- as.factor(limpets$DENSITY)
# Show a summary of the data.frame
summary(limpets)
##  DENSITY    SEASON        EGGS       
##  8 :6    spring:12   Min.   :0.3560  
##  15:6    summer:12   1st Qu.:0.9165  
##  30:6                Median :1.4330  
##  45:6                Mean   :1.4717  
##                      3rd Qu.:1.9227  
##                      Max.   :2.8750

Next, let’s calculate the group means for the different treatments.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
limpets_grp_means <-
  limpets %>%
  group_by(SEASON, DENSITY) %>%
  summarise(GROUP_Mean = mean(EGGS))
limpets_grp_means
## # A tibble: 8 x 3
## # Groups:   SEASON [2]
##   SEASON DENSITY GROUP_Mean
##   <fct>  <fct>        <dbl>
## 1 spring 8            2.42 
## 2 spring 15           2.18 
## 3 spring 30           1.57 
## 4 spring 45           1.20 
## 5 summer 8            1.83 
## 6 summer 15           1.18 
## 7 summer 30           0.811
## 8 summer 45           0.593

Use spread function in the tidyr package to make this into a nice table

library(tidyr)
spread(limpets_grp_means, SEASON, GROUP_Mean)
## # A tibble: 4 x 3
##   DENSITY spring summer
##   <fct>    <dbl>  <dbl>
## 1 8         2.42  1.83 
## 2 15        2.18  1.18 
## 3 30        1.57  0.811
## 4 45        1.20  0.593

Make a visualization of these data too.

library(ggplot2)

ggplot(data = limpets) +
  geom_boxplot(aes(x = DENSITY, y = EGGS, fill = SEASON))

Let’s now run an ANOVA analysis.

limpets_lm <- lm(data = limpets, formula = EGGS ~  SEASON + DENSITY)
anova(limpets_lm)
## Analysis of Variance Table
## 
## Response: EGGS
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## SEASON     1 3.2502  3.2502  20.054 0.0002576 ***
## DENSITY    3 5.2841  1.7614  10.868 0.0002226 ***
## Residuals 19 3.0793  0.1621                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We are confident there is an effect of both DENSITY and SEAON on EGGS, based on these results.

Now, we can use the regression coefficients from this model to estimate the group means.

summary(limpets_lm)
## 
## Call:
## lm(formula = EGGS ~ SEASON + DENSITY, data = limpets)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.74300 -0.24588 -0.03333  0.22913  0.67367 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.4930     0.1838  13.567 3.17e-11 ***
## SEASONsummer  -0.7360     0.1644  -4.478 0.000258 ***
## DENSITY15     -0.4475     0.2324  -1.925 0.069291 .  
## DENSITY30     -0.9367     0.2324  -4.030 0.000716 ***
## DENSITY45     -1.2288     0.2324  -5.287 4.21e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4026 on 19 degrees of freedom
## Multiple R-squared:  0.7349, Adjusted R-squared:  0.679 
## F-statistic: 13.16 on 4 and 19 DF,  p-value: 2.662e-05

Let’s think of these as regression coefficients (which they are!) and use the following symbols:

Coef. symbol Coef. name
b0 (Intercept)
b1 SEASONsummer
b2 DENSITY15
b3 DENSITY30
b4 DENSITY45

The table below translates how these coefficients can be used to calcualte estimates for each group mean value.

NOTE - I’m emphasizing that these are estimates of the group means. In this case, our model assumes that there is no interaction between the predictor variables DENSITY and SEASON.

SEASON - Spring SEASON - Summer
DENSITY - 8 b0 b0 + b1
DENSITY - 15 b0 + b2 b0 + b1 + b2
DENSITY - 30 b0 + b3 b0 + b1 + b3
DENSITY - 45 b0 + b4 b0 + b1 + b4

Let’s now use the results of our lm call to estimate the group means. I’m going to do this in the Rmd file by getting the model coefficients, then using the in-line R code execution in the table.

limpets_coef <- round(coef(limpets_lm), digits = 3)
SEASON - Spring SEASON - Summer
DENSITY - 8 2.493 1.757
DENSITY - 15 2.045 1.309
DENSITY - 30 1.556 0.82
DENSITY - 45 1.264 0.528

If we compare these estimates to the actual values we see in the table above, we should see that they are pretty close.

Now, let’s see what happens if we include and interaction between DENSITY and SEASON.

limpets_lm_int <- lm(data = limpets, EGGS ~ SEASON * DENSITY)
anova(limpets_lm_int)
## Analysis of Variance Table
## 
## Response: EGGS
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## SEASON          1 3.2502  3.2502 17.8419 0.0006453 ***
## DENSITY         3 5.2841  1.7614  9.6691 0.0007041 ***
## SEASON:DENSITY  3 0.1647  0.0549  0.3014 0.8239545    
## Residuals      16 2.9146  0.1822                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And now let’s look at the regression coefficients.

summary(limpets_lm_int)
## 
## Call:
## lm(formula = EGGS ~ SEASON * DENSITY, data = limpets)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6667 -0.2612 -0.0610  0.2292  0.6647 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.41667    0.24642   9.807  3.6e-08 ***
## SEASONsummer           -0.58333    0.34849  -1.674  0.11358    
## DENSITY15              -0.23933    0.34849  -0.687  0.50206    
## DENSITY30              -0.85133    0.34849  -2.443  0.02655 *  
## DENSITY45              -1.21700    0.34849  -3.492  0.00301 ** 
## SEASONsummer:DENSITY15 -0.41633    0.49284  -0.845  0.41069    
## SEASONsummer:DENSITY30 -0.17067    0.49284  -0.346  0.73363    
## SEASONsummer:DENSITY45 -0.02367    0.49284  -0.048  0.96229    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4268 on 16 degrees of freedom
## Multiple R-squared:  0.749,  Adjusted R-squared:  0.6392 
## F-statistic: 6.822 on 7 and 16 DF,  p-value: 0.000745

As above, let’s think of these as regression coefficients (which they are!) and use the following symbols:

Coef. symbol Coef. name
b0 (Intercept)
b1 SEASONsummer
b2 DENSITY15
b3 DENSITY30
b4 DENSITY45
b5 SEASONsummer:DENSITY15
b6 SEASONsummer:DENSITY30
b7 SEASONsummer:DENSITY45

The table below translates how these coefficients can be used to calcualte estimates for each group mean value. Pay particular attention to the fact that we now have three more coefficients to work with, representing the interaction effects.

SEASON - Spring SEASON - Summer
DENSITY - 8 b0 b0 + b1
DENSITY - 15 b0 + b2 b0 + b1 + b2 + b5
DENSITY - 30 b0 + b3 b0 + b1 + b3 + b6
DENSITY - 45 b0 + b4 b0 + b1 + b4 + b7

Now using this table, let’s use the results of our lm call to estimate the group means.

limpets_int_coef <- round(coef(limpets_lm_int), digits = 3)
SEASON - Spring SEASON - Summer
DENSITY - 8 2.417 1.834
DENSITY - 15 2.178 1.179
DENSITY - 30 1.566 0.812
DENSITY - 45 1.2 0.593