Some times we have a mix of continuous and categorical predictor variables, with a continuous response variable.

See Figure 15.1 in Logan p. 449

Logan - Figure 15.1

Logan - Figure 15.1

Example

From Logan p. 457

To investigate the impacts of sexual activity on male fruitfly longevity, Partridge and Farquhar (1981), measured the longevity of male fruitflies with access to either one virgin female (potential mate), eight virgin females, one pregnant female (not a potential mate), eight pregnant females or no females. The available male fruitflies varied in size and since size is known to impact longevity, the researchers randomly allocated each individual fruitfly to one of the five treatments and also measured thorax length as a covariate (from Box 12.1 of Quinn and Keough (2002)).

Get the data

fruitfly <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter15/Data/partridge.csv")

head(fruitfly)
##   TREATMENT THORAX LONGEV
## 1     Preg8   0.64     35
## 2     Preg8   0.68     37
## 3     Preg8   0.68     49
## 4     Preg8   0.72     46
## 5     Preg8   0.72     63
## 6     Preg8   0.76     39

Plot these data

By group membership

library(ggplot2)
ggplot(data = fruitfly, aes(x = TREATMENT, y = LONGEV)) +
  geom_boxplot() +
  theme_bw()

By size

ggplot(data = fruitfly, aes(x = THORAX, y = LONGEV)) +
  geom_point() +
  theme_bw()

By both

ggplot(data = fruitfly, aes(x = THORAX, y = LONGEV, colour = TREATMENT)) +
  geom_point() +
  theme_bw()

With linear regressions

ggplot(data = fruitfly, aes(x = THORAX, y = LONGEV)) +
  geom_point(aes(colour = TREATMENT)) +
  stat_smooth(aes(colour = TREATMENT), method = "lm", se = FALSE) +
  stat_smooth(method = "lm", size = 2, colour = "black", se = FALSE) +
  theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

Let’s build an ANCOVA model

fruitfly_ancova <- lm(data = fruitfly, LONGEV ~ TREATMENT * THORAX)

Look at the model diagnostics

plot(fruitfly_ancova)

Note that there is a wedge-shaped pattern to the residuals, suggestive of non-homogeniety of variance. We can use a log10 transform to deal with this problem.

fruitfly_ancova_log <- lm(data = fruitfly, log10(LONGEV) ~ TREATMENT * THORAX)

Recheck model diagnositcs

plot(fruitfly_ancova_log)

Looking at the ANOVA table for this model, the fact that TREATMENT:THORAX interaction is not significant indicates that the slopes are not significantly different from each other.

anova(fruitfly_ancova_log)
## Analysis of Variance Table
## 
## Response: log10(LONGEV)
##                   Df  Sum Sq Mean Sq  F value Pr(>F)    
## TREATMENT          4 0.97717 0.24429  35.5765 <2e-16 ***
## THORAX             1 1.01749 1.01749 148.1776 <2e-16 ***
## TREATMENT:THORAX   4 0.04288 0.01072   1.5611 0.1894    
## Residuals        115 0.78967 0.00687                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can also look at a summary of this model and interpret the coefficients.

summary(fruitfly_ancova_log)
## 
## Call:
## lm(formula = log10(LONGEV) ~ TREATMENT * THORAX, data = fruitfly)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.216291 -0.059952 -0.004375  0.059322  0.174271 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             0.9313     0.1686   5.522 2.11e-07 ***
## TREATMENTPreg1          0.1048     0.2620   0.400  0.68980    
## TREATMENTPreg8         -0.0429     0.2380  -0.180  0.85727    
## TREATMENTVirg1         -0.2496     0.2628  -0.950  0.34410    
## TREATMENTVirg8         -0.6716     0.2420  -2.775  0.00644 ** 
## THORAX                  1.0260     0.2007   5.111 1.29e-06 ***
## TREATMENTPreg1:THORAX  -0.1017     0.3145  -0.323  0.74691    
## TREATMENTPreg8:THORAX   0.0922     0.2887   0.319  0.74998    
## TREATMENTVirg1:THORAX   0.2341     0.3127   0.749  0.45561    
## TREATMENTVirg8:THORAX   0.6049     0.2949   2.052  0.04249 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08287 on 115 degrees of freedom
## Multiple R-squared:  0.7207, Adjusted R-squared:  0.6988 
## F-statistic: 32.97 on 9 and 115 DF,  p-value: < 2.2e-16

Now we build another model in which we assume the slopes are all parallel.

fruitfly_ancova_2 <- lm(data = fruitfly, log10(LONGEV) ~ TREATMENT + THORAX)

plot(fruitfly_ancova_2)

summary(fruitfly_ancova_2)
## 
## Call:
## lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = fruitfly)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.226736 -0.058445 -0.003469  0.059961  0.170389 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.79095    0.08443   9.368 5.89e-16 ***
## TREATMENTPreg1  0.02260    0.02368   0.954   0.3419    
## TREATMENTPreg8  0.03648    0.02385   1.530   0.1287    
## TREATMENTVirg1 -0.05381    0.02366  -2.275   0.0247 *  
## TREATMENTVirg8 -0.18165    0.02392  -7.592 7.79e-12 ***
## THORAX          1.19385    0.09900  12.060  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08364 on 119 degrees of freedom
## Multiple R-squared:  0.7055, Adjusted R-squared:  0.6932 
## F-statistic: 57.02 on 5 and 119 DF,  p-value: < 2.2e-16