Some times we have a mix of continuous and categorical predictor variables, with a continuous response variable.
See Figure 15.1 in Logan p. 449
Logan - Figure 15.1
From Logan p. 457
To investigate the impacts of sexual activity on male fruitfly longevity, Partridge and Farquhar (1981), measured the longevity of male fruitflies with access to either one virgin female (potential mate), eight virgin females, one pregnant female (not a potential mate), eight pregnant females or no females. The available male fruitflies varied in size and since size is known to impact longevity, the researchers randomly allocated each individual fruitfly to one of the five treatments and also measured thorax length as a covariate (from Box 12.1 of Quinn and Keough (2002)).
Get the data
fruitfly <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter15/Data/partridge.csv")
head(fruitfly)
## TREATMENT THORAX LONGEV
## 1 Preg8 0.64 35
## 2 Preg8 0.68 37
## 3 Preg8 0.68 49
## 4 Preg8 0.72 46
## 5 Preg8 0.72 63
## 6 Preg8 0.76 39
Plot these data
By group membership
library(ggplot2)
ggplot(data = fruitfly, aes(x = TREATMENT, y = LONGEV)) +
geom_boxplot() +
theme_bw()
By size
ggplot(data = fruitfly, aes(x = THORAX, y = LONGEV)) +
geom_point() +
theme_bw()
By both
ggplot(data = fruitfly, aes(x = THORAX, y = LONGEV, colour = TREATMENT)) +
geom_point() +
theme_bw()
With linear regressions
ggplot(data = fruitfly, aes(x = THORAX, y = LONGEV)) +
geom_point(aes(colour = TREATMENT)) +
stat_smooth(aes(colour = TREATMENT), method = "lm", se = FALSE) +
stat_smooth(method = "lm", size = 2, colour = "black", se = FALSE) +
theme_bw()
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Let’s build an ANCOVA model
fruitfly_ancova <- lm(data = fruitfly, LONGEV ~ TREATMENT * THORAX)
Look at the model diagnostics
plot(fruitfly_ancova)
Note that there is a wedge-shaped pattern to the residuals, suggestive of non-homogeniety of variance. We can use a log10 transform to deal with this problem.
fruitfly_ancova_log <- lm(data = fruitfly, log10(LONGEV) ~ TREATMENT * THORAX)
Recheck model diagnositcs
plot(fruitfly_ancova_log)
Looking at the ANOVA table for this model, the fact that TREATMENT:THORAX interaction is not significant indicates that the slopes are not significantly different from each other.
anova(fruitfly_ancova_log)
## Analysis of Variance Table
##
## Response: log10(LONGEV)
## Df Sum Sq Mean Sq F value Pr(>F)
## TREATMENT 4 0.97717 0.24429 35.5765 <2e-16 ***
## THORAX 1 1.01749 1.01749 148.1776 <2e-16 ***
## TREATMENT:THORAX 4 0.04288 0.01072 1.5611 0.1894
## Residuals 115 0.78967 0.00687
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can also look at a summary
of this model and interpret the coefficients.
summary(fruitfly_ancova_log)
##
## Call:
## lm(formula = log10(LONGEV) ~ TREATMENT * THORAX, data = fruitfly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.216291 -0.059952 -0.004375 0.059322 0.174271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9313 0.1686 5.522 2.11e-07 ***
## TREATMENTPreg1 0.1048 0.2620 0.400 0.68980
## TREATMENTPreg8 -0.0429 0.2380 -0.180 0.85727
## TREATMENTVirg1 -0.2496 0.2628 -0.950 0.34410
## TREATMENTVirg8 -0.6716 0.2420 -2.775 0.00644 **
## THORAX 1.0260 0.2007 5.111 1.29e-06 ***
## TREATMENTPreg1:THORAX -0.1017 0.3145 -0.323 0.74691
## TREATMENTPreg8:THORAX 0.0922 0.2887 0.319 0.74998
## TREATMENTVirg1:THORAX 0.2341 0.3127 0.749 0.45561
## TREATMENTVirg8:THORAX 0.6049 0.2949 2.052 0.04249 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08287 on 115 degrees of freedom
## Multiple R-squared: 0.7207, Adjusted R-squared: 0.6988
## F-statistic: 32.97 on 9 and 115 DF, p-value: < 2.2e-16
Now we build another model in which we assume the slopes are all parallel.
fruitfly_ancova_2 <- lm(data = fruitfly, log10(LONGEV) ~ TREATMENT + THORAX)
plot(fruitfly_ancova_2)
summary(fruitfly_ancova_2)
##
## Call:
## lm(formula = log10(LONGEV) ~ TREATMENT + THORAX, data = fruitfly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.226736 -0.058445 -0.003469 0.059961 0.170389
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.79095 0.08443 9.368 5.89e-16 ***
## TREATMENTPreg1 0.02260 0.02368 0.954 0.3419
## TREATMENTPreg8 0.03648 0.02385 1.530 0.1287
## TREATMENTVirg1 -0.05381 0.02366 -2.275 0.0247 *
## TREATMENTVirg8 -0.18165 0.02392 -7.592 7.79e-12 ***
## THORAX 1.19385 0.09900 12.060 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08364 on 119 degrees of freedom
## Multiple R-squared: 0.7055, Adjusted R-squared: 0.6932
## F-statistic: 57.02 on 5 and 119 DF, p-value: < 2.2e-16