Hypothesis Testing

In this course, the answers to these questions are based primarily on frequentist statistics.

Example

Let’s say we have a sample of observations and we want to know whether they come from a population with a known value of \(\mu\) (i.e., we know the population mean). We can assess how (un)likely it is that the sample came from this particular population by looking at where the sample falls in a t distribution. That is, we calculate the t statistic:

\[ t_s = \frac{\bar{y} - \mu}{s_{\bar{y}}} \]
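
For example, here is a minimal sketch of this calculation in R, using a small made-up sample and an assumed population mean of \(\mu = 10\) (both hypothetical, for illustration only):

y <- c(9.1, 11.4, 10.2, 8.7, 12.0, 9.5)  # hypothetical sample
mu <- 10                                  # assumed (known) population mean

y_bar <- mean(y)                  # sample mean
se_y  <- sd(y) / sqrt(length(y))  # standard error of the mean
(t_s  <- (y_bar - mu) / se_y)     # t statistic from the formula above

# Compare with R's built-in one-sample t-test
t.test(y, mu = mu)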

t-test

  • Looking at the difference between two samples of data; under \(H_0\), this difference (divided by its standard error) should be t distributed.
  • Major assumptions (a quick way to check the first two in R is sketched after this list):
    1. The samples are drawn from normally distributed populations
    2. The populations have equal variances
    3. Each observation is independent
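
A rough sketch of checking the first two assumptions in R (the samples y1 and y2 below are hypothetical placeholders):

y1 <- rnorm(30, mean = 10, sd = 2)  # hypothetical sample 1
y2 <- rnorm(30, mean = 12, sd = 2)  # hypothetical sample 2

# Normality: quantile-quantile plots; points should fall near the line
qqnorm(y1); qqline(y1)
qqnorm(y2); qqline(y2)

# Equal variance: compare the sample variances directly
var(y1)
var(y2)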

We select the error rate

The general consensus is \(\alpha = 0.05\). This is known as the Type-I error rate, or \(\alpha\).

Type-I versus Type-II error

  • Type-I error: \(\alpha\) - our test suggests there is an effect, but there really is not one
  • Type-II error: \(\beta\) - our test fails to detect an effect that really is there

Statistical power

The complement of the Type-II error rate (\(\beta\)) is statistical power: \(power = 1 - \beta\).

\[ power(1-\beta) \propto \frac{ES \sqrt{n} \alpha}{\sigma} \]

How do we increase statistical power?

Increase the sample size.

  • Distribution of the mean becomes narrower
  • Acceptance region becomes narrower
  • Curves overlap less
  • Type II error rate becomes smaller
[Figure: statistical power]
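
We can see the effect of sample size numerically with power.t.test() from base R (the effect size delta and standard deviation sd below are arbitrary values chosen for illustration):

# Power of a two-sample t-test at alpha = 0.05 for increasing sample sizes
# (delta and sd are arbitrary, illustrative values)
sapply(c(10, 20, 40, 80),
       function(n) power.t.test(n = n, delta = 1, sd = 2,
                                sig.level = 0.05)$power)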

What are the general steps to hypothesis testing?

See pp. 33-34 in Q&K for an outline of Fisher’s approach.

What are some of the potential problems with this approach?

Our data or something more extreme

When we use a sampling distribution for our test statistic (e.g., the t distribution), we are asking “what is the probability of observing our data, or something more extreme, in the long run, if \(H_0\) is true.” Mathematically, this can be written as:

\[ P(data|H_0) \]
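
For example, with a hypothetical t statistic of 2.1 on 20 degrees of freedom, the two-tailed probability of our data or something more extreme under \(H_0\) can be found from the t distribution:

t_s <- 2.1  # hypothetical t statistic
df  <- 20   # hypothetical degrees of freedom

# P(our t, or something more extreme, given H0 is true) -- two-tailed
2 * pt(abs(t_s), df = df, lower.tail = FALSE)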

Examples of parametric hypothesis testing

Below is the generic form of the t statistic:

\[ t_s = \frac{St - \theta}{S_{St}} \]

where \(St\) is the value of some statistic (e.g., the mean) from our sample, \(\theta\) is the population value for that statistic, and \(S_{St}\) is the estimated standard error of the statistic \(St\) (based on our sample).

How can we use this formula to test whether two samples are drawn from the same population?

Imagine the case where we have two different samples, and for each we are testing whether its mean differs from its population mean. We then have:

\[ t = \frac{\bar{y_1}-\mu_1}{s_{\bar{y}_1}} \]

and

\[ t = \frac{\bar{y_2}-\mu_2}{s_{\bar{y}_2}} \]

If the two samples are drawn from the same population, then \(\mu_1 = \mu_2\), or \(\mu_1 - \mu_2 = 0\).

We can then write our t stat as:

\[ t = \frac{(\bar{y_1} - \bar{y_2}) - (\mu_1 - \mu_2)}{s_{\bar{y}_1 - \bar{y}_2}} \]

which simplifies to:

\[ t = \frac{\bar{y_1} - \bar{y_2}}{s_{\bar{y}_1 - \bar{y}_2}} \]

where \(s_{\bar{y}_1 - \bar{y}_2}\) is the standard error of the difference between the means and is equal to:

\[ s_{\bar{y}_1 - \bar{y}_2} = \sqrt{ \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 -2} (\frac{1}{n_1} + \frac{1}{n_2}) } \]
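
Here is a sketch of this pooled-variance t statistic calculated "by hand" in R (y1 and y2 are hypothetical samples); it should agree with t.test(..., var.equal = TRUE):

y1 <- rnorm(15, mean = 10, sd = 2)  # hypothetical sample 1
y2 <- rnorm(20, mean = 12, sd = 2)  # hypothetical sample 2
n1 <- length(y1); n2 <- length(y2)

# Pooled standard error of the difference between the means (formula above)
se_diff <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) /
                  (n1 + n2 - 2) * (1 / n1 + 1 / n2))

(t_stat <- (mean(y1) - mean(y2)) / se_diff)

# Should match the pooled-variance t-test
t.test(y1, y2, var.equal = TRUE)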

Differences in fecundity of intertidal gastropods in two different intertidal zones (Example 6A in Logan, Box 3.1 in Q&K)

Ward and Quinn (1988) investigated the differences in fecundity of Lepsiella vinosa in two different intertidal zones (mussel zone and littorinid zone).

Get the data and have a quick look

gastro <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter6/Data/ward.csv")
summary(gastro)
##      ZONE         EGGS      
##  Littor:37   Min.   : 5.00  
##  Mussel:42   1st Qu.: 8.50  
##              Median :10.00  
##              Mean   :10.11  
##              3rd Qu.:12.00  
##              Max.   :18.00

Make a box plot to help assess differences in variance and deviations from normality.

library(ggplot2)

ggplot() +
  geom_boxplot(data = gastro, aes(x = ZONE, y = EGGS)) +
  theme_bw()

Calculate the mean and variance of each group separately. We will be using dplyr for this.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
gastro %>%
  group_by(ZONE) %>%
  summarise(Mean = mean(EGGS), Var = var(EGGS))
## # A tibble: 2 x 3
##   ZONE    Mean   Var
##   <fct>  <dbl> <dbl>
## 1 Littor  8.70  4.10
## 2 Mussel 11.4   5.36
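
The two group variances are similar (4.10 vs. 5.36), which supports the equal-variance assumption. As an optional check (a sketch, not part of the original analysis), we could run an F test for equality of variances:

var.test(EGGS ~ ZONE, data = gastro)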

Run the t-test.

(gastro_t_test <- t.test(data = gastro, EGGS ~ ZONE, var.equal = TRUE))
## 
##  Two Sample t-test
## 
## data:  EGGS by ZONE
## t = -5.3899, df = 77, p-value = 7.457e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.63511 -1.67377
## sample estimates:
## mean in group Littor mean in group Mussel 
##             8.702703            11.357143
gastro_t_test$estimate[1] - gastro_t_test$estimate[2]
## mean in group Littor 
##             -2.65444
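
As a check, we can reproduce this t statistic "by hand" using the pooled standard error formula from above (a sketch; the group labels are taken from the summary output):

litt <- gastro$EGGS[gastro$ZONE == "Littor"]
muss <- gastro$EGGS[gastro$ZONE == "Mussel"]
n1 <- length(litt); n2 <- length(muss)

# Pooled standard error of the difference between the means
se_diff <- sqrt(((n1 - 1) * var(litt) + (n2 - 1) * var(muss)) /
                  (n1 + n2 - 2) * (1 / n1 + 1 / n2))

(mean(litt) - mean(muss)) / se_diff  # should match t = -5.3899 above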

Metabolic rates of northern fulmars (sea-bird)

Furness and Bryant (1996) measured the metabolic rates of male and female breeding northern fulmars, and tested whether there were any observable differences in these rates between the sexes.

Get the data and have a look

fulmars <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter6/Data/furness.csv")
fulmars
##             SEX METRATE BODYMASS
## 1  Male          2950.0      875
## 2  Female        1956.1      635
## 3  Male          2308.7      765
## 4  Male          2135.6      780
## 5  Male          1945.6      790
## 6  Female        1490.5      635
## 7  Female        1361.3      668
## 8  Female        1086.5      640
## 9  Female        1091.0      645
## 10 Male          1195.5      788
## 11 Female         727.7      635
## 12 Male           843.3      855
## 13 Male           525.8      860
## 14 Male           605.7     1005
summary(fulmars)
##            SEX       METRATE          BODYMASS     
##  Female      :6   Min.   : 525.8   Min.   : 635.0  
##  Male        :8   1st Qu.: 904.1   1st Qu.: 641.2  
##                   Median :1278.4   Median : 772.5  
##                   Mean   :1444.5   Mean   : 755.4  
##                   3rd Qu.:1953.5   3rd Qu.: 838.8  
##                   Max.   :2950.0   Max.   :1005.0

Make a box plot to help assess differences in variance and deviations from normality.

ggplot() +
  geom_boxplot(data = fulmars, aes(x = SEX, y = METRATE)) +
  theme_bw()

Challenge

Are the variances the same?

Calculate the mean and variance of each group separately. We will be using dplyr for this.

fulmars %>%
  group_by(SEX) %>%
  summarise(Mean = mean(METRATE), Var = var(METRATE))
## # A tibble: 2 x 3
##   SEX             Mean     Var
##   <fct>          <dbl>   <dbl>
## 1 "Female      " 1286. 177209.
## 2 "Male        " 1564. 799903.

Because the variances are unequal, use Welch’s t-test, which does not assume equal variances.

(fulmars_t_test <- t.test(data = fulmars, METRATE ~ SEX, var.equal = FALSE))
## 
##  Welch Two Sample t-test
## 
## data:  METRATE by SEX
## t = -0.77317, df = 10.468, p-value = 0.4565
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1075.3208   518.8042
## sample estimates:
## mean in group Female       mean in group Male         
##                   1285.517                   1563.775
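
As a check, the Welch-Satterthwaite degrees of freedom can be computed "by hand" (a sketch; grepl() is used because the SEX labels in this file contain trailing whitespace, as seen above). The result should match df = 10.468 reported by t.test().

fem <- fulmars$METRATE[grepl("Female", fulmars$SEX)]
mal <- fulmars$METRATE[grepl("Male", fulmars$SEX)]
v1 <- var(fem) / length(fem)
v2 <- var(mal) / length(mal)

# Welch-Satterthwaite approximation to the degrees of freedom
(v1 + v2)^2 / (v1^2 / (length(fem) - 1) + v2^2 / (length(mal) - 1))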