Introduction to Hypothesis Testing and t tests

Section Goals
Hypothesis Testing
- Example
t-test
We select the Type-I error rate
Generic “test” statistics
Comparing two samples
Example of parametric hypothesis testing
- Differences in fecundity of intertidal gastropods in two different intertidal zones (Example 6A in Logan, Box 3.1 in Q&K)

Section Goals

Review of principles of hypothesis testing
Become familiar with the \(t\) test, as a ‘standard’ hypothesis test
Practice calculating a \(t\) statistic “by hand” and using t.test

Hypothesis Testing

What is a hypothesis?
How do we test our hypothesis?

The answers to these questions, with respect to this course, are primarily based on frequentest statistics.

Example

Let’s say we have a sample of observations that we want to know if they come from some population with a known value for \(\mu\) (i.e., we know the population mean). We can see how (un)likely it is that the sample estimates come from this particular population, by looking at where these values fall in a t distribution. That is, calculate the t statistic:

\[ t_s = \frac{\bar{y} - \mu}{s_{\bar{y}}} \]

ASIDE: Why are we using the t distribution?

We are using the t distribution because we are using an approximation of standard error, \(s_{\bar{y}}\), based on data in our sample.

t-test

Looking at differences between two samples of data. The differences should be t distributed.
Major assumptions - the samples are drawn from populations that are:
1. (Approximately) normally distributed
2. Equally varied (i.e. equal variance)
3. Each observation is independent

We select the Type-I error rate

General consensus is \(p = 0.05\). This is known as Type-I error, or \(\alpha\).

Type-I versus Type-II error

Type-I error: \(\alpha\) - our test suggests there is an effect, but there really is not one
Type-II error: \(\beta\) - when you fail to detect an effect that really occurs

Statistical power

The reciprocal of Type-II error (\(\beta\)) is power.

\[ power(1-\beta) \propto \frac{ES \sqrt{n} \alpha}{\sigma} \]

where \(ES\) is effect size, \(n\) is the sample size, \(\alpha\) is the accepted Type I error rate, and \(\sigma\) is the standard deviation.

How do we increase statistical power?

Increase the sample size.

Distribution of the mean becomes narrower
Acceptance region becomes narrower
Curves overlap less
Type II error rate becomes smaller

power

What are the general steps to hypothesis testing?

See p. 33-34 in Q&K. Fisher’s approach below:

Construct null hypothesis (\(H_0\))
Choose a test stat that measures deviation from that null hypothesis
Collect data and compare the value of the test stat to the known distribution of that stat.
Determine p-val
Accept or reject null

What are some of the potential problems with this approach?

Our data or something more extreme

When we use a sampling distribution for our test statistic (e.g., the t distribution), we are asking “what is the probability of observing our data, or something more extreme, in the long run, if \(H_0\) is true.” Mathematically, this can be written as:

\[ P(data|H_0) \]

Generic “test” statistics

Below is the generic form of the t statistic:

\[ t_s = \frac{St - \theta}{S_{St}} \]

where \(St\) is the value of some statistic (e.g., the mean) from our sample, \(\theta\) is the population value for that statistic, and \(S_{St}\) is the estimated standard error of the statistic \(St\) (based on our sample).

Comparing two samples

How can we use this formula to test whether two samples are drawn from the same population?

Imagine the case where we have two different samples, and for each we’re testing whether the means are different from the population means. We then have:

\[ t_1 = \frac{\bar{y_1}-\mu_1}{s_{\bar{y}_1}} \]

and

\[ t_2 = \frac{\bar{y_2}-\mu_2}{s_{\bar{y}_2}} \]

If the two samples are drawn from the same population, then \(\mu_1 = \mu_2\), or \(\mu_1 - \mu_2 = 0\).

We can then write our t stat as:

\[ t = \frac{(\bar{y_1} - \bar{y_2}) - (\mu_1 - \mu_2)}{s_{\bar{y}_1 - \bar{y}_2}} \]

which simplifies to:

\[ t = \frac{\bar{y_1} - \bar{y_2}}{s_{\bar{y}_1 - \bar{y}_2}} \]

where \(s_{\bar{y}_1 - \bar{y}_2}\) is the standard error of the difference between the means and is equal to:

\[ s_{\bar{y}_1 - \bar{y}_2} = \sqrt{ \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 -2} (\frac{1}{n_1} + \frac{1}{n_2}) } \]

Example of parametric hypothesis testing

Differences in fecundity of intertidal gastropods in two different intertidal zones (Example 6A in Logan, Box 3.1 in Q&K)

Ward and Quinn (1988) investigated the differences in fecundity of Lepsiella vinosa in two different intertidal zones (mussel zone and littorinid zone).

Lepsiella vinosa image

Get the data and have a quick look

gastro <- read.csv(file = "https://mlammens.github.io/ENS-623-Research-Stats/data/Logan_Examples/Chapter6/Data/ward.csv")
summary(gastro)

##      ZONE                EGGS      
##  Length:79          Min.   : 5.00  
##  Class :character   1st Qu.: 8.50  
##  Mode  :character   Median :10.00  
##                     Mean   :10.11  
##                     3rd Qu.:12.00  
##                     Max.   :18.00

Make a box plot to help assess differences in variance and deviations from normality.

library(ggplot2)

ggplot() +
  geom_boxplot(data = gastro, aes(x = ZONE, y = EGGS)) +
  theme_bw()

Calculate means and standard deviations of each group separately. We will be using dplyr for this.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

gastro_summary <-
  gastro %>%
  group_by(ZONE) %>%
  summarise(Mean = mean(EGGS), 
            Var = var(EGGS), 
            SD = sd(EGGS),
            SE = sd(EGGS)/sqrt(length(ZONE)))

gastro_summary

## # A tibble: 2 × 5
##   ZONE    Mean   Var    SD    SE
##   <chr>  <dbl> <dbl> <dbl> <dbl>
## 1 Littor  8.70  4.10  2.03 0.333
## 2 Mussel 11.4   5.36  2.31 0.357

Calculate the t stat by hand, using the equations above

First, let’s calculate \(s_{\bar{y}_1 - \bar{y}_2}\) using the formula noted above:

n_littor <- sum(gastro$ZONE == "Littor")
n_mussel <- sum(gastro$ZONE == "Mussel")

sd_littor <- gastro_summary$SD[1]
sd_mussel <- gastro_summary$SD[2]

s_1_2 <- sqrt( (((n_littor - 1)*sd_littor^2 + (n_mussel - 1)*sd_mussel^2 )/
                  (n_littor + n_mussel -2)) * 
                 ((1/n_littor) + (1/n_mussel)) )

mean_littor <- gastro_summary$Mean[1]
mean_mussel <- gastro_summary$Mean[2]

tstat <- (mean_littor - mean_mussel) / s_1_2

OK, now we have our t statistic. In order to use this to say something about the probability that these two sample come from a population with the same mean, we need to know what degrees of freedom to use to parameterize our t distribution. Here’s the simple formula:

\[ df = (n_1 - 1) + (n_2 - 1) \]

So now we can say that the df here is:

df_gastro <- (n_littor - 1) + (n_mussel - 1)

pt(q = tstat, df = df_gastro) * 2

## [1] 7.457222e-07

The resulting probability value is very small, much less than 0.05. In this case, we would reject the null hypothesis that these two samples come from the same population.

Use R’s native `t.test` function

(gastro_t_test <- t.test(data = gastro, EGGS ~ ZONE, var.equal = TRUE))

## 
##  Two Sample t-test
## 
## data:  EGGS by ZONE
## t = -5.3899, df = 77, p-value = 7.457e-07
## alternative hypothesis: true difference in means between group Littor and group Mussel is not equal to 0
## 95 percent confidence interval:
##  -3.63511 -1.67377
## sample estimates:
## mean in group Littor mean in group Mussel 
##             8.702703            11.357143

gastro_t_test$estimate[1] - gastro_t_test$estimate[2]

## mean in group Littor 
##             -2.65444