Special Note: This lecture was delivered using BlackBoard Collaborate Ultra because of a weather closure.

Important properties of distributions

Measures of location, dispersion, and variability

When we look at the distribution of any data set, we want a measure of its center (its location). Usually our first step is to compute the arithmetic mean, but each statistical distribution has its own expected value.

Similarly, we want a measure of dispersion (variability). Here we would usually calculate the standard deviation, but each distribution has its own variance.

Terms and properties of distributions

Consider a generic probability distribution with a PDF \(f\), which gives probabilities by integration:

\[ P(a < x < b) = \int_a^b f(x \mid \text{params}) \, dx \]

(or an analogous PMF). This distribution has a number of properties that could be described.

  • Parameters
    • Location - where
    • Shape - eponymous
    • Scale - spread
  • Support - what values of \(x\) are possible
  • PDF or PMF
  • Mean - expected value
  • Median - middle value
  • Mode - most common value
  • Variance
  • Skewness - measure of asymmetry
  • Kurtosis - how fat/thin the tails are
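The properties listed above can be computed directly from a PMF. The sketch below is only an illustration: it assumes a binomial distribution with made-up parameters (10 trials, success probability 0.2), and the variable names are hypothetical. It reports excess kurtosis (kurtosis minus 3), which is one common convention.

```python
from math import comb, sqrt

# Parameters (hypothetical values chosen for illustration)
n_trials, p_success = 10, 0.2

# Support: the values of x that are possible
support = list(range(n_trials + 1))

# PMF of the binomial distribution evaluated on the support
pmf = [
    comb(n_trials, k) * p_success**k * (1 - p_success) ** (n_trials - k)
    for k in support
]

# Mean (expected value) and variance
mean = sum(x * p for x, p in zip(support, pmf))
variance = sum((x - mean) ** 2 * p for x, p in zip(support, pmf))

# Mode: the most probable value
mode = max(support, key=lambda k: pmf[k])

# Median: the smallest x whose cumulative probability reaches 0.5
cumulative = 0.0
median = None
for x, p in zip(support, pmf):
    cumulative += p
    if cumulative >= 0.5:
        median = x
        break

# Skewness and excess kurtosis from standardized moments
sd = sqrt(variance)
skewness = sum(((x - mean) / sd) ** 3 * p for x, p in zip(support, pmf))
excess_kurtosis = sum(((x - mean) / sd) ** 4 * p for x, p in zip(support, pmf)) - 3.0

print(f"mean={mean:.3f} median={median} mode={mode} variance={variance:.3f}")
print(f"skewness={skewness:.3f} excess kurtosis={excess_kurtosis:.3f}")
```

The positive skewness reflects the longer right tail that this distribution has when the success probability is below 0.5.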

Estimating distribution parameters

What are we estimating?

Usually measures of location and of dispersion (variability).

Challenge

Can you think of any measures of location and dispersion that you are familiar with?

Expected value of a distribution

  • Long-run average value
  • Mean

Expected value for discrete distributions

\[ E[X] = \sum_{i=1}^{\infty} x_i p_i \]

where \(p_i = P(X = x_i)\) is the probability of outcome \(x_i\).
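As a small worked case of the discrete formula, the sketch below assumes a fair six-sided die (an example chosen here, not taken from the lecture). Exact fractions are used so the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die
outcomes = [1, 2, 3, 4, 5, 6]
probabilities = [Fraction(1, 6)] * 6  # each face is equally likely

# E[X] = sum over i of x_i * p_i
expected_value = sum(x * p for x, p in zip(outcomes, probabilities))

print(expected_value)         # exact value as a fraction
print(float(expected_value))  # same value as a decimal
```

Note that the expected value, 7/2, is not itself a possible outcome of a single roll; it is the long-run average over many rolls.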

Expected value for continuous distributions

\[ E[X] = \int_{-\infty}^{\infty} x \cdot f(x) dx \]
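The integral can be approximated numerically when it is awkward to solve by hand. The sketch below assumes an exponential distribution with rate 2 (so the exact expected value is 1/2) and uses a simple midpoint rule. The function names, the truncation of the upper limit at 20, and the number of steps are all illustrative choices.

```python
from math import exp

RATE = 2.0  # assumed rate parameter of the exponential distribution


def pdf(x):
    """PDF of the exponential distribution; zero outside its support."""
    return RATE * exp(-RATE * x) if x >= 0 else 0.0


def expected_value(density, lower, upper, steps=100_000):
    """Approximate the integral of x * density(x) with the midpoint rule."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        midpoint = lower + (i + 0.5) * width
        total += midpoint * density(midpoint)
    return total * width


# The support is [0, infinity); the tail beyond 20 is negligible for this rate.
approx = expected_value(pdf, 0.0, 20.0)
print(f"numerical E[X] = {approx:.6f}, exact = {1 / RATE}")
```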

Population versus sample statistics

We use sample statistics to estimate population parameters. In most cases in biology, populations are too large for every individual to be measured, so the population parameters cannot be measured directly. Therefore, we use estimators to estimate the population parameters from the sample statistics.

Properties of good estimators

See Q&K page 15 for more details.

  1. Unbiased - repeated estimates should not consistently under- or over-estimate population parameters
  2. Consistent - as sample size increases, the sample estimate should get closer to the population parameter
  3. Efficient - the estimate has the lowest variance among competing estimators
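Bias can be illustrated by simulation. The sketch below is an assumed example, not part of the lecture: it draws many small samples from a normal population with variance 4 and compares two variance estimators, one dividing the sum of squares by n and one dividing by n - 1. The population, sample size, and number of replicates are arbitrary choices.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

TRUE_SD = 2.0        # assumed population: Normal(0, 2), so the variance is 4
SAMPLE_SIZE = 5
REPLICATES = 20_000


def sum_of_squares(values):
    """Sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(REPLICATES):
    sample = [random.gauss(0.0, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    ss = sum_of_squares(sample)
    divide_by_n.append(ss / SAMPLE_SIZE)
    divide_by_n_minus_1.append(ss / (SAMPLE_SIZE - 1))

# Average each estimator over all replicates
mean_biased = sum(divide_by_n) / REPLICATES
mean_unbiased = sum(divide_by_n_minus_1) / REPLICATES

print(f"true variance         = {TRUE_SD ** 2}")
print(f"divide by n (avg)     = {mean_biased:.3f}")
print(f"divide by n - 1 (avg) = {mean_unbiased:.3f}")
```

On average, the divide-by-n estimator falls below the true variance (its expectation is (n - 1)/n times the variance, or 3.2 here), while the divide-by-(n - 1) estimator averages close to 4.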

Challenge

What is the difference between accuracy and precision? [1]

Different kinds of estimators

  • Point estimate - a single value estimate for a population parameter
  • Interval estimate - a range of values that is expected to contain the parameter with a known probability

Challenge

Give an example of a point estimate and an interval estimate.


  1. https://en.wikipedia.org/wiki/Accuracy_and_precision