Probability through M&Ms

Open your bag(s) and count how many M&Ms you got. Also count how many of each color.

Make a new variable called tot_mm and assign it the number of MMs in your bag.

tot_mm <- 60

`mean` number of MMs in a bag

Calculate the mean number of MMs in a bag.

tot_mm_pop_sample <- c(60, 58, 57, 63, 61, 60, 59)

Mean:

\[ \bar{x} = \frac{\sum_{i=1}^n{x_i}}{n} \]

Using the sum function.

(mean_mm_1 <- sum(tot_mm_pop_sample) / length(tot_mm_pop_sample))

## [1] 59.71429

Using the mean function.

(mean_mm_2 <- mean(tot_mm_pop_sample))

## [1] 59.71429

Quantify the variation in the number of MMs in a bag

var function
sd function

What do we mean by variation?

In general, we mean something along the lines of ‘the amount of variation around some mean value’.

How might we quantify this?

Add up the distances from the mean.

First calculate distances from the mean for each bag.

(dist_from_mean <- tot_mm_pop_sample - mean_mm_1)

## [1]  0.2857143 -1.7142857 -2.7142857  3.2857143  1.2857143  0.2857143
## [7] -0.7142857

Next, calculate the total of these distances. We could also call this the sum of distances.

(sum_dist_from_mean <- sum(dist_from_mean))

## [1] -7.105427e-15

What information does this give us, and why might it not be useful?

Add up the absolute distances from the mean.

First calculate the absolute distances from the mean.

(absdist_from_mean <- abs(tot_mm_pop_sample - mean_mm_1))

## [1] 0.2857143 1.7142857 2.7142857 3.2857143 1.2857143 0.2857143 0.7142857

Next add up the total absolute distances from the mean.

(sum_absdist_from_mean <- sum(absdist_from_mean))

## [1] 10.28571

Add up the squared distances from the mean.

While the absolute distance from the mean does give us a reasonable measure of variability, because of specific mathematical properties, it’s more convenient to work with squared distances from the mean, leading to the measures of variance and standard deviation.

Challenge

Calculate squared distances from the mean, and sum them to determine the total squared deviation.
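One possible way to approach this challenge is sketched below. It reuses the class sample from earlier in these notes; the names `sq_dist_from_mean` and `sum_sq_dist` are illustrative and not part of the original notes.

```r
# Class sample of bag totals, copied from earlier in the notes
tot_mm_pop_sample <- c(60, 58, 57, 63, 61, 60, 59)

# Squared distance of each bag from the mean (never negative)
sq_dist_from_mean <- (tot_mm_pop_sample - mean(tot_mm_pop_sample))^2

# Total squared deviation, often called the sum of squares
sum_sq_dist <- sum(sq_dist_from_mean)
sum_sq_dist
## [1] 23.42857
```

Because every squared distance is non-negative, the positive and negative deviations can no longer cancel each other out.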

The standard deviation can be thought of as the typical deviation from the mean: it is the square root of the variance, which is roughly the average squared deviation.

Challenge

Calculate the variance of the number of M&Ms in a bag, considering all of the bags in the class. Do this without using the var function.
Calculate the standard deviation of the number of M&Ms in a bag. Do this without using the sd function.
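A minimal sketch of one way to do this, assuming we want the sample variance (dividing by n - 1, which is what R's `var` computes). The names `n_bags`, `var_mm`, and `sd_mm` are illustrative. The built-in functions appear only at the end, as a check on the hand calculation.

```r
# Class sample of bag totals, copied from earlier in the notes
tot_mm_pop_sample <- c(60, 58, 57, 63, 61, 60, 59)
n_bags <- length(tot_mm_pop_sample)

# Sum of squared distances from the mean
sum_sq_dist <- sum((tot_mm_pop_sample - mean(tot_mm_pop_sample))^2)

# Sample variance: sum of squares divided by (n - 1)
var_mm <- sum_sq_dist / (n_bags - 1)

# Standard deviation: square root of the variance
sd_mm <- sqrt(var_mm)

# Check the hand calculations against the built-in functions
all.equal(var_mm, var(tot_mm_pop_sample))
all.equal(sd_mm, sd(tot_mm_pop_sample))
```

Dividing by n instead of n - 1 would give the population variance, which is slightly smaller for the same data.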

Probability and M&Ms

Without looking, you chose one M&M. What colors could you have chosen?

The set of brown, yellow, green, red, orange, blue is the sample space.

Your selection of a single M&M is called an event.

Challenge

What’s the probability of drawing an M&M of each color from your bag?

P(brown), P(yellow), etc.
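As a rough sketch, the probability of each color can be estimated as that color's count divided by the total number of M&Ms in the bag. The counts below are hypothetical placeholders, not class data; substitute the counts from your own bag.

```r
# Hypothetical color counts for one bag (replace with your own counts)
my_bag_counts <- c(blue = 14, brown = 8, green = 10,
                   orange = 12, red = 7, yellow = 9)

# Total number of M&Ms in this bag
tot_mm <- sum(my_bag_counts)

# Probability of drawing each color = count of that color / total
color_probs <- my_bag_counts / tot_mm
color_probs

# The probabilities over the whole sample space should sum to 1
sum(color_probs)
```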

Putting our data together into a `data.frame`

Here we will collate the class's data into a single data set.

Challenge

What is the mean probability of getting each color, averaged across all bags?

Tools to use:

data.frames
apply
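The sketch below shows one way these two tools might be combined. The data frame `class_bags` and its counts are made up for illustration (one row per bag, one column per color); the real class data would take their place.

```r
# Hypothetical class data: one row per bag, one column per color
class_bags <- data.frame(
  blue   = c(14, 11, 13),
  brown  = c(8, 9, 7),
  green  = c(10, 9, 11),
  orange = c(12, 13, 12),
  red    = c(7, 8, 6),
  yellow = c(9, 8, 8)
)

# For each bag (row), convert counts to probabilities.
# apply() returns one column per bag here, so transpose back to one row per bag.
bag_probs <- t(apply(class_bags, 1, function(counts) counts / sum(counts)))

# For each color (column), take the mean probability across bags
mean_color_probs <- apply(bag_probs, 2, mean)
mean_color_probs
```

Note that this averages the per-bag probabilities, which weights every bag equally regardless of how many M&Ms it holds. Pooling all counts before dividing would be a different, also reasonable, choice.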

Probability of events

What’s the P(green OR blue OR red) in your bag?

P(green) + P(blue) + P(red)

What is the P(NOT green) in your bag?

\[ P(\sim Green) = P(Green^c) = 1 - P(Green) \]
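Both rules can be checked numerically. The sketch below uses the company-stated color probabilities quoted later in these notes as a stand-in for the probabilities in your own bag, so the specific numbers are an assumption. The addition rule applies because a single M&M cannot be two colors at once.

```r
# Company-stated color probabilities, as quoted later in these notes
mm_colors <- c("blue", "brown", "green", "orange", "red", "yellow")
mm_probs <- c(.23, .14, .16, .20, .13, .14)
names(mm_probs) <- mm_colors

# P(green OR blue OR red): add the probabilities of mutually exclusive events
p_green_blue_red <- sum(mm_probs[c("green", "blue", "red")])
p_green_blue_red   # 0.52

# P(NOT green): complement rule
p_not_green <- 1 - mm_probs[["green"]]
p_not_green        # 0.84
```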

Sampling with replacement

I draw one M&M from my bag, put it back, then draw another.

What’s the P(green and then blue)?

What’s the P(green or blue)?
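A small sketch of the arithmetic, assuming the company-stated probabilities for green and blue quoted later in these notes. Because the first M&M is put back, the two draws are independent, so the probabilities multiply. The "or" question is read here as applying to a single draw, which is only one possible interpretation.

```r
# Assumed single-draw probabilities (company-stated values from later in the notes)
p_green <- 0.16
p_blue <- 0.23

# With replacement the draws are independent: multiply
p_green_then_blue <- p_green * p_blue
p_green_then_blue        # 0.0368

# On a single draw, green and blue are mutually exclusive: add
p_green_or_blue_single <- p_green + p_blue
p_green_or_blue_single   # 0.39
```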

Sampling without replacement

What is the probability of getting green, then blue, without replacing your first draw?
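Without replacement, the second draw depends on the first, so the calculation needs actual counts rather than fixed probabilities. The sketch below uses hypothetical counts (10 green and 14 blue in a bag of 60); substitute your own.

```r
# Hypothetical counts for one bag (replace with your own)
n_green <- 10
n_blue <- 14
n_total <- 60

# First draw: green out of the full bag
p_green_first <- n_green / n_total

# Second draw: blue out of the remaining (n_total - 1) M&Ms,
# given that the M&M already removed was green
p_blue_given_green <- n_blue / (n_total - 1)

# Multiply the two probabilities
p_green_then_blue <- p_green_first * p_blue_given_green
p_green_then_blue   # about 0.0395
```

With the same counts but with replacement, the result would be (10/60) * (14/60), which is slightly smaller because the second draw is then made from the full bag of 60.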

Challenge

What is the sample space when drawing two M&Ms?
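One way to enumerate this sample space is sketched below, assuming that the order of the two draws matters and that the bag holds at least two M&Ms of every color (so that same-color pairs remain possible even without replacement). The name `two_draw_space` is illustrative.

```r
mm_colors <- c("blue", "brown", "green", "orange", "red", "yellow")

# Every ordered (first, second) combination of colors
two_draw_space <- expand.grid(first = mm_colors,
                              second = mm_colors,
                              stringsAsFactors = FALSE)

# 6 colors for the first draw times 6 for the second
nrow(two_draw_space)
## [1] 36

head(two_draw_space)
```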

Draw a random bag of M&Ms using the company stated frequencies/probabilities for each color

Challenge

How would you create a random bag of M&Ms, assuming that each M&M has an equal probability of being in any given bag?
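A possible sketch, reading the challenge as "each color is equally likely for every M&M". When `sample` is called without a `prob` argument, every element of `x` gets the same probability. The bag size of 15 and the seed value are arbitrary choices made for illustration.

```r
mm_colors <- c("blue", "brown", "green", "orange", "red", "yellow")

# Arbitrary seed so the random draw can be reproduced
set.seed(42)

# Draw 15 M&Ms, each color equally likely (no prob argument given)
equal_bag <- sample(x = mm_colors, size = 15, replace = TRUE)
table(equal_bag)
```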

Mars claims that the colors are not equally common: the stated percentage differs slightly from color to color.

## Colors as a vector
mm_colors <- c("blue", "brown", "green", "orange", "red", "yellow")
mm_probs <- c(.23, .14, .16, .20, .13, .14)

## I want to "sample" a bag of MMs
new_bag <- sample(x = mm_colors, size = 15, replace = TRUE, prob = mm_probs)
table(new_bag)

## new_bag
##   blue  brown  green orange yellow 
##      2      1      2      7      3

Because it is possible to draw a bag that is completely missing some of the colors, we need to explicitly check how many M&Ms of each color are in the new bag if we want to compare it with our original bag.

## Count the number of each color
new_bag_counts <- c(sum(new_bag == "blue"),
                    sum(new_bag == "brown"),
                    sum(new_bag == "green"),
                    sum(new_bag == "orange"),
                    sum(new_bag == "red"),
                    sum(new_bag == "yellow"))
new_bag_counts

## [1] 2 1 2 7 0 3
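A more compact alternative, offered as a sketch rather than as the approach these notes use: converting the bag to a factor with all six colors as levels makes `table` report a zero for any missing color. The `new_bag` vector below is rebuilt by hand to match the counts shown above, since the original random draw cannot be reproduced without its seed.

```r
mm_colors <- c("blue", "brown", "green", "orange", "red", "yellow")

# Rebuild a bag with the same counts as the example above (no red)
new_bag <- rep(c("blue", "brown", "green", "orange", "yellow"),
               times = c(2, 1, 2, 7, 3))

# A factor with explicit levels keeps colors that have zero M&Ms
new_bag_counts <- as.vector(table(factor(new_bag, levels = mm_colors)))
new_bag_counts
## [1] 2 1 2 7 0 3
```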

And now to check if my bag is the same as the new bag I can look at a series of logical tests.

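# my_orig_bag is not defined above; it is assumed to hold the color counts from
# your original bag, in the same order as new_bag_counts
# (blue, brown, green, orange, red, yellow)
individual_color_compare <- new_bag_counts == my_orig_bag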

# The bags match if all of these are true
all(individual_color_compare)

Meeting 4 - Introduction to Probability

Matthew E. Aiello-Lammens

Probability

Frequentist statistics

Bayesian statistics

Probability through M&Ms

`mean` number of MMs in a bag

Quantify the variation in the number of MMs in a bag

Probability and M&Ms

Putting our data together into a `data.frame`

Probability of events

Sampling with replacement

Sampling without replacement

Draw a random bag of M&Ms using the company stated frequencies/probabilities for each color
