# Using the optim function

R has a built-in function, `optim`, that numerically minimizes a function of one or more parameters. It is usually far more efficient than the brute-force grid search we used in Lecture 5, because it searches the parameter space adaptively instead of evaluating every point on a fixed grid. This makes it very handy for finding the parameter values that minimize the negative log-likelihood. In the example below, we estimate $\mu$ and $\sigma$ for a normal distribution. Instead of writing out the negative log-likelihood for the normal distribution by hand, we take advantage of `dnorm`, the R function for the normal probability density function, using a few of its optional arguments.

As in Lecture 5, we’ll generate some fake “data” by drawing 1000 random samples from $N(\mu = 1, \sigma = 2)$:

```r
x <- rnorm(1000, mean = 1, sd = 2)
```
Next, let’s write a function for the negative log-likelihood of our data. The function has two inputs: `x`, our observed data, and `params`, a vector of length two holding the parameter values at which the likelihood is evaluated. The first element is the mean ($\mu$) and the second is the standard deviation ($\sigma$). Note that `params` is not just our initial guess: every time `optim` calls the function, `params` holds whichever candidate values `optim` is currently trying. We pack both parameters into a single vector because `optim` requires it: it searches over one numeric vector of parameters and passes that vector to our function as its first unnamed argument. Since we will supply `x` by name in the `optim` call, R matches the vector to `params`. Inside the function, we explicitly extract the mean and standard deviation, then sum the log-density values for all of our data in `x`. We use the argument `log = TRUE` in `dnorm` to get the log of the probability density, and we put a negative sign in front of `sum`. The last line of the function therefore returns the negative log-likelihood of the data in `x` GIVEN the mean and standard deviation in `params`.
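Written out explicitly, and taking the normal density $f(x_i \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$, the quantity computed by `-sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))` is:

$$
-\ell(\mu, \sigma) = -\sum_{i=1}^{n} \log f(x_i \mid \mu, \sigma)
= \frac{n}{2}\log\left(2\pi\sigma^2\right) + \frac{1}{2\sigma^2}\sum_{i=1}^{n} (x_i - \mu)^2
$$

Minimizing this expression over $\mu$ and $\sigma$ is what `optim` does numerically.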

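```r
# Negative log-likelihood of the data x under a normal distribution
# with mean params[1] and standard deviation params[2]
neg.ll.v2 <- function(x, params) {
  mu <- params[1]
  sigma <- params[2]
  # log = TRUE returns log densities; summing them gives the log-likelihood
  -sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
```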


Now we minimize the negative log-likelihood (equivalently, maximize the log-likelihood) using `optim`:

```r
# Start the search at mu = 1, sigma = 1; x = x is forwarded to neg.ll.v2
opt1 <- optim(par = c(1, 1), fn = neg.ll.v2, x = x)
opt1
```

```
## $par
## [1] 0.9111144 1.9992149
##
## $value
## [1] 2111.662
##
## $counts
## function gradient
##       65       NA
##
## $convergence
## [1] 0
##
## $message
## NULL
```

In the output, `$par` holds the estimates of $\mu$ and $\sigma$, `$value` is the minimized negative log-likelihood, `$counts` gives the number of function evaluations, and a `$convergence` value of 0 means `optim` reports successful convergence.

In the example above, my initial guesses for the mean and standard deviation were 1 and 1; this is what the `par = c(1, 1)` argument to `optim` means. In some cases `optim` can be very sensitive to these starting values, so it is a good idea to use values as close to the final estimates as possible. Here, since I was estimating the population mean and standard deviation, I could have used the sample mean and standard deviation of `x` as my initial parameter guesses. In Homework 4, you are asked to carry out a similar process, but instead of estimating the mean and standard deviation of a normal distribution, you are asked to estimate $\beta_0$ and $\beta_1$.