Asking research questions

… it is not realistic to expect yourself to sit at your desk and conjure up the perfect study that will revolutionize the field. -Karban et al. 2014, p. 5

Be willing to hang with the dumb ideas that you will inevitably come up with, because the really great ideas stand on the shoulders of the dumb ones. -Karban et al. 2014, p. 14

Introduction to R - continued

Exploring variable elements

You can get specific elements from vectors and other data structures

  • Introduction to the square brackets []
pets <- c("cat", "dog", "rabbit", "pig", "snake")
pets[1]
## [1] "cat"
  • Getting a number of elements, in sequence
pets[3:4]
## [1] "rabbit" "pig"
  • Getting a number of elements, not in sequence
pets[c(1,4)]
## [1] "cat" "pig"

Working with matrices

Review - Why might we want 2D data?

Let’s make a matrix

Challenge

With the people next to you, break down this function, and describe each argument. What is the final product?

my_mat <- matrix(data = runif(50), nrow = 10, byrow = TRUE)

What does it mean to fill byrow?

matrix(data = 1:9, nrow = 3, byrow = TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9

Versus

matrix(data = 1:9, nrow = 3, byrow = FALSE)
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
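
A minimal sketch of how the two fill orders relate. The names by_row and by_col are hypothetical, chosen only for this illustration. For the same data and a square shape, filling by row gives the transpose of filling by column.

```r
# Hypothetical names for illustration; byrow is passed explicitly both times
by_row <- matrix(data = 1:9, nrow = 3, byrow = TRUE)
by_col <- matrix(data = 1:9, nrow = 3, byrow = FALSE)

# t() transposes a matrix (swaps rows and columns)
all(by_row == t(by_col))
## [1] TRUE

# The same nine values are present in both; only their arrangement differs
identical(sort(as.vector(by_row)), sort(as.vector(by_col)))
## [1] TRUE
```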

Challenge

What is the default value for byrow?

Indexing matrices

Matrices are indexed using [row, column] notation. Leaving the row or the column blank selects all rows or all columns.

my_mat <- matrix(data = 1:50, nrow = 10, byrow = TRUE)

my_mat[1,1]
## [1] 1
my_mat[1,2]
## [1] 2
my_mat[2,1]
## [1] 6
my_mat[1:4, 1:3]
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    6    7    8
## [3,]   11   12   13
## [4,]   16   17   18
my_mat[c(1,3,5), ]
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]   11   12   13   14   15
## [3,]   21   22   23   24   25
my_mat[ ,c(1,3,5)]
##       [,1] [,2] [,3]
##  [1,]    1    3    5
##  [2,]    6    8   10
##  [3,]   11   13   15
##  [4,]   16   18   20
##  [5,]   21   23   25
##  [6,]   26   28   30
##  [7,]   31   33   35
##  [8,]   36   38   40
##  [9,]   41   43   45
## [10,]   46   48   50
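
The same row, column rules can be tried on a smaller matrix, where the results are easy to check by eye. This is only a sketch; small_mat is a hypothetical name that does not appear elsewhere in the lesson.

```r
# A 3 x 4 matrix filled by row (hypothetical example data)
small_mat <- matrix(data = 1:12, nrow = 3, byrow = TRUE)

dim(small_mat)     # number of rows, then number of columns
## [1] 3 4

small_mat[2, ]     # all of row 2
## [1] 5 6 7 8

small_mat[, 4]     # all of column 4
## [1]  4  8 12

small_mat[2, 4]    # the single element in row 2, column 4
## [1] 8
```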

Combining internal functions with matrices

Make a “random” matrix (the values are reproducible in this case because set.seed fixes the starting point of the random number generator)

set.seed(1)
mat1 <- matrix(data = runif(50), nrow = 10, byrow = TRUE)

Calculate the mean of all of the data

mean(mat1)
## [1] 0.5325929

Calculate the standard deviation of all of the data

sd(mat1)
## [1] 0.272239

Calculate row means and column means

rowMeans(mat1)
##  [1] 0.4640751 0.6389526 0.4446999 0.6729408 0.4382595 0.3983864 0.5177556
##  [8] 0.5411271 0.6667390 0.5429926
colMeans(mat1)
## [1] 0.5949241 0.4500704 0.5808290 0.5491985 0.4879423

Introduce the apply function, which applies a function over the rows (MARGIN = 1) or the columns (MARGIN = 2) of a matrix

apply(mat1, MARGIN = 1, mean)
##  [1] 0.4640751 0.6389526 0.4446999 0.6729408 0.4382595 0.3983864 0.5177556
##  [8] 0.5411271 0.6667390 0.5429926
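
A short sketch comparing both margins of apply with rowMeans and colMeans. It uses a small, non-random matrix so the results can be checked by hand; demo_mat, row_means, and col_means are hypothetical names for this example only.

```r
# Hypothetical 2 x 3 matrix: row 1 is 1 2 3, row 2 is 4 5 6
demo_mat <- matrix(data = 1:6, nrow = 2, byrow = TRUE)

# MARGIN = 1 applies the function to each row
row_means <- apply(demo_mat, MARGIN = 1, FUN = mean)
row_means
## [1] 2 5

# MARGIN = 2 applies the function to each column
col_means <- apply(demo_mat, MARGIN = 2, FUN = mean)
col_means
## [1] 2.5 3.5 4.5

# Both agree with the dedicated helper functions
all.equal(row_means, rowMeans(demo_mat))
## [1] TRUE
all.equal(col_means, colMeans(demo_mat))
## [1] TRUE
```

Unlike rowMeans and colMeans, apply accepts other functions too, for example sd or max.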

User Defined Functions

What is a function?

Why write a function?

Example

Note: the example below is borrowed from the Software Carpentry Introduction to Programming materials.

Convert from Fahrenheit to Kelvin

fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

Convert from Kelvin to Celsius

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
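
Once defined, the functions are called like any built-in function. The sketch below repeats the two definitions so that it runs on its own; the input values (32, 212, 0) are chosen for illustration because their conversions are well known.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

# Freezing point of water: 32 degrees Fahrenheit
fahr_to_kelvin(32)
## [1] 273.15

# Boiling point of water: 212 degrees Fahrenheit
fahr_to_kelvin(212)
## [1] 373.15

# Absolute zero: 0 Kelvin
kelvin_to_celsius(0)
## [1] -273.15
```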

Challenge

Write a function to convert from Fahrenheit to Celsius.

for loops

How do I do the same thing many times?

Generic for loop

for (variable in collection) {
  do things with variable
}
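
As a concrete sketch of the generic pattern, the loop below visits each element of a character vector in turn. The vector and the name_lengths variable are hypothetical and exist only for this illustration.

```r
# Hypothetical collection to loop over
pets <- c("cat", "dog", "rabbit")

# Start with an empty vector and add one value per pass through the loop
name_lengths <- integer(0)

for (pet in pets) {
  # On each pass, pet holds the next element of pets
  print(pet)
  name_lengths <- c(name_lengths, nchar(pet))
}
## [1] "cat"
## [1] "dog"
## [1] "rabbit"

name_lengths
## [1] 3 3 6
```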

Let’s get more specific. Say we took a bunch of measurements of temperature in Fahrenheit, but want to convert them. How might we do it?

Make our temperature data set

set.seed(8)
temp_data <- runif(n = 20, min = -5, max = 5) + 45

Challenge

What did we just do?

Use our fahr_to_kelvin function on each element

Iteration 1.

for( x in temp_data){
  fahr_to_kelvin(x)
}

Iteration 2.

for( x in temp_data){
  print(fahr_to_kelvin(x))
}
## [1] 280.185
## [1] 278.749
## [1] 282.037
## [1] 281.216
## [1] 279.3806
## [1] 281.5885
## [1] 279.2104
## [1] 282.7737
## [1] 281.8675
## [1] 281.175
## [1] 280.1336
## [1] 278.0906
## [1] 279.9966
## [1] 280.622
## [1] 278.3624
## [1] 282.749
## [1] 277.6017
## [1] 279.0637
## [1] 279.1307
## [1] 280.4895

Iteration 3.

for (x in seq_along(temp_data)) {
  print(fahr_to_kelvin(temp_data[x]))
}
## [1] 280.185
## [1] 278.749
## [1] 282.037
## [1] 281.216
## [1] 279.3806
## [1] 281.5885
## [1] 279.2104
## [1] 282.7737
## [1] 281.8675
## [1] 281.175
## [1] 280.1336
## [1] 278.0906
## [1] 279.9966
## [1] 280.622
## [1] 278.3624
## [1] 282.749
## [1] 277.6017
## [1] 279.0637
## [1] 279.1307
## [1] 280.4895

Iteration 4.

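# Create the results vector at its final length before the loop
temp_data_kelvin <- numeric(length(temp_data))
for (x in seq_along(temp_data)) {
  # Store the converted value at the same position as the input
  temp_data_kelvin[x] <- fahr_to_kelvin(temp_data[x])
}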

print(temp_data_kelvin)
##  [1] 280.1850 278.7490 282.0370 281.2160 279.3806 281.5885 279.2104
##  [8] 282.7737 281.8675 281.1750 280.1336 278.0906 279.9966 280.6220
## [15] 278.3624 282.7490 277.6017 279.0637 279.1307 280.4895
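
One way to check a loop like this is to compare it with a single call on the whole vector. The arithmetic operators inside fahr_to_kelvin work element by element, so the function also accepts a vector. The sketch below uses a short hypothetical vector, example_temps, rather than temp_data, so the values are easy to verify.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# Hypothetical temperatures in Fahrenheit
example_temps <- c(32, 50, 212)

# Loop version: fill a results vector one element at a time
looped <- numeric(length(example_temps))
for (i in seq_along(example_temps)) {
  looped[i] <- fahr_to_kelvin(example_temps[i])
}

# Whole-vector version: one call converts every element
vectorized <- fahr_to_kelvin(example_temps)

all.equal(looped, vectorized)
## [1] TRUE
```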

Conditionals

We can use conditional statements to control the flow of our code, and to “make choices” as it progresses.

if and else

The if and else statements are key to making choices in your code. Before working with if/else statements, we need to review booleans, i.e., TRUE and FALSE.

Aside: TRUE and FALSE values

TRUE
## [1] TRUE
20 == 20
## [1] TRUE
20 > 40
## [1] FALSE

The ! operator performs logical negation: it turns TRUE into FALSE and FALSE into TRUE

!(20 > 40)
## [1] TRUE

There are many logical operators to consider.
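
As a brief sketch, the standard comparison operators in R each return TRUE or FALSE. The variables a and b are hypothetical values chosen for illustration.

```r
a <- 20
b <- 40

a == b   # equal to
## [1] FALSE
a != b   # not equal to
## [1] TRUE
a < b    # less than
## [1] TRUE
a <= b   # less than or equal to
## [1] TRUE
a > b    # greater than
## [1] FALSE
a >= b   # greater than or equal to
## [1] FALSE
```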


Challenge

Describe what each of the following operators is doing.

x <- TRUE
y <- FALSE
x & y
## [1] FALSE
x | y
## [1] TRUE
xy <- c(x,y)
any(xy)
## [1] TRUE
all(xy)
## [1] FALSE

Now that we know a bit about booleans, let’s get into if/else statements.

Essentially, an if is a conditional that says, “do the thing after the if statement if the condition is TRUE.”

Here’s a simple example (taken from Software Carpentry’s lessons)

num <- 37
if (num > 100) {
  print("greater")
} else {
  print("not greater")
}
## [1] "not greater"

Challenge

Re-define num so you get the other option.

We don’t need an else statement for this to work. In the example below nothing is printed, because the condition is FALSE:

num <- 37
if (num > 100) {
  print("The number was greater than 100")
}
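
Conditionals combine naturally with for loops. The sketch below, which uses a hypothetical vector temps_f of Fahrenheit values, acts only on the elements that pass the test and silently skips the rest.

```r
# Hypothetical temperatures in Fahrenheit
temps_f <- c(20, 32, 45)

# Collect the values that pass the test
above_freezing <- c()

for (temp in temps_f) {
  # No else branch: values at or below 32 are skipped
  if (temp > 32) {
    print(paste(temp, "is above freezing"))
    above_freezing <- c(above_freezing, temp)
  }
}
## [1] "45 is above freezing"

above_freezing
## [1] 45
```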

We can also write a “cascade” of if/else statements

# return() only works inside a function, so print the result here
if (num > 0) {
  print(1)
} else if (num == 0) {
  print(0)
} else {
  print(-1)
}
## [1] 1

Challenge

Make the above into a function that takes in any value and returns whether it is positive, negative, or equal to 0.