… it is not realistic to expect yourself to sit at your desk and conjure up the perfect study that will revolutionize the field. -Karban et al. 2014, p. 5
Be willing to hang with the dumb ideas that you will inevitably come up with, because the really great ideas stand on the shoulders of the dumb ones. -Karban et al. 2014, p. 14
You can get specific elements from vectors and other data structures with square brackets, [].
pets <- c("cat", "dog", "rabbit", "pig", "snake")
pets[1]
## [1] "cat"
pets[3:4]
## [1] "rabbit" "pig"
pets[c(1,4)]
## [1] "cat" "pig"
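Square brackets also accept negative and logical indices. This is a short sketch; the pets vector is redefined so the block runs on its own.

```r
pets <- c("cat", "dog", "rabbit", "pig", "snake")

# A negative index drops that position instead of selecting it
pets[-1]               # "dog" "rabbit" "pig" "snake"

# A logical vector keeps only the positions where it is TRUE
pets[pets != "cat"]    # "dog" "rabbit" "pig" "snake"

# Asking for a position past the end of the vector returns NA
pets[6]                # NA
```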
Review - Why might we want 2D data?
Let’s make a matrix
Challenge
With the people next to you, break down this function, and describe each argument. What is the final product?
my_mat <- matrix(data = runif(50), nrow = 10, byrow = TRUE)
What does it mean to fill byrow?
matrix(data = 1:9, nrow = 3, byrow = TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
Versus
matrix(data = 1:9, nrow = 3, byrow = FALSE)
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
Challenge
What is the default value for byrow?
Matrix elements are indexed with [row, column] notation. Leaving the row or column position empty selects all rows or all columns.
my_mat <- matrix(data = 1:50, nrow = 10, byrow = TRUE)
my_mat[1,1]
## [1] 1
my_mat[1,2]
## [1] 2
my_mat[2,1]
## [1] 6
my_mat[1:4, 1:3]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 6 7 8
## [3,] 11 12 13
## [4,] 16 17 18
my_mat[c(1,3,5), ]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 11 12 13 14 15
## [3,] 21 22 23 24 25
my_mat[ ,c(1,3,5)]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 6 8 10
## [3,] 11 13 15
## [4,] 16 18 20
## [5,] 21 23 25
## [6,] 26 28 30
## [7,] 31 33 35
## [8,] 36 38 40
## [9,] 41 43 45
## [10,] 46 48 50
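A related detail worth knowing: when a selection leaves only one row or column, R simplifies the result to a plain vector. The sketch below shows this, and the drop = FALSE argument that keeps the matrix shape; my_mat is rebuilt so the block is self-contained.

```r
my_mat <- matrix(data = 1:50, nrow = 10, byrow = TRUE)

# dim() reports the number of rows and columns
dim(my_mat)            # 10 5

# Selecting a single row simplifies the result to a plain vector...
row2 <- my_mat[2, ]
is.matrix(row2)        # FALSE

# ...unless drop = FALSE asks R to keep the matrix structure
row2_mat <- my_mat[2, , drop = FALSE]
dim(row2_mat)          # 1 5
```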
Make a “random” matrix (it is not truly random in this case, because set.seed() makes the generated numbers reproducible)
set.seed(1)
mat1 <- matrix(data = runif(50), nrow = 10, byrow = TRUE)
Calculate the mean of all of the data
mean(mat1)
## [1] 0.5325929
Calculate the standard deviation of all of the data
sd(mat1)
## [1] 0.272239
Calculate row means and column means
rowMeans(mat1)
## [1] 0.4640751 0.6389526 0.4446999 0.6729408 0.4382595 0.3983864 0.5177556
## [8] 0.5411271 0.6667390 0.5429926
colMeans(mat1)
## [1] 0.5949241 0.4500704 0.5808290 0.5491985 0.4879423
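Introduce the apply function
apply(X, MARGIN, FUN) runs the function FUN over the margins of X: MARGIN = 1 means rows and MARGIN = 2 means columns. The call below reproduces rowMeans(mat1):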
apply(mat1, MARGIN = 1, mean)
## [1] 0.4640751 0.6389526 0.4446999 0.6729408 0.4382595 0.3983864 0.5177556
## [8] 0.5411271 0.6667390 0.5429926
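A sketch of apply() working over columns and with functions other than mean. mat1 is regenerated with the same seed so the block runs on its own; the row-range function is a made-up example.

```r
set.seed(1)
mat1 <- matrix(data = runif(50), nrow = 10, byrow = TRUE)

# MARGIN = 2 works down the columns, matching colMeans()
col_means <- apply(mat1, MARGIN = 2, mean)
all.equal(col_means, colMeans(mat1))   # TRUE

# Any function can be applied, e.g. the standard deviation of each column
col_sds <- apply(mat1, MARGIN = 2, sd)

# including a small anonymous function written inline (range of each row)
row_ranges <- apply(mat1, MARGIN = 1, function(r) max(r) - min(r))
```

apply() is handy when no ready-made shortcut like rowMeans() or colMeans() exists for the summary you want.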
What is a function?
Why write a function?
Example
Note: the example below is borrowed from the Software Carpentry Introduction to Programming materials.
Convert from Fahrenheit to Kelvin
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
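Calling a function you have just written on inputs with known answers is a quick sanity check. Here the freezing and boiling points of water are used; the function is repeated so the block runs on its own.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

fahr_to_kelvin(32)    # 273.15, water freezes
fahr_to_kelvin(212)   # 373.15, water boils
```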
Convert from Kelvin to Celsius
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
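The same kind of check works for kelvin_to_celsius, using absolute zero and the boiling point of water (function repeated so the block is self-contained):

```r
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

kelvin_to_celsius(0)        # -273.15, absolute zero
kelvin_to_celsius(373.15)   # 100
```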
Challenge
Write a function to convert from Fahrenheit to Celsius.
for loops
How do I do the same thing many times?
for (variable in collection) {
  # do things with variable
}
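A minimal concrete instance of the template above, looping over a small made-up vector. Each pass prints the square of the current element and also appends it to a result vector.

```r
squares <- c()   # start empty; c() with no arguments returns NULL

for (n in c(1, 2, 3)) {
  print(n^2)                   # do something with the current element
  squares <- c(squares, n^2)   # and keep the result
}

squares   # 1 4 9
```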
Let’s get more specific. Say we took a bunch of temperature measurements in Fahrenheit but want to convert them to Kelvin. How might we do it?
set.seed(8)
temp_data <- runif(n = 20, min = -5, max = 5) + 45
Challenge
What did we just do?
Run the fahr_to_kelvin function on each element
Iteration 1.
for (x in temp_data) {
  fahr_to_kelvin(x)
}
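This loop runs, but it displays nothing: values computed inside a for loop are not printed automatically, so we need print().
Iteration 2.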
for (x in temp_data) {
  print(fahr_to_kelvin(x))
}
## [1] 280.185
## [1] 278.749
## [1] 282.037
## [1] 281.216
## [1] 279.3806
## [1] 281.5885
## [1] 279.2104
## [1] 282.7737
## [1] 281.8675
## [1] 281.175
## [1] 280.1336
## [1] 278.0906
## [1] 279.9966
## [1] 280.622
## [1] 278.3624
## [1] 282.749
## [1] 277.6017
## [1] 279.0637
## [1] 279.1307
## [1] 280.4895
Iteration 3.
for (x in 1:length(temp_data)) {
  print(fahr_to_kelvin(temp_data[x]))
}
## [1] 280.185
## [1] 278.749
## [1] 282.037
## [1] 281.216
## [1] 279.3806
## [1] 281.5885
## [1] 279.2104
## [1] 282.7737
## [1] 281.8675
## [1] 281.175
## [1] 280.1336
## [1] 278.0906
## [1] 279.9966
## [1] 280.622
## [1] 278.3624
## [1] 282.749
## [1] 277.6017
## [1] 279.0637
## [1] 279.1307
## [1] 280.4895
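Looping over 1:length(temp_data) works here, but it misbehaves if the vector is ever empty, because 1:0 counts down and yields two indices. seq_along() is a common safer alternative; a small sketch with a made-up empty vector:

```r
empty <- numeric(0)

# 1:length() produces the indices 1 and 0 for an empty vector...
1:length(empty)      # 1 0

# ...while seq_along() produces none, so a loop over it simply doesn't run
seq_along(empty)     # integer(0)

n_iterations <- 0
for (i in seq_along(empty)) {
  n_iterations <- n_iterations + 1
}
n_iterations         # 0
```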
Iteration 4.
temp_data_kelvin <- vector()
for (x in 1:length(temp_data)) {
  temp_data_kelvin[x] <- fahr_to_kelvin(temp_data[x])
}
print(temp_data_kelvin)
## [1] 280.1850 278.7490 282.0370 281.2160 279.3806 281.5885 279.2104
## [8] 282.7737 281.8675 281.1750 280.1336 278.0906 279.9966 280.6220
## [15] 278.3624 282.7490 277.6017 279.0637 279.1307 280.4895
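Two refinements are worth trying on Iteration 4. Preallocating the result with numeric(length(...)) avoids growing the vector one element at a time, and because the arithmetic inside fahr_to_kelvin works on whole vectors, the function can also be called on temp_data directly with no loop. A sketch comparing the two (objects recreated so the block runs on its own):

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

set.seed(8)
temp_data <- runif(n = 20, min = -5, max = 5) + 45

# Loop version, filling a preallocated numeric vector of the right length
temp_data_kelvin <- numeric(length(temp_data))
for (i in seq_along(temp_data)) {
  temp_data_kelvin[i] <- fahr_to_kelvin(temp_data[i])
}

# Vectorized version: one call converts every element at once
temp_data_kelvin_vec <- fahr_to_kelvin(temp_data)

all.equal(temp_data_kelvin, temp_data_kelvin_vec)   # TRUE
```

The loop is still the right tool when each step depends on earlier results; for simple element-wise arithmetic, the vectorized call is shorter and usually faster.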
We can use conditional statements to control the flow of our code, and to “make choices” as it progresses.
if and else
The if and else statements are key to making choices in your code. Before looking at if/else statements, we need to review booleans, i.e., TRUE and FALSE.
TRUE and FALSE values
TRUE
## [1] TRUE
20 == 20
## [1] TRUE
20 > 40
## [1] FALSE
A ! sign can be used as a logical negation (NOT):
!(20 > 40)
## [1] TRUE
There are many logical operators to consider.
Challenge
Describe what each of the following operators is doing.
x <- TRUE
y <- FALSE
x & y
## [1] FALSE
x | y
## [1] TRUE
xy <- c(x,y)
any(xy)
## [1] TRUE
all(xy)
## [1] FALSE
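Two related points, sketched below with made-up values: & and | compare vectors element by element, while the doubled forms && and || expect single values and stop evaluating as soon as the answer is known, which is why they are the usual choice inside an if condition. The last lines show two more comparison operators.

```r
# & and | compare vectors element by element
c(TRUE, TRUE, FALSE) & c(TRUE, FALSE, FALSE)   # TRUE FALSE FALSE
c(TRUE, TRUE, FALSE) | c(TRUE, FALSE, FALSE)   # TRUE TRUE FALSE

# && and || short-circuit: the right-hand side is never evaluated here
FALSE && stop("never evaluated")   # FALSE
TRUE || stop("never evaluated")    # TRUE

# Other comparison operators
20 != 40   # TRUE
20 >= 40   # FALSE
```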
Now that we know a bit about booleans, let’s get into if/else statements.
Essentially, an if is a conditional that says, “do the thing after the if statement, if the condition was TRUE.”
Here’s a simple example (taken from Software Carpentry’s lessons)
num <- 37
if (num > 100) {
  print("greater")
} else {
  print("not greater")
}
## [1] "not greater"
Challenge
Re-define num so you get the other option.
We don’t need an else statement for this to work:
num <- 37
if (num > 100) {
  print("The number was greater than 100")
}
We can also write a “cascade” of if/else statements:
if (num > 0) {
  print(1)
} else if (num == 0) {
  print(0)
} else {
  print(-1)
}
## [1] 1
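Note that if () expects a single TRUE or FALSE value, so the cascade above handles one number at a time. For whole vectors, base R's element-wise ifelse() and the built-in sign() function give comparable results; a sketch with made-up values:

```r
values <- c(-3, 0, 37)

# sign() returns -1, 0, or 1 for each element
sign(values)                                        # -1 0 1

# ifelse() applies a condition element by element; nesting mimics the cascade
ifelse(values > 0, 1, ifelse(values == 0, 0, -1))   # -1 0 1
```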
Challenge
Make the above into a function that takes in any value and returns whether it is positive, negative, or equal to 0.