Class expectations

See Syllabus

Class goals

Class format

Hypothesis testing and the scientific method

What is a theory?

A proposed explanation for some event or process, based on the available evidence related to that event. We also call this a model. Models don’t necessarily involve math.

What is a hypothesis?

A prediction based on our model (or theory). For a hypothesis to be considered scientific, it must be falsifiable.

Scientific method

Traditional view:

  1. Develop a hypothesis
  2. Make a prediction based on the hypothesis
  3. Do an experiment to test the prediction
  4. Analyze the data from our experiment
  5. Based on results, make an inference about our hypothesis - do the data support or reject our prediction?
  6. Repeat as necessary to acquire knowledge

Check out Understanding Science

Analysis is iterative

The process of analyzing data is often iterative. We may start by exploring our data with analysis techniques that never appear in our final reports or manuscripts. Don’t be afraid to start exploring your data, and applying analyses, before you think you’re ready.

Introduction to R and RStudio

Why do I have to learn to program in R?

In short, it’s “industry standard”. But the longer answer has to do with replicability of data analyses. Documenting your analysis in R, using R scripts or Rmd files, allows you to re-run your analysis whenever the need arises and to share your analysis workflow with others.

Getting started

Overall layout

  • Four panels, each has valuable information

File management

  • Set up a Biostatistics directory
  • Make a data directory
  • Make a scripts directory

Getting help

  • Help panel (lower right corner)
  • help.search
help.search("bar plot")

Challenge

Use the help.search function to search for something in statistics that you think should be in R. Did you find anything?

  • If you already know the function name and just want the details, use ?barplot
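As a rough sketch of the help tools above, the lines below collect the interactive help calls as comments and then run two helpers that also work outside RStudio. Using apropos() and args() here is an assumption of this example, not part of the lecture; both are standard R functions.

```r
# Interactive help (uncomment and run these in the RStudio console):
# help.search("bar plot")   # keyword search across installed help pages
# ?barplot                  # help page for a known function, same as help("barplot")

# apropos() lists objects on the search path whose names match a pattern
matches <- apropos("barplot")
print(matches)

# args() shows a function's arguments without opening the full help page
print(args(barplot))
```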

R as calculator

We can use R just like any other calculator.

3 + 5
## [1] 8

R follows the standard order of operations (PEMDAS: Please Excuse My Dear Aunt Sally).

(3 * 5) + 7
## [1] 22
3 * 5 + 7
## [1] 22

Challenge

Write an example where adding parentheses matters.

R has many built-in functions, and many more are available as add-on packages.

sqrt(4)
## [1] 2
abs(-5)
## [1] 5
sqrt(-5)
## Warning in sqrt(-5): NaNs produced
## [1] NaN

R script file

Use a script file for your work. It’s easier to go back to and easy to document.

Important: within an R file, you can use the # sign to add comments. Anything written after the # is not interpreted when you run the code.
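A minimal sketch of how comments behave in a script. The variable names radius and area are made up for this illustration.

```r
# This whole line is a comment, so R skips it
radius <- 2            # a comment can also follow code on the same line
area <- pi * radius^2  # area of a circle with that radius

# radius <- 100        # "commented out" code like this is not run

print(radius)
print(area)
```

Because the reassignment is commented out, radius keeps the value 2.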

Challenge

Create a new R script file in your scripts directory.

Basic file management in R

# What working directory am I in?
getwd()
## [1] "/Users/maiellolammens/Dropbox/Pace/Teaching/Web/ENS-623-Research-Stats/lectures"
# Move to a different directory
# ("." means the current directory, so this particular call changes nothing)
setwd(".")
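The directory layout suggested earlier (a Biostatistics directory with data and scripts inside it) can also be built from R. The sketch below is one possible way, and it assumes a temporary directory as the starting point so that none of your real files are touched; in practice you would substitute a location of your own.

```r
# Hypothetical base location: a Biostatistics folder inside R's temp directory
base_dir <- file.path(tempdir(), "Biostatistics")

# Create the course directory along with its data and scripts subdirectories
dir.create(file.path(base_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(base_dir, "scripts"), recursive = TRUE, showWarnings = FALSE)

# List what was created
created <- list.files(base_dir)
print(created)

# Move into the new directory, check where we are, then move back
old_wd <- setwd(base_dir)   # setwd() invisibly returns the previous directory
wd_name <- basename(getwd())
print(wd_name)
setwd(old_wd)
```

file.path() joins path pieces with the correct separator, which avoids typing slashes by hand.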

Things to cover

  • Navigating the file path
  • Tab completion of file paths
  • Tab completion of R commands

Challenge

  • Try to auto-complete fil. What do you find?
  • Use the brief help menu that comes up to find a function that starts with file, and describe what you think it does.

Rmd file

Use this to integrate text and R code into the same document. I will expect most of your homework assignments as an Rmd file.

Practice with Rmd file

Variables and objects

There are several basic types of data structures in R.

  • VECTORS: One-dimensional arrays of numbers, character strings, or logical values (TRUE/FALSE)
  • FACTORS: One-dimensional arrays of categorical values, each drawn from a fixed set of levels (Stop - Let’s discuss factors)
  • DATA FRAMES: Data tables in which the various columns may be of different type
  • MATRICES: In R, matrices can only be 2-dimensional; anything of higher dimension is called an array (see below). All elements of a matrix must be of the same type, most often numeric; some functions, like the transpose function t(), are designed to work on matrices
  • ARRAYS: higher-dimensional matrices are called arrays in R
  • LISTS: lists can contain any type of object as list elements. You can have a list of numbers, a list of matrices, a list of characters, etc., or any combination of the above.

Functions that are useful for understanding the different types of data structures

str()
class()
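To make the list of data structures concrete, here is a small sketch that builds one object of each kind and inspects it with class() and str(). All object names and values are invented for illustration.

```r
# VECTOR: numeric, character, or logical values
num_vec <- c(1.5, 2, 3)

# FACTOR: categorical values with a fixed set of levels
pet_fac <- factor(c("cat", "dog", "cat"))

# DATA FRAME: a table whose columns can differ in type
pet_df <- data.frame(pet = c("cat", "dog"), weight = c(4.2, 11.0))

# MATRIX: two dimensions, all elements of one type
m <- matrix(1:6, nrow = 2)

# ARRAY: more than two dimensions
a <- array(1:24, dim = c(2, 3, 4))

# LIST: elements can be any kind of object
l <- list(num_vec, pet_fac, m)

# Inspect the objects
print(class(num_vec))
print(class(pet_fac))
print(levels(pet_fac))
str(pet_df)
print(dim(t(m)))   # transposing a 2 x 3 matrix gives a 3 x 2 matrix
print(dim(a))
str(l)
```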

Practice with variables

Define a variable

my_var <- 8

And another

my_var2 <- 10

Work with vars

my_var + my_var2
## [1] 18

Make a new variable

my_var_tot <- my_var + my_var2

Challenge

Change the value of my_var2

my_var2 <- 3

What is the value of my_var_tot now?

Make a vector

Combining values into a vector

# Vector of variables
my_vect <- c(my_var, my_var2)

# Numeric vector
v1 <- c(10, 2, 8, 7, 11, 15)

# Char vector
pets <- c("cat", "dog", "rabbit", "pig")
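One thing to watch for when combining values: a vector holds a single type, so c() converts mixed inputs to a common type. The sketch below uses made-up vectors to show this.

```r
# Mixing numbers, strings, and logicals: everything becomes character
mixed <- c(1, "cat", TRUE)
print(class(mixed))
print(mixed)

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
nums_lgl <- c(1, TRUE, FALSE)
print(class(nums_lgl))
print(nums_lgl)

# length() reports how many elements a vector has
print(length(mixed))
```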

Making a vector of numbers in sequence

v2 <- 1:10
v3 <- seq(from = 1, to = 10)

Challenge

  1. Look up the help for the seq function, and use this to make a vector from 1 to 100, by steps of 5.
  2. Come up with a way that you would use the length.out argument.