Data science is a valuable, often necessary, skill for many jobs in environmental science and policy. Over the past 20 years, R has become the leading analysis tool in the environmental sciences. Python and other programming languages have also become increasingly prevalent. Any of these approaches increases replicability (or reproducibility) of data analyses, which is important for the advancement of science and policy work. Documenting your analysis in R, using R scripts or Rmd files, allows you to re-run your analysis whenever the need arises and to share your analysis workflow with others. We are learning R in this program because it is fairly standard in our fields, and learning this language will give you a good foundation for learning other programs as needed.
In this class, we will be working primarily in RStudio. So what is the difference between R and RStudio? R is both a programming language (specifically, a statistical analysis programming language) and the software for using that language, while RStudio is an Integrated Development Environment, or IDE for short. RStudio offers a number of features, mostly related to the visual presentation of information, that make writing and working with R code easier.
There are four panels in the RStudio interface (though you may only have three open when you first start it), and each one shows valuable information.
Before we do anything in R/RStudio, let’s make a new folder on our computers where our class data can reside. You can use your operating system’s file manager (e.g., Finder on Mac or Windows Explorer on Windows) to create a new folder wherever suits you.
Set up the following directories:
ENS-623-Research-Stats directory
data directory
scripts directory
Go back to RStudio. Let’s make a new R Project associated with your ENS-623-Research-Stats directory. To make a new project, go to the upper right-hand side of the RStudio interface, where it says Project: (None). Click the little downward arrow, select “New Project”, then select “Existing Directory” from the window that pops up. Use the graphical user interface (GUI) to navigate to the ENS-623-Research-Stats directory, then select “Create Project”. Once you’ve created your project, your RStudio session will restart in this project.
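The same folder layout can also be created from the R console rather than the file manager. The following is a minimal sketch, not part of the original instructions: `base_dir` is a hypothetical location built on `tempdir()` so the example runs anywhere, and you would replace it with the place where you actually keep your class work.

```r
# Hypothetical base location (an assumption for this sketch);
# replace it with the folder where you keep your class work.
base_dir <- file.path(tempdir(), "ENS-623-Research-Stats")

# Create the project folder and its two sub-folders.
# recursive = TRUE also creates any missing parent folders;
# showWarnings = FALSE keeps the call quiet if a folder already exists.
for (sub in c("data", "scripts")) {
  dir.create(file.path(base_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# List the sub-folders that now exist inside base_dir
list.dirs(base_dir, full.names = FALSE, recursive = FALSE)
```

Using `file.path()` to build paths avoids having to type the path separator yourself, which differs between Windows and Mac.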
There is also now a file in the folder you just made whose name ends in .Rproj. If you double-click this file, RStudio will open with this project already loaded. This is a good way to re-open RStudio for your next session.
If you open a project using the .Rproj file, then your R session will automatically set the working directory to your ENS-623-Research-Stats folder. If you need to set the working directory manually, here are two ways to do that.
Point-and-click method - Use ‘Session’ > ‘Set Working Directory’ > ‘Choose Directory’.
Using the R Console:
setwd("/Users/maiellolammens/Dropbox/Pace/Teaching/ENS-623-Research-Stats/ENS-623-Research-Stats-SP25/")
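When changing the working directory from the console, it can help to remember where you started. The sketch below relies on the fact that `setwd()` invisibly returns the previous working directory; `tempdir()` is only a stand-in (an assumption) for a real folder on your machine.

```r
# setwd() invisibly returns the directory you were in before the change,
# so it can be stored and used to switch back later.
old_wd <- setwd(tempdir())  # tempdir() is a stand-in for a real folder

getwd()        # now inside the temporary folder

setwd(old_wd)  # return to where we started
getwd()
```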
help.search
help.search("bar plot")
Use the help.search function to search for something in statistics that you think should be in R. Did you find anything?
?barplot
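A few other base R helpers complement `help.search` and `?`. This is a small sketch of ones that may be useful when you only half-remember a function name; the search term "plot" is just an illustration.

```r
# apropos() searches the names of objects in the currently attached packages
plot_functions <- apropos("plot")
head(plot_functions)

# barplot() comes from the graphics package, which R attaches at start-up
"barplot" %in% plot_functions

# args() shows a function's arguments without opening its help page
args(sqrt)

# In an interactive session, ??"bar plot" is shorthand for
# help.search("bar plot")
```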
We can use R just like any other calculator.
3 + 5
## [1] 8
R follows the standard order of operations (PEMDAS, or “Please Excuse My Dear Aunt Sally”).
(3 * 5) + 7
## [1] 22
3 * 5 + 7
## [1] 22
Write an example where adding parentheses matters.
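As a point of comparison once you have tried this yourself, here is one possible answer. It is a sketch showing two common cases where parentheses change the result: grouping an addition ahead of a multiplication, and applying an exponent to a negative number.

```r
3 * 5 + 7     # multiplication first: 15 + 7 = 22
3 * (5 + 7)   # parentheses first: 3 * 12 = 36

-2^2          # the exponent is applied before the minus sign: -(2^2) = -4
(-2)^2        # parentheses make the base negative: 4
```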
R has a large number of built-in functions, and many more are available through add-on packages.
sqrt(4)
## [1] 2
abs(-5)
## [1] 5
sqrt(-5)
## Warning in sqrt(-5): NaNs produced
## [1] NaN
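The warning and the `NaN` ("Not a Number") result appear because the square root of a negative number is not a real number. The sketch below shows how you might test for `NaN` in your own code, and how R returns a complex result if the input is explicitly converted to a complex number; the variable name `x` is only for illustration.

```r
x <- suppressWarnings(sqrt(-5))  # same calculation, with the warning silenced

is.nan(x)   # TRUE: the result is "Not a Number"
x + 1       # NaN carries through later calculations

# Converting the input to a complex number gives a complex square root
sqrt(as.complex(-5))
```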
To ensure replicability, we often save our work in R script files. Scripts are easy to return to later, and they let you document your work as you go.
Important: within an R file, you can use the # sign to add comments. Anything written after the # is not interpreted when you run the code.
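A short sketch of the comment styles you are likely to use in a script; the variable name `area` is made up for the example.

```r
# A full-line comment: R skips this entire line.
area <- 3 * 5   # An end-of-line comment: the code to the left still runs.

# Putting a # in front of a line of code is a quick way to switch it off:
# area <- 0

area
```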
Important: we can write commands in our R script file, then run/execute those commands by pressing CMD + RETURN (Macs) or CTRL + RETURN (Windows). We can also execute only a section of code by highlighting it and pressing these same keystrokes.
Create a new R script file in your scripts directory.
Run a few basic commands in the new file.
# What working directory am I in?
getwd()
# Move to a different directory? ("." means the current directory, so this stays put)
setwd(".")
# What files are in the current directory?
dir()
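`dir()` is an alias for `list.files()`, which accepts a path and a pattern to narrow the listing. The sketch below uses a temporary folder and two made-up file names (`intro.R` and `notes.txt`) purely for illustration.

```r
# A temporary folder stands in for a real scripts directory (an assumption).
demo_dir <- file.path(tempdir(), "scripts-demo")
dir.create(demo_dir, showWarnings = FALSE)

# Create two empty example files
file.create(file.path(demo_dir, c("intro.R", "notes.txt")))

# List every file in the folder
list.files(demo_dir)

# List only files whose names end in .R
r_files <- list.files(demo_dir, pattern = "\\.R$")
r_files
```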
fil, what do you find?
file, and describe what you think it does.
Using R Markdown, we can integrate descriptive text, R code, and the output from that code into a seamless document that can be easily reproduced.
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. While some of the details of using Markdown (and thus R Markdown) can get tricky, the basics are very easy to pick up. RStudio even comes with two very useful tools to learn and use Markdown. First, go to the Help menu and select Markdown Quick Reference. Second, a more detailed reference can be found in the Help -> Cheatsheets section.
R Markdown is a tool that allows you to make a simple text document using the Markdown formatting syntax with R code and associated output embedded.
Go to the new file button (upper left corner of the RStudio interface, or File -> New File) and choose the R Markdown option. A screen will pop up requesting information such as Title, Author, and Output Format. Give your new doc a title and select HTML as the output format. A new file that is pre-populated with a bunch of text will open in the Source panel.
We will go over each section briefly:
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. Click Knit now to see what happens.
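To make the pieces of an R Markdown file concrete, the sketch below writes a very small one from the console: a header between two `---` lines, a line of Markdown text, and one R code chunk. It is an illustration under stated assumptions, not the template RStudio generates: the file name `example.Rmd`, its title, and the temporary location are all made up. The last step assumes the rmarkdown package and Pandoc are installed, and is skipped otherwise.

```r
# Three backticks mark the start and end of a code chunk in R Markdown.
fence <- strrep("`", 3)

# Write a minimal R Markdown file to a temporary location (hypothetical name).
rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",                                       # start of the header
  'title: "My first R Markdown document"',
  "output: html_document",
  "---",                                       # end of the header
  "",
  "Some descriptive text written in Markdown.",
  "",
  paste0(fence, "{r}"),                        # start of an R code chunk
  "sqrt(4)",
  fence                                        # end of the code chunk
), rmd_file)

# Show the file contents in the console
cat(readLines(rmd_file), sep = "\n")

# Render the file from the console, but only if the rmarkdown package
# and Pandoc are both available on this machine.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  file.exists(html_file)
}
```

Clicking Knit in RStudio is the usual route; rendering from the console is mainly useful when you want to build a document from inside a script.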