Let’s all make a new directory to work in today.
Let’s create a new script file in that directory.
Scripts make it easier to repeat your work. You can also add comments with the pound sign (#); R ignores everything after it on that line.
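R can also create the directory for you. Below is a minimal sketch of a short commented script. The folder name "r_workshop" is hypothetical, and tempdir() only stands in for wherever you keep your own work.

```r
# Create a folder for today's work.
# tempdir() is a stand-in location; replace it with a path on your own computer.
work_dir <- file.path(tempdir(), "r_workshop")   # hypothetical folder name
dir.create(work_dir, showWarnings = FALSE)      # no warning if it already exists

# Anything after a pound sign is a comment, so R skips it
dir.exists(work_dir)   # TRUE once the folder exists
```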
Shortcuts to run the current line (or the selected code) from your script:
Mac: [Command] + return
Windows: [Control] + return (RStudio), [Control] + r (R GUI)
setwd("[your dir name here]")
5 + 3
## [1] 8
8^3
## [1] 512
6*3-1
## [1] 17
# Order of operations follows PEMDAS
6*(3-1)
## [1] 12
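PEMDAS covers most cases, but a few expressions behave in ways that can surprise people in R. These are worth checking in your own console. The values in the comments are what R returns.

```r
# Exponents come before multiplication, which comes before addition
2 + 3 * 2^2   # 2 + 3 * 4 = 14

# Unary minus binds more loosely than ^, so this is -(2^2)
-2^2          # -4
(-2)^2        # 4

# ^ groups from right to left: 2^(3^2) = 2^9
2^3^2         # 512
```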
We can save values within our session as variables, using the assignment arrow <-.
pop_1 <- 1200
pop_2 <- 500
pop_total <- pop_1 + pop_2
pop_1 * 2
## [1] 2400
# Etc.
Now I change pop_1:
pop_1 <- 2000
What is pop_total now?
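One way to check your answer: a variable stores the value that was computed when it was assigned, not the formula that produced it. Repeating the steps above:

```r
pop_1 <- 1200
pop_2 <- 500
pop_total <- pop_1 + pop_2   # computed once, at this moment: 1700

pop_1 <- 2000
pop_total                    # still 1700; R does not recalculate it

# Re-run the assignment to update the total
pop_total <- pop_1 + pop_2
pop_total                    # now 2500
```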
It is best to use the full path to your data, but you could also change into the directory your data are in (with setwd()) and then read the file by its name alone. Here, read.csv() reads the file straight from a URL.
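# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0)
fral_pres <- read.csv(file = "https://www.dropbox.com/s/x7s7fpu4bepj7xf/F_alnus_CompiledPres.csv?dl=1", stringsAsFactors = TRUE)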
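The two ways of pointing read.csv() at a local file, described above, can be compared on a small example. This is a minimal sketch. tempdir() stands in for your data directory, and the file name "example.csv" is hypothetical.

```r
# Write a tiny example file into a temporary folder standing in for your data directory
data_dir <- tempdir()
example_path <- file.path(data_dir, "example.csv")   # hypothetical file name
write.csv(data.frame(x = 1:3), example_path, row.names = FALSE)

# Option 1: read it with the full path
full <- read.csv(file = example_path)

# Option 2: change into the folder, then use just the file name
old_wd <- setwd(data_dir)   # setwd() returns the previous working directory
relative <- read.csv(file = "example.csv")
setwd(old_wd)               # switch back to where we started

identical(full, relative)   # TRUE: same file, two ways of naming it
```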
Let’s have a look at these data
head(fral_pres)
## SPEC LONG LAT UNCERTAIN PRIM_SOURCE
## 1 Frangula_alnus -71.17625 44.15129 NA WMNF Invasive Survey
## 2 Frangula_alnus -71.22623 44.17989 NA WMNF Invasive Survey
## 3 Frangula_alnus -71.19045 44.05796 NA WMNF Invasive Survey
## 4 Frangula_alnus -71.88835 43.84155 NA WMNF Invasive Survey
## 5 Frangula_alnus -71.18697 44.14703 NA WMNF Invasive Survey
## 6 Frangula_alnus -71.10204 44.15750 NA WMNF Invasive Survey
## FIELD_HERB YEAR
## 1 Field 2002
## 2 Field 2001
## 3 Field 2006
## 4 Field 2002
## 5 Field 2002
## 6 Field 2001
tail(fral_pres)
## SPEC LONG LAT UNCERTAIN PRIM_SOURCE FIELD_HERB
## 2350 Frangula_alnus -90.63560 44.19780 NA GLIFWC Unknown
## 2351 Frangula_alnus -89.57310 45.80220 NA GLIFWC Unknown
## 2352 Frangula_alnus -80.08867 40.54090 NA GLIFWC Unknown
## 2353 Frangula_alnus -88.22000 42.57000 NA GLIFWC Unknown
## 2354 Frangula_alnus -86.94662 45.87737 NA GLIFWC Unknown
## 2355 Frangula_alnus -87.65442 41.85320 NA GLIFWC Unknown
## YEAR
## 2350 2012
## 2351 2012
## 2352 2012
## 2353 2012
## 2354 2012
## 2355 2012
summary(fral_pres)
## SPEC LONG LAT UNCERTAIN
## Frangula_alnus:2355 Min. :-96.61 Min. :38.60 Min. : 10
## 1st Qu.:-89.54 1st Qu.:42.46 1st Qu.: 10
## Median :-77.01 Median :43.68 Median : 10
## Mean :-80.98 Mean :43.89 Mean : 2414
## 3rd Qu.:-71.56 3rd Qu.:45.82 3rd Qu.: 1000
## Max. :-63.00 Max. :47.82 Max. :40000
## NA's :1506
## PRIM_SOURCE FIELD_HERB YEAR
## GLIFWC :827 Field :1631 Min. :1879
## IPANE :553 Herbarium: 643 1st Qu.:2001
## NY_iMAP:308 Unknown : 81 Median :2004
## WIS : 85 Mean :1998
## CONN : 84 3rd Qu.:2008
## CM : 68 Max. :2012
## (Other):430 NA's :3
names(fral_pres)
## [1] "SPEC" "LONG" "LAT" "UNCERTAIN" "PRIM_SOURCE"
## [6] "FIELD_HERB" "YEAR"
str(fral_pres)
## 'data.frame': 2355 obs. of 7 variables:
## $ SPEC : Factor w/ 1 level "Frangula_alnus": 1 1 1 1 1 1 1 1 1 1 ...
## $ LONG : num -71.2 -71.2 -71.2 -71.9 -71.2 ...
## $ LAT : num 44.2 44.2 44.1 43.8 44.1 ...
## $ UNCERTAIN : int NA NA NA NA NA NA NA NA NA NA ...
## $ PRIM_SOURCE: Factor w/ 34 levels "A","ACAD","B",..: 33 33 33 33 33 33 33 33 33 33 ...
## $ FIELD_HERB : Factor w/ 3 levels "Field","Herbarium",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ YEAR : int 2002 2001 2006 2002 2002 2001 2006 2007 2005 2002 ...
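In the str() output, SPEC, PRIM_SOURCE, and FIELD_HERB are factors. These are categorical variables stored as integer codes that point to a set of levels, and that is why FIELD_HERB prints as 1 1 1 .... Since R 4.0.0, read.csv() keeps text as character unless you pass stringsAsFactors = TRUE. Here is a small sketch with made-up values:

```r
# Toy factor with made-up values, similar to FIELD_HERB
source_type <- factor(c("Field", "Herbarium", "Field", "Unknown"))

levels(source_type)       # "Field" "Herbarium" "Unknown" (sorted alphabetically)
as.integer(source_type)   # 1 2 1 3: the codes that str() prints
table(source_type)        # counts per level, like summary() shows
```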
Let’s say we realized that we had a mistake in our data. For example, one of the UNCERTAIN values was recorded incorrectly. How can we change this?
fral_pres_fixed <- fral_pres
fral_pres_fixed$UNCERTAIN[1]
## [1] NA
fral_pres_fixed$UNCERTAIN[1] <- 20
fral_pres_fixed$UNCERTAIN[1]
## [1] 20
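Editing the copy leaves the original data.frame unchanged, because R copies an object when you modify it. That is why fral_pres_fixed was created first. Below is a sketch on a made-up data frame:

```r
# Toy data frame with made-up values
toy <- data.frame(UNCERTAIN = c(NA, 10, 1000))
toy_fixed <- toy              # make a copy to edit
toy_fixed$UNCERTAIN[1] <- 20  # change only the first value of the copy

toy$UNCERTAIN[1]        # still NA: the original is untouched
toy_fixed$UNCERTAIN[1]  # 20
```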
Let’s get only a subset of these data by selecting columns from the data.frame.
fral_pres_subset <- fral_pres[c("SPEC", "LONG", "LAT")]
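Single brackets can select rows as well as columns. In the [rows, columns] form, leaving either position blank means "all". Here is a sketch on a made-up data frame:

```r
# Toy data frame with made-up values
toy <- data.frame(SPEC = c("a", "b", "c"),
                  LAT  = c(44.2, 38.6, 47.8),
                  YEAR = c(2002, 1879, 2012))

toy[c("SPEC", "LAT")]        # columns by name, as above
toy[, c("SPEC", "LAT")]      # same columns, [rows, columns] form
toy[2:3, ]                   # rows 2 and 3, all columns
toy[toy$YEAR > 2000, "LAT"]  # LAT for the rows where YEAR > 2000
```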
Next, let’s rename our columns so they are in the format used in Wallace.
names(fral_pres_subset)
## [1] "SPEC" "LONG" "LAT"
names(fral_pres_subset) <- c("name", "longitude", "latitude")
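Replacing all the names at once, as above, relies on the columns being in the expected order. If you want to rename a single column regardless of its position, you can match on its current name. Here is a sketch on a made-up data frame:

```r
# Toy data frame with made-up values
toy <- data.frame(SPEC = "Frangula_alnus", LONG = -71.2, LAT = 44.2)

# Rename one column by matching its current name
names(toy)[names(toy) == "SPEC"] <- "name"
names(toy)   # "name" "LONG" "LAT"
```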
Let’s now write this subset, with its renamed columns, to a new CSV file.
# Change this path to a folder on your own computer
write.csv(x = fral_pres_subset, file = "~/Dropbox/SCCS-Workshop/fral_pres.csv", row.names = FALSE)
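The row.names = FALSE argument matters. By default, write.csv() also writes the row names as an unnamed first column, and that column comes back as an extra column called X when the file is read in again. Here is a sketch using a temporary file:

```r
toy <- data.frame(name = c("a", "b"), latitude = c(44.2, 38.6))
path <- tempfile(fileext = ".csv")   # temporary stand-in file

write.csv(toy, path)                 # default: row names are written too
names(read.csv(path))                # "X" "name" "latitude"

write.csv(toy, path, row.names = FALSE)
names(read.csv(path))                # "name" "latitude"
```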
Some statistics of note.
mean(fral_pres$LAT)
## [1] 43.89405
max(fral_pres$LAT)
## [1] 47.81744
min(fral_pres$LAT)
## [1] 38.6
median(fral_pres$LAT)
## [1] 43.67897
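These functions work directly on LAT because that column has no missing values. UNCERTAIN, on the other hand, has 1506 NA's (see the summary() output), and functions like mean() and max() return NA for such a column unless you drop the missing values with na.rm = TRUE. Here is a sketch with made-up values:

```r
# Toy vector with a missing value, like the UNCERTAIN column
uncertain <- c(10, 1000, NA, 40000)

mean(uncertain)                 # NA: one missing value spoils the result
mean(uncertain, na.rm = TRUE)   # 13670, from the three non-missing values
max(uncertain, na.rm = TRUE)    # 40000
```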
Use indexing and the functions we just learned to determine the mean, min, and max latitude and longitude of the Herbarium specimens vs. the Field specimens.
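As a starting point, here is the same pattern on a made-up data frame. First build a logical vector that is TRUE for the rows you want, then use it inside [ ]. The real FIELD_HERB column also has an Unknown level, so compare against "Field" directly there instead of relying on `!`.

```r
# Toy data with made-up values; the exercise itself uses fral_pres
toy <- data.frame(FIELD_HERB = c("Field", "Herbarium", "Field", "Herbarium"),
                  LAT = c(44.0, 42.0, 46.0, 40.0))

# Logical index: TRUE for rows that are herbarium specimens
is_herb <- toy$FIELD_HERB == "Herbarium"

mean(toy$LAT[is_herb])    # 41
mean(toy$LAT[!is_herb])   # 45 (only two groups in this toy data)
```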
plot(x = fral_pres$LONG, y = fral_pres$LAT)
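plot() accepts extra arguments for axis labels and point styles. The sketch below labels the axes and colours the points by FIELD_HERB, using made-up coordinates. The colour choices are arbitrary, and with the real data you would need one colour for each of its three levels.

```r
# Toy coordinates with made-up values
toy <- data.frame(LONG = c(-71.2, -89.6, -80.1),
                  LAT  = c(44.2, 45.8, 40.5),
                  FIELD_HERB = factor(c("Field", "Herbarium", "Field")))

# One colour per factor level, picked by each point's integer code
level_cols <- c("darkgreen", "purple")
point_cols <- level_cols[as.integer(toy$FIELD_HERB)]

plot(x = toy$LONG, y = toy$LAT,
     xlab = "Longitude", ylab = "Latitude",
     col = point_cols, pch = 19)
legend("bottomright", legend = levels(toy$FIELD_HERB),
       col = level_cols, pch = 19)
```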