(Remi/David) Now, if you remember what we did yesterday, we loaded and cleaned my larval abundance data, but I did not follow the principles of ‘tidy’ data when I uploaded my data.
Again, this data comes from one of Remi’s PhD thesis chapters1 that was part of the first CHONe. We cleaned in the Data cleaning and raw data management and is also archived on Dryad so we would have cleaning to do!
Reminder: The ‘tidy’ concept is something we will revisit in the R section, but the rules of tidy data are2:
While it was certainly easier to enter the abundance data in a spreadsheet by giving each species a column, this does not abide by the ‘tidy’ rules and we will have a much easier time analysing and plotting our data if we first ‘tidy’ the data.
# let's load all packages and the data before we begin
library(sf)
library(raster)
library(marmap)
library(robis)
library(gridExtra)
library(tidyverse)
larvalAbundance <- read.csv("data/larvalAbundanceClean.csv", stringsAsFactors = FALSE)
head(larvalAbundance)
tidyverse
‘meta’ packageThe tidyverse
is a package that is made up of other packages commonly used together for data analysis and vizualization. You can find much more detail in the book R for Data Science by Garrett Grolemund and Hadley Wickham. Below we will learn about 3 of them:
tidyr
for tidying datadplyr
for manipulating dataggplot2
for plotting datatidyr
packageThis aptly named package will allow us to tidy our data. Our data is currently in the wide format since we have multiple observations of species abundances in seperate columns. We want our data to be in the long format in order to facilitate analysis and plotting (and to abide by the ‘tidy’ rules).
We currently have a few ID type variable (i.e. they are not observations; year
,month
,day
,time
,depth
,site
,long
,lat
) and an observational variable for each species.
names(larvalAbundance)
[1] "year" "month" "day" "time"
[5] "depth" "locationID" "decimalLatitude" "decimalLongitude"
[9] "Astyris.lunata" "Bittiolum.alternatum" "Margarites.spp." "Arrhoges.occidentalis"
[13] "Diaphana.minuta" "Crepidula.spp." "Other.Gastropods" "Mytilus.spp."
[17] "Modiolus.modiolus" "Anomia.simplex" "Other.Bivalve" "Electra.pilosa"
[21] "Membranipora.membranacea" "Carcinus.maenas" "Cancer.irroratus" "Neopanopeus.sayi"
[25] "Crangon.septemspinosa"
larvalAbundanceLong <- gather(larvalAbundance, # the old data frame
key = "scientificName", # what the column of old column names will be called
value = "larvalAbundanceIndPerm3", # what the gathered numbers will be called
Astyris.lunata:Crangon.septemspinosa) # which columns to gather
# alternatively, I could of specified which columns not to include instead
larvalAbundanceLong <- gather(larvalAbundance, # the old data frame
key = "scientificName", # what the column of old column names will be called
value = "larvalAbundanceIndPerm3", # what the gathered numbers will be called
-year,-month,-day,-time,-depth,-locationID,-decimalLongitude,-decimalLatitude) # which columns not to gather
head(larvalAbundanceLong)
We can also clean up and/or separate the scientificName
column
# first let's swap multiple species
larvalAbundanceLong <- separate(larvalAbundanceLong, #old data frame
col = scientificName, # column to be seperated
into = c("genus","species"), # to be seperated into
extra = "merge", # if there are too many ".", what to do with the extra column
remove = FALSE, # whether to delete the seperated column
sep = "\\.") # regex for ".", but need escape character "\\" since . is a wildcard
# Let's check out our work
head(larvalAbundanceLong)
# genus and species don't look terrible, but scientificName still has that pesky period
# we could use gsub again, but the regex is tricky since there are spp. for species
# pasting the genus and species together will be much easier
larvalAbundanceLong$scientificName <- paste(larvalAbundanceLong$genus,larvalAbundanceLong$species,sep = " ")
# also, spp. is not a species designation, let's get rid of them
larvalAbundanceLong$species[larvalAbundanceLong$species=="spp."] <- NA
# Let's check out our work again
head(larvalAbundanceLong)
dplyr
packageManipulation of dataframes means many things to many researchers, we often select certain observations (rows) or variables (columns), we often group the data by a certain variable(s), or we even calculate summary statistics. We can do these operations using the normal base R operations:
mean(larvalAbundanceLong[larvalAbundanceLong$scientificName == "Astyris lunata", "larvalAbundanceIndPerm3"])
[1] 13.85279
mean(larvalAbundanceLong[larvalAbundanceLong$scientificName == "Bittiolum alternatum", "larvalAbundanceIndPerm3"])
[1] 1.713291
mean(larvalAbundanceLong[larvalAbundanceLong$scientificName == "Arrhoges occidentalis", "larvalAbundanceIndPerm3"])
[1] 0.7773257
But this isn’t very nice because there is a fair bit of repetition. Repeating yourself will cost you time, both now and later, and potentially introduce some nasty bugs.
Luckily, the dplyr
package provides a number of very useful functions for manipulating dataframes in a way that will reduce the above repetition, reduce the probability of making errors, and probably even save you some typing. As an added bonus, you might even find the dplyr
grammar easier to read.
Here we’re going to cover 5 of the most commonly used functions as well as using pipes (%>%
) to combine them.
select()
filter()
group_by()
summarize()
mutate()
select()
If, for example, we wanted to move forward with only a few of the variables in our dataframe we could use the select()
function. This will keep only the variables you select.
yearAndSpecies <- select(larvalAbundanceLong,year,species,genus,larvalAbundanceIndPerm3)
If we open up yearAndSpecies
we’ll see that it only contains the year, and the larval abundance for each species.
Above we used ‘normal’ grammar, but the strengths of dplyr
lie in combining several functions using pipes (%>%
). Since the pipes grammar is unlike anything we’ve seen in R before, let’s repeat what we’ve done above using pipes.
yearTaxonAbun <- larvalAbundanceLong %>% select(year,species,genus,larvalAbundanceIndPerm3)
To help you understand why we wrote that in that way, let’s walk through it step by step. In both cases, the left side of <-
is yearTaxonAbun
. First we summon the larvalAbundanceLong
dataframe and pass it on, using the pipe symbol %>%
, to the next step, which is the select()
function. In this case we don’t specify which .data
argument we use in the select()
function since in gets that from the previous pipe. Essentially, the output of what is written before the pipe becomes the first argument, in this case .data
, of the function after the pipe. Fun Fact: There is a chance you have encountered pipes before in the shell. In R, a pipe symbol is %>%
while in the shell it is |
but the concept is the same!
Pro-tip
Did you get an error while running the
select()
function? Something like:Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’
That’s because many packages have aselect()
function; In this case, it’s theraster
package causing problems. It is possible to get around that problem the ‘hacky’ way by loadingtidyverse
after all the other packages so that thedplyr
(which is in thetidyverse
) version ofselect
sort of overwrites the one loaded fromraster
.Alternatively, you can designate which package using the
::
(e.g.package::function()
) when it’s not clear.yearTaxonAbun <- larvalAbundanceLong %>% dplyr::select(year,species,genus,larvalAbundanceIndPerm3)
filter()
If we now wanted to move forward with the above, but only with species for which we have a valid genus
and species
, we can combine select
and filter
yearTaxonAbunValid <- larvalAbundanceLong %>%
select(year,species,genus,larvalAbundanceIndPerm3) %>%
filter(genus!="Other") %>%
filter(!is.na(species))
# we can also string the two filter operations into one using the and symbol '&'
yearTaxonAbunValid <- larvalAbundanceLong %>%
select(year,species,genus,larvalAbundanceIndPerm3) %>%
filter(genus!="Other" & !is.na(species))
** Challenge 1 **
Write a single command (which can span multiple lines and includes pipes) that will produce a dataframe that has the values for
larvalAbundanceIndPerm3
for all species atlocationID
5, but not for other sites. I also want locationID, and lat/long removed from the output. How many rows does your dataframe have and why? Solution
group_by()
Now, we were supposed to be reducing the error prone repetitiveness of what can be done with base R, but up to now we haven’t done that since we would have to repeat the above for each species. Instead of filter()
, which will only pass observations that meet your criteria (in the above: locationID==5
), we can use group_by()
, which will essentially use every unique criteria that you could have used in filter.
str(larvalAbundanceLong)
'data.frame': 1343 obs. of 12 variables:
$ year : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
$ month : int 8 8 8 8 8 8 8 8 8 8 ...
$ day : int 7 8 7 7 7 7 7 8 8 8 ...
$ time : chr "10:40:00" "9:05:00" "12:14:00" "13:47:00" ...
$ depth : int 3 12 12 12 12 12 12 12 12 12 ...
$ locationID : int 5 5 6 7 8 9 10 11 12 13 ...
$ decimalLatitude : num 45.8 45.8 45.8 45.8 45.8 ...
$ decimalLongitude : num -61.9 -61.9 -61.8 -61.7 -61.6 ...
$ scientificName : chr "Astyris lunata" "Astyris lunata" "Astyris lunata" "Astyris lunata" ...
$ genus : chr "Astyris" "Astyris" "Astyris" "Astyris" ...
$ species : chr "lunata" "lunata" "lunata" "lunata" ...
$ larvalAbundanceIndPerm3: num 4.478 12.548 31.834 21.162 0.324 ...
str(larvalAbundanceLong %>% group_by(locationID))
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 1343 obs. of 12 variables:
$ year : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
$ month : int 8 8 8 8 8 8 8 8 8 8 ...
$ day : int 7 8 7 7 7 7 7 8 8 8 ...
$ time : chr "10:40:00" "9:05:00" "12:14:00" "13:47:00" ...
$ depth : int 3 12 12 12 12 12 12 12 12 12 ...
$ locationID : int 5 5 6 7 8 9 10 11 12 13 ...
$ decimalLatitude : num 45.8 45.8 45.8 45.8 45.8 ...
$ decimalLongitude : num -61.9 -61.9 -61.8 -61.7 -61.6 ...
$ scientificName : chr "Astyris lunata" "Astyris lunata" "Astyris lunata" "Astyris lunata" ...
$ genus : chr "Astyris" "Astyris" "Astyris" "Astyris" ...
$ species : chr "lunata" "lunata" "lunata" "lunata" ...
$ larvalAbundanceIndPerm3: num 4.478 12.548 31.834 21.162 0.324 ...
- attr(*, "vars")=List of 1
..$ : symbol locationID
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 16
..$ : int 0 1 12 23 34 45 46 62 63 79 ...
..$ : int 2 13 24 35 47 64 81 92 103 114 ...
..$ : int 3 14 25 36 48 65 82 93 104 115 ...
..$ : int 4 15 26 37 49 66 83 94 105 116 ...
..$ : int 5 16 27 38 50 67 84 95 106 117 ...
..$ : int 6 17 28 39 51 68 85 96 107 118 ...
..$ : int 7 18 29 40 52 69 86 97 108 119 ...
..$ : int 8 19 30 41 53 70 87 98 109 120 ...
..$ : int 9 20 31 42 54 71 88 99 110 121 ...
..$ : int 10 21 32 43 55 72 89 100 111 122 ...
..$ : int 11 22 33 44 56 73 90 101 112 123 ...
..$ : int 57 74 136 153 215 232 294 311 373 390 ...
..$ : int 58 75 137 154 216 233 295 312 374 391 ...
..$ : int 59 76 138 155 217 234 296 313 375 392 ...
..$ : int 60 77 139 156 218 235 297 314 376 393 ...
..$ : int 61 78 140 157 219 236 298 315 377 394 ...
- attr(*, "group_sizes")= int 153 102 102 102 102 102 102 102 102 102 ...
- attr(*, "biggest_group_size")= int 153
- attr(*, "labels")='data.frame': 16 obs. of 1 variable:
..$ locationID: int 5 6 7 8 9 10 11 12 13 14 ...
..- attr(*, "vars")=List of 1
.. ..$ : symbol locationID
..- attr(*, "drop")= logi TRUE
You will notice that the structure of the dataframe where we used group_by()
(grouped_df
) is not the same as the original larvalAbundanceLong
(data.frame
). A grouped_df
can be thought of as a list
where each item in the list
is a data.frame
which contains only the rows that correspond to the a particular value locationID
(at least in the example above).
summarize()
The above was a bit on the uneventful side because group_by()
much more exciting in conjunction with summarize()
. This will allow use to create new variable(s) by using functions that repeat for each of the locationID-specific data frames. That is to say, using the group_by()
function, we split our original dataframe into multiple pieces, then we can run functions (e.g. mean()
or sd()
) within summarize()
.
IndPerm3_byLocationID <- larvalAbundanceLong %>%
group_by(locationID) %>%
summarize(mean_larvalAbundanceIndPerm3 = mean(larvalAbundanceIndPerm3))
That allowed us to calculate the mean larval abundance for each location, but it gets so much better!
** Challenge 2 **
Calculate the mean larval abundance per locationID. Which has the highest mean larval abundance and which has the lowest mean larval abundance? Solution
The function group_by()
allows us to group by multiple variables. Let’s group by locationID
and scientificName
.
IndPerm3_byLocationID_bySpp <- larvalAbundanceLong %>%
group_by(locationID,scientificName ) %>%
summarize(mean_larvalAbundanceIndPerm3 = mean(larvalAbundanceIndPerm3))
That is already quite powerful, but it gets even better! You’re not limited to defining 1 new variable in summarize()
.
IndPerm3_byLocationID_bySpp <- larvalAbundanceLong %>%
group_by(locationID,scientificName ) %>%
summarize(mean_larvalAbundanceIndPerm3 = mean(larvalAbundanceIndPerm3),
sd_larvalAbundanceIndPerm3 = sd(larvalAbundanceIndPerm3),
mean_depth = mean(depth),
sd_depth= sd(depth))
mutate()
We can also create new variables prior to (or even after) summarizing information using mutate()
.
IndPerm3_byLocationID_bySpp <- larvalAbundanceLong %>%
mutate(larvalAbundanceIndPercm3=larvalAbundanceIndPerm3*1000000) %>%
group_by(locationID,scientificName ) %>%
summarize(mean_larvalAbundanceIndPercm3 = mean(larvalAbundanceIndPercm3),
sd_larvalAbundanceIndPercm3 = sd(larvalAbundanceIndPercm3),
mean_depth = mean(depth),
sd_depth= sd(depth))
R comes with basic plotting function, usually referred to as ‘base plot’, e.g:
plot(larvalAbundance$Margarites.spp.,larvalAbundance$Diaphana.minuta)
However, general concensus seems to be that ggplot2
(part of the tidyverse) is the way to go for plotting. I (Remi) have mixed feelings about this package (my age is showing again). It does make making quick plots easier and is compatible with the rest of the tidyverse, but until recently, ggplot2
really struggled with any spatial plotting (see ‘The classic and the cutting edge: sp
vs sf
’) so I had to know the base plot anyway. I still think base plot is more customizeable, but it may also be that I have not sufficiently mastered the ggplot world. We’re mostly going to cover ggplot2
, but be aware that there are other plotting paradigms in R (e.g. base, lattice, etc)
ggplot2
packageA system for ‘declaratively’ creating graphics, based on “The Grammar of Graphics”. You provide the data, tell ‘ggplot2’ how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details
So for the same plot as above, let me walk you through each line below step by step. The first line creates a new blank ggplot plot. Some people set aes()
and data
in ggplot()
, but since different layers can take different aes()
and data
, I like to set those in the geom functions.
Unlike other tidyverse
functions, you can tie together ggplot2
functions with +
instead of %>%
, but otherwise it works the same. So to our blank plot, we’re adding geom_point()
which is the function for a scatterplot. We’re telling geom_point()
that all the data it needs should be in larvalAbundance
and what aesthetic mapping we want (aes()
) which sets the variables used in the plot and takes arguments like x
, y
, colour
, size
.
Finally, I set the theme as theme_classic
because the default plots are ugly! (try it, I dare you) #
ggplot() +
geom_point(data = larvalAbundance, aes(x=Margarites.spp.,y=Diaphana.minuta)) +
theme_classic()
Yeah, that was way more code than base plot just to get a simple plot! But now that we have the basic ggplot skeleton, our investment is going to pay off because it’s now super easy to modify. Let’s assign different colours for different depths.
ggplot() +
geom_point(data = larvalAbundance, aes(x=Margarites.spp.,y=Diaphana.minuta,colour=as.character(depth))) +
theme_classic()
I had to convert year
and depth
to characters since ggplot assumes that numeric variables are continous. Let’s combine what we learned above about the tidyverse.
# prepare data for plotting
larvalAbundancePlot <- larvalAbundance %>%
mutate(Year = as.character(year),
Depth = as.character(depth))
# plot new data
ggplot() +
geom_point(data=larvalAbundancePlot,aes(x=Margarites.spp.,y=Diaphana.minuta,colour=Depth,shape=Year)) +
theme_classic()
Let’s add a linear regression and make the axes log scale:
ggplot() +
geom_point(data=larvalAbundancePlot,aes(x=Margarites.spp.,y=Diaphana.minuta,colour=Depth,shape=Year)) +
geom_smooth(method="lm")+
scale_x_log10()+
scale_y_log10()+
theme_classic()
Try commenting out (#
) some of those lines to see what they do.
Now that you understand how that works, let’s try a few different plot types
# prepare long data for plotting
larvalAbundanceLongPlot <- larvalAbundanceLong %>%
mutate(Year = as.character(year),
Depth = as.character(depth))
# plot new data
ggplot() +
geom_boxplot(data=larvalAbundanceLongPlot,aes(x=Depth,y=larvalAbundanceIndPerm3,fill=Depth)) +
scale_y_log10()+
theme_classic()
Ready to get your mind blown? By adding 1 line to the above, we can have a plot for each species
ggplot() +
geom_boxplot(data=larvalAbundanceLongPlot,aes(x=Depth,y=larvalAbundanceIndPerm3,fill=Depth)) +
scale_y_log10()+
facet_wrap(~scientificName)+
theme_classic()
Or swapping out the geom gives you a completely different (better) plot.
ggplot() +
geom_violin(data=larvalAbundanceLongPlot,aes(x=Depth,y=larvalAbundanceIndPerm3,fill=Depth)) +
scale_y_log10()+
facet_wrap(~scientificName)+
theme_classic()
Or swapping out the geom again and we can map the abundance.
# prepare long data for plotting
larvalAbundanceLongPlot <- larvalAbundanceLong %>%
group_by(scientificName,decimalLongitude,decimalLatitude) %>%
summarize(mean_IndPerm3 = mean(larvalAbundanceIndPerm3))
# plot new mean data
ggplot() +
geom_point(data=larvalAbundanceLongPlot,aes(x=decimalLongitude,y=decimalLatitude,size=log10(mean_IndPerm3))) +
facet_wrap(~scientificName)+
theme_classic()
Let’s add a bit more to these maps with some proper mapping tools. But before we move away from ggplot, bookmark this excellent “R Graph Catalog”. You can visually find a graph you want to make and get the code! Super useful!
You can always plot any spatial data and use latitude and longitude as normal Cartesian coordinates as we did above, but if you want to do anything with projections, or other GIS type operations (buffer, intersection, overlaps, etc), you really need to dig a little deeper.
Projection and datum information in R in both sp
and sf
(as well as many other applications) are handled by the proj.4 standard. We won’t go in to deep into the ‘how to choose a projection’, but there is substantial information regarding the parameters on the proj.4 website and this Projection Wizard is super helpful for recommending projection (and producing formatted proj4 strings) based on the extent of your map and what you want to optimize for (i.e. Equal-area, Conformal, Equidistant, Compromise).
Be aware that in most cases R will throw an error if you are trying to use multiple spatial objects with different transformations, so remember to transform them to a common projection before trying to combine them. Also, remember that some projections have units of latitude and longitude and some have units of meters and they are of course incompatible.
sp
vs sf
Before November 2016, if you wanted to do any spatial tasks in R your choice was a simple one. You needed to use the sp
package (along with rgeos
, and rgdal
). Now, the sf
package is replacing that set of packages, and that’s a good thing because it allows for faster computation and it is compatible with the tidyverse
. However, many of the helper packages that were built to interact with sp
have not been upgraded to sf
(yet).
sp
packageThe bread and butter ‘Spatial’ object class from the sp
package are of the form SpatialBLANKDataFrame
where blank can be Points
, Lines
, Polygons
, etc. Let’s create a SpatialPointsDataFrame
from our larvalAbundanceLong
# first make spatial points and set the projection (more on that below)
spoints <- SpatialPoints(coords = larvalAbundanceLong[,c("decimalLongitude","decimalLatitude")],
proj4string = CRS("+proj=longlat +datum=WGS84"))
# then make a SpatialPointsDataFrame by addint the data frame back in
spointsdf <- SpatialPointsDataFrame(spoints,larvalAbundanceLong)
# now for each point in spointsdf there is a row in the data frame which can be accessed by spointsdf@data, or if you want a particular variable spointsdf$year (or spointsdf@data$year), but the data and the geometries are a little disjoint.
# plot it
plot(spointsdf)
That’s not particularly useful without a basemap, so let’s use the raster
package to get some high resolution province level boundaries data (not the only way to get coastlines, but one of my favorite).
Canada <- getData('GADM', country="CAN", level=1)
plot(Canada)
# this is a little excessive since we only need Nova Scotia
NS <- Canada[Canada$NAME_1=="Nova Scotia",]
plot(NS,add=T,col='blue')
Now let’s combine my data with the coastline
# see the bounding box of the larval data
bbox(spointsdf)
min max
decimalLongitude -61.88518 -61.57103
decimalLatitude 45.71147 46.05218
# transform both Spatial objects
proj_aea <- "+proj=aea +lat_1=45.34229371956944 +lat_2=46.154426884765485 +lon_0=-61.70471191406251"
spointsdf_aea <- spTransform(spointsdf,proj_aea)
NS_aea <- spTransform(NS,proj_aea)
# see the bounding box of the larval data after projection
bbox(spointsdf_aea)
min max
decimalLongitude -13996.94 10365.71
decimalLatitude 4714435.00 4752308.03
plot(spointsdf_aea,pch=16, xlim = c(-15000,12000), ylim = c(4705000,4755000))
plot(NS_aea, col = 'grey',add=T)
You can find more info about the other Spatial classes in my GIS tutorial I wrote before sf
was created.
sf
packageThe sf
package is an an R implementation of “Simple Features” (officially Simple Feature Access) which is both an Open Geospatial Consortium (OGC) and International Organization for Standardization (ISO) standard ISO 19125 that specifies a common storage and access model of mostly two-dimensional geometries (point, line, polygon, multi-point, multi-line, etc.) used by geographic information systems.
The bread and butter object classes in the sf
package are sf
(created by st_sf()
) which are effectively a normal data frame with the addition of one (or more) simple features list column (sfc
; created by st_sfc()
). This sfc
column, which is really just another column in your data frame, contains the geometric information that gets plotted on a map. This geometric information is stored as a simple feature which can take the form of polygon(s) (created by st_polygon()
or st_multipolygon()
), line(s) (created by st_st_linestring()
or st_multist_linestring()
), or point(s) (created by st_point()
or st_multipoint()
)
Let’s start by turning our larvalAbundanceLong
into an sf
object and transform it into our desired projection in 1 command (reminder: sf
is compatible with the tidyverse):
larvalAbundanceLong_sf <- larvalAbundanceLong %>%
st_as_sf(coords = c("decimalLongitude", "decimalLatitude"),
crs="+proj=longlat +datum=WGS84",
remove=FALSE) %>%
st_transform(proj_aea)
ggplot()+
geom_sf(data = larvalAbundanceLong_sf)
As you may have noticed, geom_sf
does not need a defined aes()
since is already knows that latitude is your y-axis and longitude is the x-axis. You can of course toggle group
, colour
, size
, and fill
in the aes
based on other columns in your data frame.
Let’s plot is with a basemap! raster
only provides the GADM coastline data in sp
format, but we can convert with st_as_sf()
and then filter()
to get just the Maritime provinces with dplyr
MarProv_sf <- getData('GADM', country="CAN", level=1) %>%
st_as_sf() %>%
filter(NAME_1=="Nova Scotia"|
NAME_1=="Prince Edward Island"|
NAME_1=="New Brunswick")%>%
st_transform(proj_aea)
ggplot()+
geom_sf(data = larvalAbundanceLong_sf)+
geom_sf(data = MarProv_sf)
And to make things fun, let’s have one zoomed in plot with the dots, and a zoomed out plot with a ‘zoom box’
#define the corners (with a buffer)
larvalbbox <- st_buffer(larvalAbundanceLong_sf, dist = 10000) %>%
st_bbox()
#create a matrix that traces a line the whole way around your box
bbsfc <- cbind(
c(larvalbbox[1],larvalbbox[3],larvalbbox[3],larvalbbox[1],larvalbbox[1]),
c(larvalbbox[2],larvalbbox[2],larvalbbox[4],larvalbbox[4],larvalbbox[2])
) %>%
list() %>% # put that matrix in a list
st_polygon() %>% # put that list in a simple features polygon
st_sfc(crs = proj_aea) %>% # put that polygon into a simple features column and give projection info
st_sf(name="Study Site", geometry=.) # finally make it an sf object
# plot it!
p1 <- ggplot()+
geom_sf(data = MarProv_sf)+
geom_sf(data = bbsfc, size = 2, fill = "transparent")
p2 <- ggplot()+
geom_sf(data = larvalAbundanceLong_sf)+
geom_sf(data = MarProv_sf) +
coord_sf(xlim = larvalbbox[c(1,3)],ylim = larvalbbox[c(2,4)])
grid.arrange(p1,p2)
Finally, let’s bring facets back!
#define the corners (with a buffer)
larvalbbox <- st_buffer(larvalAbundanceLong_sf, dist = 10000) %>%
st_bbox()
#create a matrix that traces a line the whole way around your box
bbsfc <- cbind(
c(larvalbbox[1],larvalbbox[3],larvalbbox[3],larvalbbox[1],larvalbbox[1]),
c(larvalbbox[2],larvalbbox[2],larvalbbox[4],larvalbbox[4],larvalbbox[2])
) %>%
list() %>% # put that matrix in a list
st_polygon() %>% # put that list in a simple features polygon
st_sfc(crs = proj_aea) %>% # put that polygon into a simple features column and give projection info
st_sf(name="Study Site", geometry=.) # finally make it an sf object
# plot it!
p1 <- ggplot()+
geom_sf(data = MarProv_sf)+
geom_sf(data = bbsfc, size = 2, fill = "transparent")
p2 <- ggplot()+
geom_sf(data = larvalAbundanceLong_sf)+
geom_sf(data = MarProv_sf) +
coord_sf(xlim = larvalbbox[c(1,3)],ylim = larvalbbox[c(2,4)])
grid.arrange(p1,p2)
# oh, and if you want to save a plot, pick your format, size, and resolution
ggsave(filename="speciesabundance.tiff", plot = p3, width = 20, height = 16.5, unit = "cm", dpi = 300)
ggsave(filename="speciesabundance.pdf", plot = p3, width = 20, height = 16.5, unit = "cm", dpi = 300)
ggsave(filename="speciesabundance.png", plot = p3, width = 20, height = 16.5, unit = "cm", dpi = 300)
R for Data Science R Graph Catalog A GIS tutorial written by Remi (mostly in sp
) Projection stuff: proj.4 and Projection Wizard
Challenge 1 solution
site5 <- larvalAbundanceLong %>% filter(locationID==5) %>% select(-locationID,-decimalLatitude,-decimalLatitude)
Note: The order of operations is very important in this case. If we used ‘select’ first, filter would not be able to find the variable continent since we would have removed it in the previous step.
Challenge 2 solution
IndPerm3_byLocationID <- larvalAbundanceLong %>% group_by(locationID) %>% summarize(mean_IndPerm3 = mean(larvalAbundanceIndPerm3)) %>% filter(mean_IndPerm3 == min(mean_IndPerm3) | mean_IndPerm3 == max(mean_IndPerm3))
Daigle RM, Metaxas A, deYoung B (2014) Bay-scale patterns in the distribution, aggregation and spatial variability of larvae of benthic invertebrates. Marine Ecology Progress Series 503:139-156. http://dx.doi.org/10.3354/meps10734↩
Jeff Leek, The Elements of Data Analytic Style, Leanpub, 2015-03-02↩