This week we are going to continue learning how to manipulate and analyze our data. To do this we’ll need to use a few new packages. Packages in R are groups of functions, typically designed for a specific purpose, that come with detailed documentation on how to use them and often with sample data sets. For example, there are packages designed to work with geospatial data, perform specific statistical tests, and much more. Today we are going to use two packages called lubridate and dplyr that are useful for working with dates and manipulating data, respectively.
In RStudio you can either install packages using commands in the console or using the Tools menu in the toolbar. We will use commands to do this:
install.packages(c("dplyr", "lubridate"))
This downloads our packages, but now we need to load them into memory for our R session. We do this using the library command. In general it is good practice to load any packages you need in the first lines of your scripts.
library(dplyr)
library(lubridate)
We can also use the help command to learn more about a package. Let’s look at the documentation for the lubridate
package.
help(package = "lubridate")
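We can also pull up the help page for a single function, either with the help function or the ? shortcut. For example, to see the documentation for read.csv, which we use below:
# open the help page for a single function
help("read.csv")
# the ? shortcut does the same thing
?read.csv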
Now we have these packages loaded in memory in R. We’ll continue today using data from Siberia archived at the Arctic Data Center: https://arcticdata.io/catalog/view/doi:10.18739/A2S46H57S
Let’s start again by looking at air temperature.
# read in our air temperature data
t_air <- read.csv("https://arcticdata.io/metacat/d1/mn/v2/object/urn%3Auuid%3A3e962856-7ece-4ff8-870a-82511c31c974",header = T)
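Before doing anything else, it is a good idea to take a quick look at what we just read in. The str and head functions show the structure of the dataframe and its first few rows:
# check the dimensions, column names, and variable types
str(t_air)
# look at the first few rows of data
head(t_air)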
Today we are going to use several functions in the lubridate package to work with the dates in our data. First we need to combine all of the date and time attributes into a single character vector. In addition we need to separate out the hours and minutes. We can use the paste
function to combine the year, doy, hour, and minute. Remember R will operate on entire vectors.
# create a character vector with date and time information
d <- paste(t_air$year, t_air$doy, floor(t_air$hour), (t_air$hour - floor(t_air$hour)) * 60, sep = "_")
# have a look at our new vector
head(d)
## [1] "2016_181_18_30" "2016_181_19_0" "2016_181_19_30" "2016_181_20_0"
## [5] "2016_181_20_30" "2016_181_21_0"
Before going any further, take a moment to figure out exactly what we did to create our new character vector with date and time information. What two new functions did we use? What were our inputs?
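If you get stuck, here is a small sketch of the two new pieces, floor and paste, each applied to a toy input so you can see what they do on their own:
# floor rounds down, so 19.5 hours becomes hour 19
floor(19.5)
## [1] 19
# the leftover fraction of an hour times 60 gives the minutes
(19.5 - floor(19.5)) * 60
## [1] 30
# paste glues its arguments together, separated by the sep character
paste(2016, 181, 19, 30, sep = "_")
## [1] "2016_181_19_30"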
Now we can use the as_datetime
function to convert our character vector into a special kind of date object. Remember we use the $
operator to add new variables (columns) to a dataframe.
# convert our character vector to a date/time object and assign the output to a new variable in our dataframe. The format argument is where we tell R how the date is arranged (here %Y = year, %j = day of year, %H = hour, %M = minute).
t_air$dt <- as_datetime(d, format = "%Y_%j_%H_%M")
# have a look at our new date and time column
head(t_air$dt)
## [1] "2016-06-29 18:30:00 UTC" "2016-06-29 19:00:00 UTC"
## [3] "2016-06-29 19:30:00 UTC" "2016-06-29 20:00:00 UTC"
## [5] "2016-06-29 20:30:00 UTC" "2016-06-29 21:00:00 UTC"
# use the month function to create a month variable
t_air$month <- month(t_air$dt)
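month is just one of several lubridate functions that pull a single component out of a date/time object. As a quick check that our conversion worked, we can (for example) extract a few components from the first value in our new column using the year, yday, hour, and minute functions:
# pull individual components out of the first date/time value
year(t_air$dt[1])
## [1] 2016
yday(t_air$dt[1])
## [1] 181
hour(t_air$dt[1])
## [1] 18
minute(t_air$dt[1])
## [1] 30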
We used the as_datetime
and month
functions above. Now we can use functions from the dplyr package to aggregate our data to monthly values. In addition to these new functions, we’ll be using a new operator, %>%, which sends the output from one command right into the next. This is called piping.
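Before using %>% on our data, here is a minimal sketch of what it does: it takes whatever is on its left and passes it as the first argument to the function on its right, so the two lines below give the same result.
# without piping, one function call is nested inside the next
head(month(t_air$dt))
# with piping, the same steps read left to right
t_air$dt %>% month() %>% head()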
# create a new object called t_month that is identical to our t_air dataframe and pipe it to the next line
t_month <- t_air %>%
# group the data by year, month, site, and sensor height, and pipe to the next line
group_by(year, month, site, sensorZ) %>%
# summarize the data - calculate the mean, standard deviation, and number of observations for each month
summarize(t_mean = mean(Temp, na.rm = T),
          t_sd = sd(Temp, na.rm = T),
          n_obs = n())
# check out the new data frame
head(t_month)
## # A tibble: 6 x 7
## # Groups: year, month, site [6]
## year month site sensorZ t_mean t_sd n_obs
## <int> <dbl> <fct> <int> <dbl> <dbl> <int>
## 1 2016 6 hd 700 10.3 3.51 59
## 2 2016 7 hd 700 12.0 5.41 1488
## 3 2016 7 ld 700 11.8 5.19 1462
## 4 2016 8 hd 700 12.6 5.69 1488
## 5 2016 8 ld 700 12.6 5.51 1488
## 6 2016 9 hd 700 4.55 5.28 1440
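As a quick sanity check on n_obs: the temperature data are recorded every half hour, so a full 31-day month should contain 31 × 48 observations, which matches the July and August rows above (and 30 × 48 = 1440 matches September).
# 48 half-hourly observations per day over a 31-day month
31 * 48
## [1] 1488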
For our assignment this week we will continue to work on data manipulation skills. You will be working in pairs.
Read in the netR.csv data file from the Arctic Data Center webpage (net radiation, as in Lab 1).
Calculate average daily incoming shortwave radiation for each site (high- and low-density) and sensor height (1m and 8m). (Hint - do you need to use the as_datetime
, or as_date
function?)
Use the plot
, lines
, and legend
functions to create a graph showing mean daily incoming shortwave radiation for the month of June 2017. The graph should have four lines, one for the 1m and 8m measurements at the high- and low-density sites. Be sure to label the axes appropriately and distinguish between the different sites and measurement heights. (A generic sketch of how plot, lines, and legend fit together, using made-up data, follows this list.)
Save your solutions as a single R script named ‘lastname_lastname_lab3.R’ in your lab folder on the network by next Friday.
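If you need a reminder of how plot, lines, and legend fit together, here is a generic sketch using made-up values (the data and labels below are invented for illustration only; your graph should of course use the netR.csv data described above):
# made-up example data: two short daily series for illustration only
day <- 1:5
series_a <- c(150, 180, 165, 170, 190)
series_b <- c(120, 140, 135, 150, 160)
# plot draws the first line and sets up the axes and labels
plot(day, series_a, type = "l", col = "red", xlab = "Day", ylab = "Value",
     ylim = range(c(series_a, series_b)))
# lines adds more series to the existing plot
lines(day, series_b, col = "blue")
# legend labels the lines so the reader can tell them apart
legend("topleft", legend = c("series a", "series b"), col = c("red", "blue"), lty = 1)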