1. Introduction to R

R is a powerful set of tools for manipulating, analyzing, and visulaizing data. These tools are open source, which means that they are and always will be free to use, and they are developed by a worldwide community of users. The R programming language is at the core of these tools. Using the R language you can write commands that will tell your computer what you want it to do with your data. While this may seem a bit daunting there are a number of advantages associated with using tools like R to work with data, instead of using spreadhseet programs where you point and click, for example. In particular you can perform complex calculatations with large amounts of data very quickly, and more importanly you can save a record of your commands to make everything you do transparent and reproducable. This is very important in science!

Let’s begin to explore R using a set of exercises from the book Hands-On Programing with R.

We’ll go through sections 2.1 - 2.3.

2. Scripts

We can use an R script to keep a record of our calculations and any other things we do in R. A script is essentially a text file with all of our commands in it.

To make a script in RStudio click on File > New File > R Script.

By default this will open in the upper lefthand corner of RStudio, above the console. It’s a good idea to save your script to your folder on the network drive before you start working. Now you can type commands in your script and have a record of what we are doing. Note that commands will not execute after you hit enter, instead your curser will just go to the next line. There are a few ways to execute commands in a script. You can press the Run button in the upper right of your script window, press CTRL ENTER - either of these options will execute only the line that your cursor is on. You can highlight multiple lines, and use either of these methods to execute them. Pressing the Source button will run every command in your script.

In an R script a hashtag # tells R to ignore that particular line. This is called commenting, and it is useful for leaving notes for yourself and others in your script.

3. More on Functions in R

There are a wide variety of functions available to use in R. Functions generally perform some sort of mathematical operation, and can be fairly sophisticated. Functions are followed by parentheses, and you typically write and argument inside the parentheses. For example, imagine that we have a vector of numbers from 1 to 20, and we want to fund the sum of all these numbers. It turns out there is a sum function, and if we supply our vector as the argument R will calculate the sum.

v <- 1:20

sum(v)
## [1] 210

This is convenient, but you may be asking how do I know what functions exist and what the arguments are? For that we can use the help function in R. If you know which function you want to use you can see the documentation in the help window by typing a question mark followed by the function name in the console

?sum

4. Reading in Data

By now it should be apparent that all of these tools are very powerful, but you may be wondering how you can work with your own data in R. The most common way to bring data into R is to import a spreadsheet, and there are a number of functions for that. Let’s use the read.csv function to import a data set. Use the help function to find out what this function does, and what the arguments are.

We’ll use a dataset of air temperature from one of my research sites in Siberia. Click the following link to see the data on the Arctic Data Center web page. https://arcticdata.io/catalog/view/doi:10.18739/A2S46H57S

To start let’s look at the Air_Temp.csv data set. You might be able to figure out what this is, but let’s click the ‘More info’ link. This will give us information about the data in the file, the Attribute Information section is especially important. What does this tell us?

OK, now let’s read the data into R. Right click on the Download button and select Copy Link Address into the read.csv function, and set the header argument to True.

What does header = T mean? Should we create an object here? Why or why not?

t_air <- read.csv("https://arcticdata.io/metacat/d1/mn/v2/object/urn%3Auuid%3A3e962856-7ece-4ff8-870a-82511c31c974",
                   header = T)

OK, now we should see our t_air object in the Environment pane. Let’s take a look at the first few rows of our data set using the head function.

head(t_air)
##   doy year hour Temp sensorZ sensorLoc site
## 1 181 2016 18.5 14.8     700 overstory   hd
## 2 181 2016 19.0 14.3     700 overstory   hd
## 3 181 2016 19.5 13.4     700 overstory   hd
## 4 181 2016 20.0 12.3     700 overstory   hd
## 5 181 2016 20.5 11.6     700 overstory   hd
## 6 181 2016 21.0 11.3     700 overstory   hd

We can also gain some information about structure of this object using the str function.

str(t_air)
## 'data.frame':    31200 obs. of  7 variables:
##  $ doy      : int  181 181 181 181 181 181 181 181 181 181 ...
##  $ year     : int  2016 2016 2016 2016 2016 2016 2016 2016 2016 2016 ...
##  $ hour     : num  18.5 19 19.5 20 20.5 21 21.5 22 22.5 23 ...
##  $ Temp     : num  14.8 14.3 13.4 12.3 11.6 11.3 10.4 9.2 9.1 8.4 ...
##  $ sensorZ  : int  700 700 700 700 700 700 700 700 700 700 ...
##  $ sensorLoc: Factor w/ 1 level "overstory": 1 1 1 1 1 1 1 1 1 1 ...
##  $ site     : Factor w/ 2 levels "hd","ld": 1 1 1 1 1 1 1 1 1 1 ...

From this we can see that we have a data frame, much like a spread sheet, a data frame groups our data into a two dimensional table. In a data frame each column is a vector that represents a certain variable. Information on each variable is shown in the rows returned by our call to the str function. The first piece of information is the variable name, followed by the data type, and finally the first few values for each variable.

What do you think data type means?

We can use the $ symbol to access a single variable from a data frame.

head(t_air$hour)
## [1] 18.5 19.0 19.5 20.0 20.5 21.0

Now we can use some basic functions to explore our data.


min(t_air$Temp)
## [1] NA

max(t_air$Temp)
## [1] NA

mean(t_air$Temp)
## [1] NA

Oh no, what happened? Let’s try that again.

min(t_air$Temp, na.rm = T)
## [1] -40

max(t_air$Temp, na.rm = T)
## [1] 30.4

mean(t_air$Temp, na.rm = T)
## [1] -1.436316

Much better. What did we do differently here?

Matrix Notation and Subsetting
We can use brackets to specify rows and columns in a matrix or dataframe. For example let’s look at the value in the first row of the first column in our temperature data.

t_air[1,1]
## [1] 181

Now let’s try the 40th row of the 3rd column.

t_air[40,3]
## [1] 14

We can use these brackets to subset our data by rows or columns. And we can use logical statements to subset our data by attributes. For example what if we want only data from the high-density site.

t_hd <- t_air[t_air$site == "hd",]
head(t_hd)
##   doy year hour Temp sensorZ sensorLoc site
## 1 181 2016 18.5 14.8     700 overstory   hd
## 2 181 2016 19.0 14.3     700 overstory   hd
## 3 181 2016 19.5 13.4     700 overstory   hd
## 4 181 2016 20.0 12.3     700 overstory   hd
## 5 181 2016 20.5 11.6     700 overstory   hd
## 6 181 2016 21.0 11.3     700 overstory   hd

5. Homework Assignment

OK, now that we know some of the basics of R let’s see if we can apply it to things we’ve been learning in class.

Your homework is to create a script that does the following things:

  1. From the same data webpage load the net radiation data into R
  2. Convert the downward facing longwave radiation data to temperature using the Stefan-Boltzman equation. (Use an effective emissivity of 0.96, and get the Boltzman constant from your text book)
  3. Calculate the minimum, maximum, and average radiometric land surface temperature

Save the script as ‘lastname_lastname_lab1.R’ in your lab1 folder on the network by next Friday.