Here is an analogy to start us off. If you were a pilot, R would be your airplane. You can use R to go places! With practice you’ll gain skills and confidence; you can fly longer distances and get through tricky situations. You will become an awesome pilot and can fly your plane anywhere.

And if R were an airplane, RStudio would be the airport. RStudio provides support: runways, communication, and other services that make your life easier overall. So although you can fly your plane without an airport, and we could learn R without RStudio, that’s not what we’re going to do.

We are learning R together with RStudio and its many supporting features.

Something else to mention as we start: you are learning a new language here. It’s an ongoing process: it takes time, you’ll make mistakes, and it can be frustrating, but it will be overwhelmingly awesome in the long run. We all speak at least one language; it’s a similar process, really. And no matter how fluent you are, you’ll always be learning and trying things in new contexts, just like everybody else. And just like any form of communication, there will be miscommunications, but hands down we are all better off because of it.

OK, let’s get going.


To learn R and RStudio we will be using Dr. Jenny Bryan’s lectures from STAT545 at UBC. I have modified them slightly here for our purposes; to see them in their full and awesome entirety, visit stat545-ubc.github.io. Specifically, we’ll be using these lectures:

Something we won’t cover today but that will be helpful to you in the future is:

I’ve modified them in part with my own text and in part with text from Software Carpentry’s R for reproducible scientific analysis, specifically:

1 R basics, workspace and working directory, RStudio projects

(modified from Jenny Bryan’s STAT545)

1.1 R at the command line, RStudio goodies

Launch RStudio/R.

Notice the default panes:

  • Console (entire left)
  • Environment/History (tabbed in upper right)
  • Files/Plots/Packages/Help (tabbed in lower right)

FYI: you can change the default location of the panes, among many other things: Customizing RStudio.

There are other great features we don’t really have time for today as we walk through the IDE together. (IDE stands for integrated development environment.) Check out the webinar and RStudio IDE cheatsheet for more. (And this is my blog post about RStudio Awesomeness).

Go into the Console, where we interact with the live R process.

Make an assignment and then inspect the object you just created.

x <- 3 * 4
x
## [1] 12

In my head I hear, e.g., “x gets 12”.

All R statements where you create objects – “assignments” – have this form: objectName <- value.

I’ll write it at the command line preceded by a hash sign #, which is how R marks a comment, so it won’t be evaluated.

# objectName <- value

Object names cannot start with a digit and cannot contain certain other characters such as a comma or a space. You will be wise to adopt a convention for demarcating words in names.

# i_use_snake_case
# other.people.use.periods
# evenOthersUseCamelCase
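
If you are unsure whether a name is legal, here is a minimal sketch of one way to check, assuming base R’s make.names() function. The candidate names are hypothetical and chosen only for illustration.

```r
# Hypothetical candidate names, chosen only for illustration
candidates <- c("life_exp", "lifeExp", "2nd_try", "my var", "a,b")

# make.names() rewrites a string into a syntactically valid name,
# so a string that comes back unchanged was already valid
is_valid <- candidates == make.names(candidates)
is_valid
## [1]  TRUE  TRUE FALSE FALSE FALSE
```

The last three fail for the reasons given above: a leading digit, a space, and a comma.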

Make an assignment

this_is_a_really_long_name <- 2.5

To inspect this variable without retyping its name, press the up arrow key to recall your command history, most recent command first. Let’s do that, and then delete the assignment part so only the name remains:

this_is_a_really_long_name
## [1] 2.5

Another way to inspect this variable is to begin typing this_… and RStudio will automagically suggest completions that you can select by hitting the tab key; then press return.

Make another assignment

this_is_shorter <- 2 ^ 3

To inspect this, try out RStudio’s completion facility: type the first few characters, press TAB, add characters until you disambiguate, then press return.

this_is_shorter
## [1] 8

One more:

jenny_rocks <- 2

Let’s try to inspect:

jennyrocks
## Error in eval(expr, envir, enclos): object 'jennyrocks' not found

Implicit contract with the computer / scripting language: Computer will do tedious computation for you. In return, you will be completely precise in your instructions. Typos matter. Case matters. Get better at typing.

Remember that this is a language, not unlike English! There are times you aren’t understood: your friend might say ‘what?’ but R will say ‘error’.
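
To see how literal R is about names, here is a small sketch using base R’s exists(), which reports whether an object with exactly the given name can be found. It assumes you have not created objects called jennyrocks or Jenny_rocks.

```r
jenny_rocks <- 2

# exists() takes the name as a quoted string
exists("jenny_rocks")
## [1] TRUE
exists("jennyrocks")    # missing underscore
## [1] FALSE
exists("Jenny_rocks")   # wrong case
## [1] FALSE
```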

A moment about logical operators and expressions. We can ask questions about the objects we just made.

  • == means ‘is equal to’
  • != means ‘is not equal to’
  • < means ‘is less than’
  • > means ‘is greater than’
  • <= means ‘is less than or equal to’
  • >= means ‘is greater than or equal to’

jenny_rocks == 2
## [1] TRUE
jenny_rocks <= 30
## [1] TRUE
jenny_rocks != 5
## [1] TRUE
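
The answer to one of these questions is itself a value, so you can assign it to an object like anything else. A short sketch follows; the name is_small is hypothetical and used only for illustration.

```r
jenny_rocks <- 2

# Store the answer to a question, then inspect it
is_small <- jenny_rocks < 30
is_small
## [1] TRUE

# A comparison can just as easily come back FALSE
jenny_rocks > 2
## [1] FALSE
jenny_rocks >= 2
## [1] TRUE
```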

Shortcuts: You will make lots of assignments and the operator <- is a pain to type. Don’t be lazy and use = instead; although it would work, it will just sow confusion later. Instead, use RStudio’s keyboard shortcut: Alt + - (the minus sign). Notice that RStudio automagically surrounds <- with spaces, which demonstrates a useful code formatting practice. Code is miserable to read on a good day. Give your eyes a break and use spaces. RStudio offers many handy keyboard shortcuts. Also, Alt+Shift+K brings up a keyboard shortcut reference card.
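
One possible source of that confusion, offered as an illustration rather than as the reason the original lesson had in mind: = is also how arguments are named inside a function call, where it does not create any object. The sketch assumes you have not created an object called from.

```r
# Inside a function call, = names an argument; it does not create an object
seq(from = 1, to = 3)
## [1] 1 2 3

# ls() lists the objects in your workspace; "from" is not among them
"from" %in% ls()
## [1] FALSE
```

Keeping <- for assignment and = for arguments makes the two jobs easy to tell apart when reading code.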

My most common shortcuts include command-Z (undo) and arrow keys combined with shift/option/command (moving quickly up, down, and sideways, with or without highlighting).

1.2 R functions, help pages

R has a mind-blowing collection of built-in functions that are accessed like so

# functionName(arg1 = val1, arg2 = val2, and so on)

Let’s try using seq() which makes regular sequences of numbers and, while we’re at it, demo more helpful features of RStudio.

Type se and hit TAB. A pop up shows you possible completions. Specify seq() by typing more to disambiguate or using the up/down arrows to select. Notice the floating tool-tip-type help that pops up, reminding you of a function’s arguments. If you want even more help, press F1 as directed to get the full documentation in the help tab of the lower right pane.

Type the arguments 1, 10 and hit return.

seq(1, 10)
##  [1]  1  2  3  4  5  6  7  8  9 10

We could probably infer that the seq() function makes a sequence, but let’s learn for sure. Type the following (you can autocomplete) and let’s explore the help page:

?seq 
help(seq) # same as ?seq
seq(from = 1, to = 10) # same as seq(1, 10); R assumes by position
##  [1]  1  2  3  4  5  6  7  8  9 10
seq(from = 1, to = 10, by = 2)
## [1] 1 3 5 7 9

The above also demonstrates something about how R resolves function arguments. You can always specify in name = value form. But if you do not, R attempts to resolve by position. So above, it is assumed that we want a sequence from = 1 that goes to = 10. Since we didn’t specify step size, the default value of by in the function definition is used, which ends up being 1 in this case. For functions I call often, I might use this resolve by position for the first argument or maybe the first two. After that, I always use name = value.
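
A few more calls illustrate the same rules. This is only a sketch; by and length.out are arguments documented on the seq() help page.

```r
# First two arguments resolved by position, the third by name
seq(1, 10, by = 3)
## [1]  1  4  7 10

# With name = value form, the order no longer matters
seq(to = 10, from = 1, by = 3)
## [1]  1  4  7 10

# Ask for a number of values instead of a step size
seq(1, 10, length.out = 4)
## [1]  1  4  7 10
```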

The help page shows the name of the package in the top left and is broken down into sections:

  • Description: An extended description of what the function does.
  • Usage: The arguments of the function and their default values.
  • Arguments: An explanation of the data each argument is expecting.
  • Details: Any important details to be aware of.
  • Value: The data the function returns.
  • See Also: Any related functions you might find useful.
  • Examples: Some examples for how to use the function.

The examples can be copy-pasted into the console for you to understand what’s going on. Let’s try it.
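
As an alternative to copy-pasting, base R’s example() function runs the Examples section of a help page for you. A minimal sketch:

```r
# Runs every example from the seq() help page and echoes
# each command along with its output
example(seq)
```

Running the examples this way also creates any objects they define, so expect a few new entries in your workspace afterwards.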

Exercise: Talk to your neighbor(s) and look up the help file for a function you know. Try the examples and see if you learn anything new. (Need ideas? Try ?getwd or ?plot.)

Help for when you only sort of remember the function name: use a double question mark:

??install 
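
Two related base R tools, sketched here in case they are useful: help.search() is the long form of ??, and apropos() lists objects whose names match a pattern. The pattern "^seq" is just an illustration; it means names starting with seq.

```r
help.search("install")   # same as ??install
apropos("^seq")          # names of objects that start with "seq"
```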

Not all functions have (or require) arguments:

date()
## [1] "Thu Apr 14 22:36:25 2016"

Now look at your workspace – in the upper right pane. The workspace is where user-defined objects accumulate. You can also get a listing of these objects with commands:

objects()
## [1] "jenny_rocks"                "this_is_a_really_long_name"
## [3] "this_is_shorter"            "x"
ls()
## [1] "jenny_rocks"                "this_is_a_really_long_name"
## [3] "this_is_shorter"            "x"

If you want to remove the object named y, you can do this (we never created y, so R warns that it was not found):

rm(y)
## Warning in rm(y): object 'y' not found

To remove everything:

rm(list = ls())

or click the broom in RStudio’s Environment pane.
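
Here is a small sketch of removing a single object that does exist. The names scratch_a and scratch_b are hypothetical throwaway objects created only for this example.

```r
scratch_a <- 1
scratch_b <- 2

# Remove one object and leave the other in the workspace
rm(scratch_a)
exists("scratch_a")
## [1] FALSE
exists("scratch_b")
## [1] TRUE
```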

Exercise: Clear your workspace, then create a few new variables. Discuss what makes a good variable name. Hint: give variables short informative names (lifeExp versus “X5”)

1.3 Working directories, RStudio projects, R scripts

One day you will need to quit R, go do something else and return to your analysis later.

One day you will have multiple analyses going that use R and you want to keep them separate.

One day you will want to collaborate with colleagues/friends and will need a portable way to do this.

So, what about your analysis do you want to capture (what is ‘real’), and where should it ‘live’?

The Console is good for quick tests, but you really want to treat your saved R scripts as what is “real”. Huge benefits:

  • with the input data and the R code you used, you can reproduce everything.
  • you can make your analysis fancier.
  • you can get to the bottom of puzzling results and discover and fix bugs in your code.
  • you can reuse the code to conduct similar analyses in new projects.
  • you can remake a figure with different aspect ratio or save it as TIFF instead of PDF.
  • you are ready for the future.
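
To make the figure benefit concrete, here is a sketch of what a few lines of such a script might look like, assuming base R’s pdf() graphics device and the built-in cars dataset. The file name and the temporary folder are hypothetical choices for illustration.

```r
# Hypothetical output file, written to a temporary folder for this sketch
out_file <- file.path(tempdir(), "cars_plot.pdf")

# width and height are in inches; edit them to change the aspect ratio
pdf(out_file, width = 6, height = 4)
plot(cars)
dev.off()   # close the file so the figure is actually written

file.exists(out_file)
```

Because the steps are saved as code, remaking the figure with a different shape means editing two numbers and rerunning the script. Other devices, such as tiff(), follow the same open, plot, close pattern where your R installation supports them.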

So we will talk about scripts in a moment, but first let’s talk about where they should live.

We’re not going to cover workspaces today, but they are an alternative to scripts. You can learn about them in this RStudio article: Working Directories and Workspaces.

1.3.1 Working directory

Any process running on your computer has a notion of its “working directory”. In R, this is where R will look, by default, for files you ask it to load. It is also where, by default, any files you write to disk will go.

You can explicitly check your working directory with:

getwd()
## [1] "/Users/julialowndes/github/2016-04-15-UCSB/R_RStudio"

It is also displayed at the top of the RStudio console.
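
A short sketch of how to look around the working directory from the Console, using base R functions. Your output will differ from anyone else’s, so none is shown.

```r
getwd()              # the folder R is currently working in
list.files()         # files R can see without any path prefix
normalizePath(".")   # "." is shorthand for the working directory
```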

As a beginning R user, it’s OK to let your home directory or any other weird directory on your computer be R’s working directory. Very soon, I urge you to evolve to the next level, where you organize your analytical projects into directories and, when working on Project A, set R’s working directory to Project A’s directory.

Although I do not recommend it, in case you’re curious, you can set R’s working directory at the command line like so. You could also do this in a script.

setwd("~/myCoolProject")
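
If you do experiment with setwd(), it helps to know that it invisibly returns the directory you were in before the change, so you can find your way back. A minimal sketch, using R’s temporary folder as a stand-in destination:

```r
# Move to a stand-in folder and remember where we came from
old_wd <- setwd(tempdir())
getwd()

# Go back to the original working directory
setwd(old_wd)
getwd()
```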

But there’s a better way. A way that also puts you on the path to managing your R work like an expert.

1.3.2 RStudio projects

Keeping all the files associated with a project organized together – input data, R scripts, analytical results, figures – is such a wise and common practice that RStudio has built-in support for this via its projects.

Using Projects

Let’s make one to use for the rest of this workshop/class.

Do this: File > New Project … New Directory > Empty Project. The directory name you choose here will be the project name. Call it whatever you want (or follow me for convenience).

I created a directory, and therefore an RStudio project, called swc in my github directory, FYI. What do you notice about your RStudio pane? Look in the right corner: ‘software-carpentry’.

Now check that the “home” directory for your project is the working directory of our current R process:

getwd()
# "/Users/julialowndes/tmp/software-carpentry" 

I can’t print my output here because this document itself does not reside in the RStudio Project we just created.

This is the absolute path, just like we learned in the shell this morning. But from here, your paths within this project can be relative, and so our files within our project could work on your computer or mine, without worrying about the absolute paths.
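
To illustrate the difference, here is a sketch that builds a relative path and the matching absolute path with base R’s file.path(). The folder and file names are hypothetical, and nothing is read or written.

```r
# A relative path: the same on every computer that has the project
rel_path <- file.path("figures", "cars_plot.png")
rel_path
## [1] "figures/cars_plot.png"

# The absolute path depends on where the project lives on this machine
abs_path <- file.path(getwd(), rel_path)
abs_path
```

Code that only ever mentions rel_path keeps working when the project folder is moved or shared.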

Let’s enter a few commands in the Console, as if we are just beginning a project. Since we’re learning a new language here, an example is often the best way to see how things work. So we’re going to make an introductory plot using the cars dataset that is loaded into R.

cars
plot(cars)  
z <- line(cars)
abline(coef(z), col = "purple")
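
Before or after plotting, it can help to look at what we are working with. A small sketch using base R functions on the same cars dataset and the same line() fit as above:

```r
head(cars)    # first six rows
dim(cars)     # number of rows, then number of columns
## [1] 50  2
names(cars)   # column names
## [1] "speed" "dist"

# The two numbers that abline() used: intercept, then slope
z <- line(cars)
coef(z)
```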