1 Objectives

1.1 Today

Here’s an overview of techniques to be covered in Hadley Wickham and Garrett Grolemund of RStudio’s forthcoming book R for Data Science:

Today, we’ll briefly skim across these topics, except for Model:

  • Import: readr to read in simple text files (as comma-separated values, ie CSV)
  • Tidy: tidyr to organize rows of data into unique values
  • Transform: dplyr to manipulate data based on subsetting by rows or columns, sorting and joining
  • Visualise:
    • ggplot2 static plots, using grammar of graphics principles
    • plotly interactive plots, having hover, zoom and pan capabilities
    • tmap thematic maps, both static and interactive
  • Communicate
    • online website with Github Pages
    • version with git
    • dynamic documents with Rmarkdown

1.2 This Morning

  1. Create Github login
  2. Create project website with Github Pages
  3. Edit README.md in Markdown
  4. Create HTML website content with R Markdown

2 Git and Github

  • Git is a version control system that lets you track changes to files over time. These files can be any kind of file (eg doc, pdf, xls), but differences are most easily visible in plain text files (eg txt, csv, md). You can roll back changes made by you or by others. This provides a playground for collaboration, without fear of experimentation (you can always roll back changes).

  • Github is a website for storing your git versioned files remotely. It has many nice features, such as visualizing differences between images, rendering and diffing map data files, rendering text data files, and tracking changes in text.

2.1 Setup Github & Git

  1. Create Github account at http://github.com, if you don’t already have one. For username, I recommend all lower-case letters, as short as you can. I recommend using your *ucsb.edu email, since you can request free private repositories via the GitHub Education discount. You’re encouraged to upload a picture, since it will be included in the student listing in this course repository.

  2. Configure git with global commands. Open up the Bash version of Git and type the following:

    # display your version of git
    git --version
    
    # replace USER with your Github user account
    git config --global user.name USER
    
    # replace USER@UMAIL.UCSB.EDU with the email you used to register with Github
    git config --global user.email USER@UMAIL.UCSB.EDU
    
    # list your config to confirm user.* variables set
    git config --list
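
To check a single setting rather than scanning the whole list, git config also accepts --get. The following is a minimal sketch, not part of the original setup steps: it writes the same two values to a throwaway file via --file so that your real global configuration is left untouched. USER and USER@UMAIL.UCSB.EDU are the same placeholders as above.

```shell
# Sketch: write and read back user.* settings in a temporary config file.
# With --global instead of --file "$cfg", the same commands act on ~/.gitconfig.
cfg="$(mktemp)"
git config --file "$cfg" user.name "USER"
git config --file "$cfg" user.email "USER@UMAIL.UCSB.EDU"

# --get prints one value instead of the whole list
name="$(git config --file "$cfg" --get user.name)"
email="$(git config --file "$cfg" --get user.email)"
echo "user.name=$name"
echo "user.email=$email"
```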

2.2 Github Workflows

The two most common workflow models for working with Github repositories depend on your permissions:

  1. writable: Push & Pull (simplest)

  2. read only: Fork & Pull Request (extra steps)

2.2.0.1 Push & Pull

repo location         | USER permission | initialize | edit         | update
github.com/OWNER/REPO | read + write    | create     |              |
~/github/REPO         | read + write    | clone      | commit, push | pull

Note that OWNER could be either an individual USER or group ORGANIZATION, which has member USERs.
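
The Push & Pull cycle in the table above can be tried entirely on your own machine. This is only a sketch under stated assumptions: a bare repository on the local disk stands in for github.com/OWNER/REPO, a second clone named collaborator plays the part of another user with write access, and the branch is assumed to be called master. None of these names come from Github itself.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# create: a bare repository standing in for github.com/OWNER/REPO
git init --quiet --bare REPO.git
git --git-dir=REPO.git symbolic-ref HEAD refs/heads/master

# clone: your local working copy (~/github/REPO in the table)
git clone --quiet REPO.git REPO 2>/dev/null
cd REPO
git symbolic-ref HEAD refs/heads/master
git config user.name "USER"
git config user.email "USER@UMAIL.UCSB.EDU"

# edit: commit, then push
echo "# REPO" > README.md
git add README.md
git commit --quiet -m "add README"
git push --quiet origin master

# a collaborator with write access clones, edits and pushes
cd "$work"
git clone --quiet REPO.git collaborator
cd collaborator
git config user.name "COLLABORATOR"
git config user.email "COLLABORATOR@UMAIL.UCSB.EDU"
echo "A line from a collaborator" >> README.md
git commit --quiet -am "collaborator edit"
git push --quiet origin master

# update: pull the collaborator's change into your copy
cd "$work/REPO"
git pull --quiet origin master
commits="$(git rev-list --count HEAD)"
echo "commits in local copy: $commits"
```

With a real Github repository, only the clone URL changes; the commit, push and pull commands stay the same.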

2.2.1 Fork & Pull Request

repo location         | USER permission | initialize | edit         | update
github.com/OWNER/REPO | read only       |            | merge [BB]   |
github.com/USER/REPO  | read + write    | fork       | pull request | pull request, merge
~/github/REPO         | read + write    | clone      | commit, push | pull
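
The git side of the Fork & Pull Request model can be sketched locally as well. Everything here is an assumption made for illustration: two bare repositories stand in for github.com/OWNER/REPO and your fork, the second remote is named upstream by common convention, and the branch is assumed to be master. The fork and the pull request themselves are actions on the Github website and appear only as comments.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# Stand-in for github.com/OWNER/REPO (read only for you), seeded with one commit
git init --quiet owner-src
cd owner-src
git symbolic-ref HEAD refs/heads/master
git config user.name "OWNER"
git config user.email "OWNER@EXAMPLE.ORG"
echo "# REPO" > README.md
git add README.md
git commit --quiet -m "initial commit"
cd "$work"
git clone --quiet --bare owner-src OWNER-REPO.git

# fork: on Github this is the Fork button; here it is a bare copy
git clone --quiet --bare OWNER-REPO.git USER-REPO.git

# clone your fork and register the original as a second remote
git clone --quiet USER-REPO.git REPO
cd REPO
git config user.name "USER"
git config user.email "USER@UMAIL.UCSB.EDU"
git remote add upstream "$work/OWNER-REPO.git"

# edit: commit and push to your fork, then open the pull request on Github
echo "my change" > notes.txt
git add notes.txt
git commit --quiet -m "add notes"
git push --quiet origin master

# meanwhile the owner adds a commit to the original repository
cd "$work/owner-src"
echo "owner change" >> README.md
git commit --quiet -am "owner edit"
git push --quiet "$work/OWNER-REPO.git" master

# update: fetch from the original and merge it into your copy
cd "$work/REPO"
git fetch --quiet upstream
git merge --quiet --no-edit upstream/master
git log --oneline
```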

2.3 Fork & Pull Request Your People Entry

As an exercise for you to try out this fork & pull request model, you will add yourself to the people directory for this workshop which looks like this:

Because you cannot directly write to this course repository, fork it into your own USER space. You can then clone it onto your local machine, or simply create a New file through the web browser, similar to how you edited the README.md. Introduce yourself by adding a tiny file named after your Github username, USERNAME.json, under the _data directory. Here’s an example for my Github username bbest, so in a file named bbest.json:

{
    "info": "Lecturer at Bren School"
}

Using the format above, replace the value for info with one of your choosing, ie replace Lecturer at Bren School with something of your own. If you cloned to your machine, be sure to commit and push the changes (or in Github Desktop App “Commit and Sync”), and create a pull request to the original repository remi-daigle/2016-04-15-UCSB.

The details of how this works (using Jekyll data files) are beyond the scope of this workshop, but the exercise provides a simple, satisfying example of applying the Fork & Pull Request model to a repository to which you want to contribute but do not have write permissions.
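
If you cloned your fork, the people entry can also be created from the shell. A minimal sketch, assuming a temporary directory in place of your clone, USERNAME as a placeholder for your Github username, and an info text invented for illustration:

```shell
set -eu
user="USERNAME"        # placeholder: your Github username
repo="$(mktemp -d)"    # stand-in for the folder of your cloned fork
mkdir -p "$repo/_data"

# write _data/USERNAME.json with a single "info" field (the value is a made-up example)
cat > "$repo/_data/$user.json" <<EOF
{
    "info": "Graduate student at Bren School"
}
EOF

cat "$repo/_data/$user.json"
```

In a real clone you would then commit and push the new file and open the pull request as described above.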

2.4 Create Repository my-project

Now you will create a Github repository for a project.

  1. Create a repository called my-project.

    Please be sure to tick the box to Initialize this repository with a README. Otherwise defaults are fine.

  2. Create a branch called gh-pages.

    Per pages.github.com, since this will be a project site, only web files in the gh-pages branch will show up at http://USER.github.io/REPO. For a user (or organization) site, the REPO must be named USER.github.io (or ORG.github.io) and then the default master branch will contain the web files for the website http://USER.github.io (or http://ORG.github.io). See also User, Organization, and Project Pages - Github Help.

  3. Set the default branch to gh-pages, NOT the default master.

  4. Delete the branch master, which will not be used.
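
The branch steps above are done through the Github website in this workshop. For readers curious about the git commands behind them, here is a sketch under stated assumptions: a local bare repository stands in for github.com/USER/my-project, and re-pointing its HEAD imitates the "default branch" setting, which on Github is a repository setting rather than a git command.

```shell
set -eu
work="$(mktemp -d)"
cd "$work"

# Stand-in for github.com/USER/my-project, created with a README on master
git init --quiet seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "USER"
git config user.email "USER@UMAIL.UCSB.EDU"
echo "# my-project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
cd "$work"
git clone --quiet --bare seed my-project.git

git clone --quiet my-project.git my-project
cd my-project

# step 2: create the gh-pages branch from master and publish it
git checkout --quiet -b gh-pages
git push --quiet origin gh-pages

# step 3: make gh-pages the default branch of the stand-in remote
git --git-dir="$work/my-project.git" symbolic-ref HEAD refs/heads/gh-pages

# step 4: delete master, remotely and locally
git push --quiet origin --delete master
git branch --quiet -d master

remote_branches="$(git ls-remote --heads origin | sed 's|.*refs/heads/||')"
echo "remote branches: $remote_branches"
```

The order matters in this sketch: the remote's default branch is switched before master is deleted, mirroring steps 3 and 4.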

2.5 Edit README.md in Markdown

Commit your first change by editing the README.md, which is written in Markdown, a simple syntax for conversion to HTML. Update the contents of the README.md with the following, which includes a link and a numbered list:

# my-project

Playing with [Software Carpentry at UCSB](http://remi-daigle.github.io/2016-04-15-UCSB).

## Introduction

This repository demonstrates **software** and _formats_:

1. **Git**
1. **Github**
1. _Markdown_
1. _Rmarkdown_

## Conclusion

![](https://octodex.github.com/images/labtocat.png)

Now click on Preview changes to see the markdown rendered as HTML:

Notice the syntax for:

  • numbered list gets automatically sequenced: 1., 1.
  • headers get rendered at multiple levels: #, ##
  • link: [](http://...)
  • image: ![](http://...)
  • italics: _word_
  • bold: **word**

See Mastering Markdown · GitHub Guides and add some more personalized content to the README of your own, like a bulleted list or blockquote.

2.6 Create index.html

By default index.html is served up. Go ahead and create a new file named index.html with the following basic HTML:

<!DOCTYPE html>
<html>
<body>

<h1>My First Heading</h1>

<p>My first paragraph.</p>

</body>
</html>

2.7 Clone Repository

Clone the repository onto your local machine. The easiest way to do this is simply clicking the button to open up the Github Desktop App.

You’ll be prompted to clone this repository into a folder on your local machine. I recommend creating a folder github under your user folder.

See GitHub Desktop User Guides for more. You could also do this from the Bash Shell for Git with the command git clone https://github.com/USER/REPO.git, replacing USER with your Github username and REPO with my-project. Or you can use the Github Desktop App menu File -> Clone Repository…

3 Rmarkdown from RStudio

3.1 Create RStudio Project

Open RStudio and under the menu File -> New Project… -> Existing Directory. Browse to the folder where you previously cloned my-project.

You’ll notice a couple new files created in the Files pane:

  • .gitignore lists the files that git should ignore when committing
  • my-project.Rproj stores the settings for this project

Open the Github Desktop App, enter a message like “new RStudio project” and click on “Commit and Sync gh-pages”. This will update https://github.com/USER/my-project.
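
To see what .gitignore does, you can ask git whether a given path matches an ignore rule. The sketch below is illustrative only: it uses a throwaway repository, and the four entries are what RStudio typically writes for a new project, which is an assumption, so check the file in your own project.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Entries RStudio typically writes for a new project (assumed; yours may differ)
cat > .gitignore <<'EOF'
.Rproj.user
.Rhistory
.RData
.Ruserdata
EOF

# git check-ignore exits with status 0 when a path matches an ignore rule
if git check-ignore --quiet .Rhistory; then history_ignored=yes; else history_ignored=no; fi
if git check-ignore --quiet index.Rmd; then rmd_ignored=yes; else rmd_ignored=no; fi
echo ".Rhistory ignored: $history_ignored"
echo "index.Rmd ignored: $rmd_ignored"
```

Paths that match a rule are left out of commits, so session files such as .Rhistory stay off Github while your documents are still tracked.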

3.2 Create index.Rmd in Rmarkdown

Back in RStudio, let’s create a new Rmarkdown file, which allows us to weave markdown text with chunks of R code to be evaluated and output content like tables and plots.

File -> New File -> Rmarkdown… -> Document of output format HTML, OK.

You can give it a Title of “My Project”. After you click OK, most importantly use File -> Save As and name the file index (it will be saved with the .Rmd extension, as index.Rmd).

Some initial text is already provided for you. Let’s go ahead and “Knit HTML”.

Notice how the markdown is rendered as before, and how R code chunks are surrounded by 3 backticks and {r LABEL}. These chunks are evaluated and return output text in the case of summary(cars) and an output plot in the case of plot(pressure).

Notice how the code plot(pressure) is not shown in the HTML output because of the R code chunk option echo=FALSE.

Before we continue exploring Rmarkdown, return to the Github Desktop App, enter a message like “added index” and click on “Commit and Sync gh-pages”. This will update https://github.com/USER/my-project, and now you can also see your project website with a default index.html viewable at http://USER.github.io/my-project

For more on Rmarkdown:

3.3 Merge Conflicts

merge conflicts