Here’s an overview of techniques to be covered in R for Data Science, the forthcoming book by Hadley Wickham and Garrett Grolemund of RStudio:
Today, we’ll briefly skim across these topics, except for Model:
readr: to read in simple text files (as comma-separated values, i.e. CSV)
tidyr: to organize rows of data into unique values
dplyr: to manipulate data based on subsetting by rows or columns, sorting and joining
ggplot2: static plots, using grammar of graphics principles
plotly: interactive plots, having hover, zoom and pan capabilities
tmap: thematic maps, both static and interactive
Git is a version control system that lets you track changes to files over time. These files can be any kind of file (e.g. doc, pdf, xls), but differences are most easily visible in plain text files (e.g. txt, csv, md). You can roll back changes made by you or by others. This provides a playground for collaboration, without fear of experimentation (you can always roll back changes).
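To make "tracking" and "rolling back" concrete, here is a minimal sketch run in a throwaway directory. The file name data.csv, its contents, and the sandbox identity are illustrative assumptions, not part of the workshop material.

```shell
# Sandbox-only identity so commits succeed without touching your real config
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

# Work in a temporary directory (hypothetical repository named demo)
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo

# First version of a file, committed
echo "species,count" > data.csv
git add data.csv
git commit -q -m "add header row"

# Second version, committed
echo "otter,3" >> data.csv
git commit -q -a -m "add otter count"

# An experiment that is not committed yet...
echo "oops" >> data.csv
# ...can be discarded, restoring the last committed version
git checkout -- data.csv

# A committed change can be undone by a new commit that reverses it
git revert --no-edit HEAD

# The history keeps every step, including the rollback
git log --oneline
cat data.csv
```

Note that `git revert` does not erase history: it adds a third commit that undoes the second, so the experiment itself stays on record.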
Github is a website for storing your git versioned files remotely. It has many nice features, such as visualizing differences between images, rendering and diffing map data files, rendering text data files, and tracking changes in text.
Create a Github account at http://github.com, if you don’t already have one. For the username, I recommend all lower-case letters, as short as you can. I recommend using your *ucsb.edu email, since you can request free private repositories via the GitHub Education discount. You’re encouraged to upload a picture since it will get included in the students listing as part of this course repository.
Configure git with global commands. Open up the Bash version of Git and type the following:
# display your version of git
git --version
# replace USER with your Github user account
git config --global user.name USER
# replace USER@UMAIL.UCSB.EDU with the email you used to register with Github
git config --global user.email USER@UMAIL.UCSB.EDU
# list your config to confirm user.* variables set
git config --list
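If you want to check a single setting rather than scan the whole list, `git config --get` reads one value back. The sketch below runs against a throwaway home directory so it does not touch your real settings; the `sgit` helper and the sandbox path are assumptions made for illustration only.

```shell
# Throwaway home directory (assumption: only for this demonstration)
sandbox=$(mktemp -d)

# Hypothetical helper: run git as if the sandbox were your home directory
sgit() { HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config" git "$@"; }

# USER and USER@UMAIL.UCSB.EDU are the placeholders from the text
sgit config --global user.name USER
sgit config --global user.email USER@UMAIL.UCSB.EDU

# Read back one value at a time
sgit config --global --get user.name
sgit config --global --get user.email

# Global settings are stored in a plain text file in the home directory
cat "$sandbox/.gitconfig"
```

On your own machine you would drop the helper and run `git config --global --get user.name` directly.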
The two most common workflow models for working with Github repositories are based on your permissions:
writable: Push & Pull (simplest)
read only: Fork & Pull Request (extra steps)
repo location | USER permission | initialize | edit | update |
---|---|---|---|---|
github.com/OWNER/REPO | read + write | create | | |
~/github/REPO | read + write | clone | commit, push | pull |
Note that OWNER could be either an individual USER or group ORGANIZATION, which has member USERs.
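The Push & Pull model can be tried out entirely on your own machine. In this sketch a local bare repository stands in for github.com/OWNER/REPO (an assumption, since on Github you would create the repository through the website), and a second clone named REPO-other plays the role of a collaborator or another machine.

```shell
# Sandbox-only identity so commits succeed without touching your real config
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# create: a bare repository stands in for github.com/OWNER/REPO
git init -q --bare REPO.git
git --git-dir=REPO.git symbolic-ref HEAD refs/heads/master

# clone: make a local working copy, as in ~/github/REPO
git clone -q REPO.git REPO
cd REPO
git symbolic-ref HEAD refs/heads/master   # name the first branch master

# commit, push: record an edit locally, then send it to the remote
echo "# REPO" > README.md
git add README.md
git commit -q -m "add README"
git push -q origin master

# a second working copy of the same remote
cd "$workdir"
git clone -q REPO.git REPO-other

# another edit is pushed from the first working copy
cd "$workdir/REPO"
echo "An edit." >> README.md
git commit -q -a -m "edit README"
git push -q origin master

# pull: the second working copy fetches and merges the update
cd "$workdir/REPO-other"
git pull -q origin master
cat README.md
```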
repo location | USER permission | initialize | edit | update |
---|---|---|---|---|
github.com/OWNER/REPO | read only | merge [BB] | | |
github.com/OWNER/REPO | read + write | fork | pull request | pull request, merge |
~/github/REPO | read + write | clone | commit, push | pull |
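The Fork & Pull Request model can also be rehearsed locally. In this sketch two bare repositories stand in for the owner's repository and your fork (the names OWNER-REPO.git, USER-REPO.git and owner-work are assumptions). A pull request is a Github website feature with no plain git equivalent, so the owner's review and merge is approximated by the owner pulling directly from the fork.

```shell
# Sandbox-only identity so commits succeed without touching your real config
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the owner's repository, with one commit from the owner
git init -q --bare OWNER-REPO.git
git --git-dir=OWNER-REPO.git symbolic-ref HEAD refs/heads/master
git clone -q OWNER-REPO.git owner-work
cd owner-work
git symbolic-ref HEAD refs/heads/master
echo "# REPO" > README.md
git add README.md
git commit -q -m "initial commit by owner"
git push -q origin master
cd "$workdir"

# fork: copy the owner's repository into USER space (the Fork button on Github)
git clone -q --bare OWNER-REPO.git USER-REPO.git

# clone: working copy of the fork on your machine
git clone -q USER-REPO.git REPO
cd REPO

# commit, push: edits go to your fork, where you have write permission
echo "Hello from USER" > hello.txt
git add hello.txt
git commit -q -m "add hello.txt"
git push -q origin master

# pull request, merge: approximated here by the owner pulling from the fork
cd "$workdir/owner-work"
git pull -q "$workdir/USER-REPO.git" master
git push -q origin master

# pull: keep your clone current with the owner's repository
cd "$workdir/REPO"
git remote add upstream "$workdir/OWNER-REPO.git"
git pull -q upstream master
git log --oneline
```

The remote name `upstream` is only a common convention for the original repository; any name works.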
As an exercise for you to try out this fork & pull request model, you will add yourself to the people directory for this workshop which looks like this:
Because you cannot directly write to this course repository, fork it into your own USER space. You can further clone it onto your local machine, or simply create a New file through the web browser, similar to how you edited the README.md. Introduce yourself by adding a tiny file named after your Github USERNAME, USERNAME.json, under the _data directory. Here’s an example for my Github username bbest, so in a file named bbest.json:
{
"info": "Lecturer at Bren School"
}
Using the format above, replace the value for info with one of your choosing, i.e. replace Lecturer at Bren School with something of your own. If you cloned to your machine, be sure to commit and push the changes (or in Github Desktop App “Commit and Sync”), and create a pull request to the original repository remi-daigle/2016-04-15-UCSB.
The details of how this works (using Jekyll data files) are beyond the scope of this workshop, but it provides a simple, satisfying example of applying the Fork & Pull Request model to a repository that you want to contribute to but do not have write permissions for.
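If you cloned your fork and prefer the command line, adding the file might look like the sketch below. The directory fork-clone is a freshly initialized stand-in for your real clone (an assumption so the example runs anywhere), and USERNAME is the placeholder from the text.

```shell
# Sandbox-only identity so commits succeed without touching your real config
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the clone of your fork; with a real clone, skip `git init`
git init -q fork-clone
cd fork-clone

username=USERNAME   # replace with your Github username

# Write the tiny JSON file under the _data directory
mkdir -p _data
cat > "_data/$username.json" <<'EOF'
{
  "info": "Lecturer at Bren School"
}
EOF

# Stage and commit just that file
git add "_data/$username.json"
git commit -q -m "add $username to people directory"

# Empty output here means everything is committed
git status --short
```

With a real clone of your fork, `git push` would follow, and the pull request itself is then created on the Github website.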
my-project
Now you will create a Github repository for a project.
Create a repository called my-project.
Please be sure to tick the box to Initialize this repository with a README. Otherwise defaults are fine.
Create a branch called gh-pages.
Per pages.github.com, since this will be a project site, only web files in the gh-pages branch will show up at http://USER.github.io/REPO. For a user (or organization) site, the REPO must be named USER.github.io (or ORG.github.io) and then the default master branch will contain the web files for the website http://USER.github.io (or http://ORG.github.io). See also User, Organization, and Project Pages - Github Help.
Set the default branch to gh-pages, NOT the default master.
Delete the branch master, which will not be used.
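The branch steps above are done on the Github website, but the same sequence can be sketched with git commands against a local stand-in. Here the bare repository my-project.git plays the role of github.com/USER/my-project (an assumption); on Github, the default branch is changed in the repository settings rather than with `symbolic-ref`.

```shell
# Sandbox-only identity so commits succeed without touching your real config
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the new repository, initialized with a README on master
git init -q --bare my-project.git
git --git-dir=my-project.git symbolic-ref HEAD refs/heads/master
git clone -q my-project.git my-project
cd my-project
git symbolic-ref HEAD refs/heads/master
echo "# my-project" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin master

# Create the gh-pages branch from master and publish it
git checkout -q -b gh-pages
git push -q -u origin gh-pages

# Make gh-pages the default branch of the remote
git --git-dir="$workdir/my-project.git" symbolic-ref HEAD refs/heads/gh-pages

# Delete master, remotely and locally
git push -q origin --delete master
git branch -q -d master

# Only gh-pages remains
git branch -a
```

The order matters: a remote normally refuses to delete the branch that is currently its default, which is why the default is switched to gh-pages first.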
README.md in Markdown
Commit your first change by editing the README.md, which is in markdown, a simple syntax for conversion to HTML. Now update the contents of the README.md with the following, having a link and a numbered list:
# my-project
Playing with [Software Carpentry at UCSB](http://remi-daigle.github.io/2016-04-15-UCSB).
## Introduction
This repository demonstrates **software** and _formats_:
1. **Git**
1. **Github**
1. _Markdown_
1. _Rmarkdown_
## Conclusion
![](https://octodex.github.com/images/labtocat.png)
Now click on Preview changes to see the markdown rendered as HTML:
Notice the syntax for:
numbered list: 1., 1.
headers: #, ##
link: [](http://...)
image: ![](http://...)
italics: _word_
bold: **word**
See Mastering Markdown · GitHub Guides and add some more personalized content to the README of your own, like a bulleted list or blockquote.
index.html
By default index.html is served up. Go ahead and create a new file named index.html with the following basic HTML:
<!DOCTYPE html>
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
Clone the repository onto your local machine. The easiest way to do this is simply clicking the button to open up the Github Desktop App.
You’ll be prompted to clone this repository into a folder on your local machine. I recommend creating a folder github under your user folder.
See GitHub Desktop User Guides for more. You could also do this from the Bash Shell for Git with the command git clone https://github.com/USER/REPO.git, replacing USER with your Github username and REPO with my-project. Or you can use the Github Desktop App menu File -> Clone Repository…
Open RStudio and choose the menu File -> New Project… -> Existing Directory. Browse to the folder where you previously cloned my-project.
You’ll notice a couple of new files created in the Files pane:
.gitignore: lists the files for git to ignore when committing
my-project.Rproj: stores the settings for this project
Open the Github Desktop App, enter a message like “new RStudio project” and click on “Commit and Sync gh-pages”. This will update https://github.com/USER/my-project.
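For those curious what the ignore file and "Commit and Sync" amount to on the command line, here is a rough sketch against a local stand-in remote. The contents written to .gitignore and my-project.Rproj are assumptions that only approximate what RStudio generates, and .Rproj.user/state is a hypothetical file used to show ignoring in action.

```shell
# Sandbox-only identity so commits succeed without touching your real config
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for github.com/USER/my-project with gh-pages as its default branch
git init -q --bare my-project.git
git --git-dir=my-project.git symbolic-ref HEAD refs/heads/gh-pages
git clone -q my-project.git my-project
cd my-project
git symbolic-ref HEAD refs/heads/gh-pages

# Files resembling what RStudio creates (contents are assumptions)
printf '.Rproj.user\n.Rhistory\n.RData\n' > .gitignore
echo "Version: 1.0" > my-project.Rproj
mkdir .Rproj.user
echo "session state" > .Rproj.user/state   # hypothetical file that should be ignored

# Ignored files are not offered for committing: only two files are listed
git status --short

# Roughly what "Commit and Sync gh-pages" does: add, commit, push
git add .gitignore my-project.Rproj
git commit -q -m "new RStudio project"
git push -q origin gh-pages

# Files now tracked by git
git ls-files
```

The .gitignore file is itself committed, so collaborators who clone the repository ignore the same files.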
index.Rmd in Rmarkdown
Back in RStudio, let’s create a new Rmarkdown file, which allows us to weave markdown text with chunks of R code to be evaluated and output content like tables and plots.
File -> New File -> Rmarkdown… -> Document of output format HTML, OK.
You can give it a Title of “My Project”. After you click OK, most importantly File -> Save as index (which will get named with the filename extension index.Rmd).
Some initial text is already provided for you. Let’s go ahead and “Knit HTML”.
Notice how the markdown is rendered as before, while R code chunks are surrounded by 3 backticks and {r LABEL}. These chunks are evaluated and return output text in the case of summary(cars) and an output plot in the case of plot(pressure).
Notice how the code plot(pressure) is not shown in the HTML output because of the R code chunk option echo=FALSE.
Before we continue exploring Rmarkdown, return to the Github Desktop App, enter a message like “added index” and click on “Commit and Sync gh-pages”. This will update https://github.com/USER/my-project, and now you can also see your project website with a default index.html viewable at http://USER.github.io/my-project.
For more on Rmarkdown: