This lesson is my customized version of the Software Carpentry lesson on The Unix Shell. Please see the original lesson for a more detailed walkthrough

Please download the data folder if you want to follow along

What is ‘the shell’?

For our purposes the shell is ‘that box where you type in commands’. It is a command-line interface that looks like DOS (if you’re old enough to remember DOS). It has many names and ‘flavours’; Windows users have the Command Prompt, Mac and Unix/Linux users will have the terminal. In this workshop we will learn the BASH shell because it is the most popular shell language as well as being the default for Unix/Linux/Mac’s (and can be easily installed on Windows machines).

Why would scientists want to us it?

Computer scientists will sometimes give the argument that ‘everyone’ should learn to use the shell because that ‘came first’. That argument does not fly with me! I am not a computer scientist and if it does not help me with my own research, I do not care!

We are here learning about the shell, so it must be helpful and here’s why:

  • Modern operating systems with graphical user interfaces are design for ‘typical use’. Scientists are not typical users and in the process of discovery, by definition should be doing something no one has done before. If something has never been done, it’s hard to believe an operating system has been designed to do it!
  • In this process of discovery, you will mess something up your computer. The command line is often the place to fix those issues
  • You can write it all down in a text editor and save it. Writing down the commands in a script means it is reproducible!

writing it down

  • Last, but certainly not least, many high performance computing servers will only have the shell as their interface.

Manipulating files and directories

Learning Objectives

  • Create a directory hierarchy that matches a given diagram.
  • Create files in that hierarchy using an editor or by copying and renaming existing files.
  • Display the contents of a directory using the command line.
  • Delete specified files and/or directories.

Creating directories: mkdir

Let’s create a new directory called thesis using the command mkdir thesis (which has no output):

$ mkdir thesis

As you might (or might not) guess from its name, mkdir means “make directory”. Since thesis is a relative path (i.e., doesn’t have a leading slash), the new directory is created in the current working directory:

$ ls -F
creatures/  north-pacific-gyre/  thesis/
data/       notes.txt            writing/
Desktop/    pizza.cfg
molecules/  solar.pdf

However, there’s nothing in it yet:

$ ls -F thesis

Let’s change our working directory to thesis using cd, then run a text editor called Nano to create a file called draft.txt:

$ cd thesis
$ nano draft.txt

Let’s type in a few lines of text. Once we’re happy with out text, we can press Ctrl-O (press the Ctrl key and, while holding it down, press the O key) to write our data to disk.

Nano in action

Once our file is saved, we can use Ctrl-X to quit the editor and return to the shell.

nano doesn’t leave any output on the screen after it exits, but ls now shows that we have created a file called draft.txt:

$ ls
draft.txt

Deleting files: rm and rmdir

Let’s tidy up by running rm draft.txt:

$ rm draft.txt

This command removes files (rm is short for “remove”). If we run ls again, its output is empty once more, which tells us that our file is gone:

$ ls

WARNING:Deleting Is Forever

The Unix shell doesn’t have a trash bin that we can recover deleted files from (though most graphical interfaces to Unix do). Instead, when we delete files, they are unhooked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there’s no guarantee they’ll work in any particular situation, since the computer may recycle the file’s disk space right away.

DO NOT EVER USE THIS COMMAND

$ rm -rf /

I was nervous just typing it in this document! It is important to take the time to understand why this will delete you entire harddrive. The rm command will delete everything in /, which is your root directory so it contains everything, the -rf flags are 2 seperate toggles: -r which forces recursive deletion through all subdirectories and -f which forces deletion of read-only files without confirmation.

Let’s re-create that file and then move up one directory to /Users/nelle using cd ..:

$ pwd
/Users/nelle/thesis
$ nano draft.txt
$ ls
draft.txt
$ cd ..

If we try to remove the entire thesis directory using rm thesis, we get an error message:

$ rm thesis
rm: cannot remove `thesis': Is a directory

This happens because rm only works on files, not directories. The right command is rmdir, which is short for “remove directory”. It doesn’t work yet either, though, because the directory we’re trying to remove isn’t empty:

$ rmdir thesis
rmdir: failed to remove `thesis': Directory not empty

This little safety feature can save you a lot of grief, particularly if you are a bad typist. To really get rid of thesis we must first delete the file draft.txt:

$ rm thesis/draft.txt

The directory is now empty, so rmdir can delete it:

$ rmdir thesis

With Great Power Comes Great Responsibility

Removing the files in a directory just so that we can remove the directory quickly becomes tedious. Instead, we can use rm with the -r flag (which stands for “recursive”):

$ rm -r thesis

This removes everything in the directory, then the directory itself. If the directory contains sub-directories, rm -r does the same thing to them, and so on. It’s very handy, but can do a lot of damage if used without care.

Moving and renaming files: mv

Let’s create that directory and file one more time. (Note that this time we’re running nano with the path thesis/draft.txt, rather than going into the thesis directory and running nano on draft.txt there.)

$ pwd
/Users/nelle
$ mkdir thesis
$ nano thesis/draft.txt
$ ls thesis
draft.txt

draft.txt isn’t a particularly informative name, so let’s change the file’s name using mv, which is short for “move”:

$ mv thesis/draft.txt thesis/quotes.txt

The first parameter tells mv what we’re “moving”, while the second is where it’s to go. In this case, we’re moving thesis/draft.txt to thesis/quotes.txt, which has the same effect as renaming the file. Sure enough, ls shows us that thesis now contains one file called quotes.txt:

$ ls thesis
quotes.txt

One has to be careful when specifying the target file name, since mv will silently overwrite any existing file with the same name, which could lead to data loss. An additional flag, mv -i (or mv --interactive), can be used to make mv ask you for confirmation before overwriting.

Just for the sake of inconsistency, mv also works on directories — there is no separate mvdir command.

Let’s move quotes.txt into the current working directory. We use mv once again, but this time we’ll just use the name of a directory as the second parameter to tell mv that we want to keep the filename, but put the file somewhere new. (This is why the command is called “move”.) In this case, the directory name we use is the special directory name . that we mentioned earlier.

$ mv thesis/quotes.txt .

The effect is to move the file from the directory it was in to the current working directory. ls now shows us that thesis is empty:

$ ls thesis

Further, ls with a filename or directory name as a parameter only lists that file or directory. We can use this to see that quotes.txt is still in our current directory:

$ ls quotes.txt
quotes.txt

Copying files: cp

The cp command works very much like mv, except it copies a file instead of moving it. We can check that it did the right thing using ls with two paths as parameters — like most Unix commands, ls can be given thousands of paths at once:

$ cp quotes.txt thesis/quotations.txt
$ ls quotes.txt thesis/quotations.txt
quotes.txt   thesis/quotations.txt

To prove that we made a copy, let’s delete the quotes.txt file in the current directory and then run that same ls again.

$ rm quotes.txt
$ ls quotes.txt thesis/quotations.txt
ls: cannot access quotes.txt: No such file or directory
thesis/quotations.txt

This time it tells us that it can’t find quotes.txt in the current directory, but it does find the copy in thesis that we didn’t delete.

What’s In A Name?

You may have noticed that all of Nelle’s files’ names are “something dot something”, and in this part of the lesson, we always used the extension .txt. This is just a convention: we can call a file mythesis or almost anything else we want. However, most people use two-part names most of the time to help them (and their programs) tell different kinds of files apart. The second part of such a name is called the filename extension, and indicates what type of data the file holds: .txt signals a plain text file, .pdf indicates a PDF document, .cfg is a configuration file full of parameters for some program or other, .png is a PNG image, and so on.

This is just a convention, albeit an important one. Files contain bytes: it’s up to us and our programs to interpret those bytes according to the rules for plain text files, PDF documents, configuration files, images, and so on.

Naming a PNG image of a whale as whale.mp3 doesn’t somehow magically turn it into a recording of whalesong, though it might cause the operating system to try to open it with a music player when someone double-clicks it.

Renaming files

Suppose that you created a .txt file in your current directory to contain a list of the statistical tests you will need to do to analyze your data, and named it: statstics.txt

After creating and saving this file you realize you misspelled the filename! You want to correct the mistake, which of the following commands could you use to do so?

  1. cp statstics.txt statistics.txt
  2. mv statstics.txt statistics.txt
  3. mv statstics.txt .
  4. cp statstics.txt .

Moving and Copying

What is the output of the closing ls command in the sequence shown below?

$ pwd
/Users/jamie/data
$ ls
proteins.dat
$ mkdir recombine
$ mv proteins.dat recombine
$ cp recombine/proteins.dat ../proteins-saved.dat
$ ls
  1. proteins-saved.dat recombine
  2. recombine
  3. proteins.dat recombine
  4. proteins-saved.dat

Organizing Directories and Files

Jamie is working on a project and she sees that her files aren’t very well organized:

$ ls -F
analyzed/  fructose.dat    raw/   sucrose.dat

The fructose.dat and sucrose.dat files contain output from her data analysis. What command(s) covered in this lesson does she need to run so that the commands below will produce the output shown?

$ ls -F
analyzed/   raw/
$ ls analyzed
fructose.dat    sucrose.dat

Copy with Multiple Filenames

What does cp do when given several filenames and a directory name, as in:

$ mkdir backup
$ cp thesis/citations.txt thesis/quotations.txt backup

What does cp do when given three or more filenames, as in:

$ ls -F
intro.txt    methods.txt    survey.txt
$ cp intro.txt methods.txt survey.txt

Listing Recursively and By Time

The command ls -R lists the contents of directories recursively, i.e., lists their sub-directories, sub-sub-directories, and so on in alphabetical order at each level. The command ls -t lists things by time of last change, with most recently changed files or directories first. In what order does ls -R -t display things?

Loops and pipes

Learning Objectives

  • Redirect a command’s output to a file.
  • Process a file instead of keyboard input using redirection.
  • Construct command pipelines with two or more stages.
  • Explain what usually happens if a program or pipeline isn’t given any input to process.
  • Explain Unix’s “small pieces, loosely joined” philosophy.

Now that we know a few basic commands, we can finally look at the shell’s most powerful feature: the ease with which it lets us combine existing programs in new ways. We’ll start with a directory called molecules that contains six files describing some simple organic molecules. The .pdb extension indicates that these files are in Protein Data Bank format, a simple text format that specifies the type and position of each atom in the molecule.

$ ls molecules
cubane.pdb    ethane.pdb    methane.pdb
octane.pdb    pentane.pdb   propane.pdb

Let’s go into that directory with cd and run the command wc *.pdb. wc is the “word count” command: it counts the number of lines, words, and characters in files. The * in *.pdb matches zero or more characters, so the shell turns *.pdb into a list of all .pdb files in the current directory:

$ cd molecules
$ wc *.pdb
  20  156 1158 cubane.pdb
  12   84  622 ethane.pdb
   9   57  422 methane.pdb
  30  246 1828 octane.pdb
  21  165 1226 pentane.pdb
  15  111  825 propane.pdb
 107  819 6081 total

Wildcards

* is a wildcard. It matches zero or more characters, so *.pdb matches ethane.pdb, propane.pdb, and every file that ends with ‘.pdb’. On the other hand, p*.pdb only matches pentane.pdb and propane.pdb, because the ‘p’ at the front only matches filenames that begin with the letter ‘p’.

? is also a wildcard, but it only matches a single character. This means that p?.pdb matches pi.pdb or p5.pdb, but not propane.pdb. We can use any number of wildcards at a time: for example, p*.p?* matches anything that starts with a ‘p’ and ends with ‘.’, ‘p’, and at least one more character (since the ? has to match one character, and the final * can match any number of characters). Thus, p*.p?* would match preferred.practice, and even p.pi (since the first * can match no characters at all), but not quality.practice (doesn’t start with ‘p’) or preferred.p (there isn’t at least one character after the ‘.p’).

When the shell sees a wildcard, it expands the wildcard to create a list of matching filenames before running the command that was asked for. As an exception, if a wildcard expression does not match any file, Bash will pass the expression as a parameter to the command as it is. For example typing ls *.pdf in the molecules directory (which contains only files with names ending with .pdb) results in an error message that there is no file called *.pdf. However, generally commands like wc and ls see the lists of file names matching these expressions, but not the wildcards themselves. It is the shell, not the other programs, that deals with expanding wildcards, and this is another example of orthogonal design.

Challenge: Using wildcards

When run in the molecules directory, which ls command will produce this output?

ethane.pdb methane.pdb

  1. ls *t*ane.pdb
  2. ls *t?ne.*
  3. ls *t??ne.pdb
  4. ls ethane.*

If we run wc -l instead of just wc, the output shows only the number of lines per file:

$ wc -l *.pdb
  20  cubane.pdb
  12  ethane.pdb
   9  methane.pdb
  30  octane.pdb
  21  pentane.pdb
  15  propane.pdb
 107  total

We can also use -w to get only the number of words, or -c to get only the number of characters.

Redirecting: >

Which of these files is shortest? It’s an easy question to answer when there are only six files, but what if there were 6000? Our first step toward a solution is to run the command:

$ wc -l *.pdb > lengths.txt

The greater than symbol, >, tells the shell to redirect the command’s output to a file instead of printing it to the screen. (This is why there is no screen output: everything that wc would have printed has gone into the file lengths.txt instead.) The shell will create the file if it doesn’t exist. If the file exists, it will be silently overwritten, which may lead to data loss and thus requires some caution.

We can now send the content of lengths.txt to the screen using cat lengths.txt. cat stands for “concatenate”: it prints the contents of files one after another. There’s only one file in this case, so cat just shows us what it contains:

$ cat lengths.txt
  20  cubane.pdb
  12  ethane.pdb
   9  methane.pdb
  30  octane.pdb
  21  pentane.pdb
  15  propane.pdb
 107  total

Now let’s use the sort command to sort its contents. We will also use the -n flag to specify that the sort is numerical instead of alphabetical. This does not change the file; instead, it sends the sorted result to the screen:

$ sort -n lengths.txt
  9  methane.pdb
 12  ethane.pdb
 15  propane.pdb
 20  cubane.pdb
 21  pentane.pdb
 30  octane.pdb
107  total

We can put the sorted list of lines in another temporary file called sorted-lengths.txt by putting > sorted-lengths.txt after the command, just as we used > lengths.txt to put the output of wc into lengths.txt. Once we’ve done that, we can run another command called head to get the first few lines in sorted-lengths.txt:

$ sort -n lengths.txt > sorted-lengths.txt
$ head -n 1 sorted-lengths.txt
  9  methane.pdb

Using the parameter -n 1 with head tells it that we only want the first line of the file; -n 20 would get the first 20, and so on. Since sorted-lengths.txt contains the lengths of our files ordered from least to greatest, the output of head must be the file with the fewest lines.

If you think this is confusing, you’re in good company: even once you understand what wc, sort, and head do, all those intermediate files make it hard to follow what’s going on.

Piping: |

We can make it easier to understand by running sort and head together:

$ sort -n lengths.txt | head -n 1
  9  methane.pdb

The vertical bar, |, between the two commands is called a pipe. It tells the shell that we want to use the output of the command on the left as the input to the command on the right. The computer might create a temporary file if it needs to, or copy data from one program to the other in memory, or something else entirely; we don’t have to know or care.

Nothing prevents us from chaining pipes consecutively. That is, we can for example send the output of wc directly to sort, and then the resulting output to head. Thus we first use a pipe to send the output of wc to sort:

$ wc -l *.pdb | sort -n
   9 methane.pdb
  12 ethane.pdb
  15 propane.pdb
  20 cubane.pdb
  21 pentane.pdb
  30 octane.pdb
 107 total

And now we send the output ot this pipe, through another pipe, to head, so that the full pipeline becomes:

$ wc -l *.pdb | sort -n | head -n 1
   9  methane.pdb

Redirects and Pipes

Challenge: Piping commands together

In our current directory, we want to find the 3 files which have the least number of lines. Which command listed below would work?

  1. wc -l * > sort -n > head -n 3
  2. wc -l * | sort -n | head -n 1-3
  3. wc -l * | head -n 3 | sort -n
  4. wc -l * | sort -n | head -n 3

Challenge: Pipe reading comprehension

A file called animals.txt contains the following data:

2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
2012-11-06,deer
2012-11-06,fox
2012-11-07,rabbit
2012-11-07,bear

What text passes through each of the pipes and the final redirect in the pipeline below?

cat animals.txt | head -n 5 | tail -n 3 | sort -r > final.txt

Loops: for

Loops are key to productivity improvements through automation as they allow us to execute commands repetitively. Similar to wildcards and tab completion, using loops also reduces the amount of typing (and typing mistakes). Suppose we have several hundred genome data files named basilisk.dat, unicorn.dat, and so on. In this example, we’ll use the creatures directory which only has two example files, but the principles can be applied to many many more files at once. We would like to modify these files, but also save a version of the original files, naming the copies original-basilisk.dat and original-unicorn.dat. We can’t use:

$ cp *.dat original-*.dat

because that would expand to:

$ cp basilisk.dat unicorn.dat original-*.dat

This wouldn’t back up our files, instead we get an error:

cp: target `original-*.dat' is not a directory

This problem arises when cp receives more than two inputs. When this happens, it expects the last input to be a directory where it can copy all the files it was passed. Since there is no directory named original-*.dat in the creatures directory we get an error.

Instead, we can use a loop to do some operation once for each thing in a list. Here’s a simple example that displays the first three lines of each file in turn:

$ for filename in basilisk.dat unicorn.dat
> do
>    head -n 3 $filename
> done
COMMON NAME: basilisk
CLASSIFICATION: basiliscus vulgaris
UPDATED: 1745-05-02
COMMON NAME: unicorn
CLASSIFICATION: equus monoceros
UPDATED: 1738-11-24

When the shell sees the keyword for, it knows it is supposed to repeat a command (or group of commands) once for each thing in a list. In this case, the list is the two filenames. Each time through the loop, the name of the thing currently being operated on is assigned to the variable called filename. Inside the loop, we get the variable’s value by putting $ in front of it: $filename is basilisk.dat the first time through the loop, unicorn.dat the second, and so on.

By using the dollar sign we are telling the shell interpreter to treat filename as a variable name and substitute its value on its place, but not as some text or external command. When using variables it is also possible to put the names into curly braces to clearly delimit the variable name: $filename is equivalent to ${filename}, but is different from ${file}name. You may find this notation in other people’s programs.

Finally, the command that’s actually being run is our old friend head, so this loop prints out the first three lines of each data file in turn.

Follow the Prompt

The shell prompt changes from $ to > and back again as we were typing in our loop. The second prompt, >, is different to remind us that we haven’t finished typing a complete command yet. A semicolon, ;, can be used to separate two commands written on a single line.

Imagine we want to extract the second line from each *.dat file. Here’s a slightly more complicated loop:

for filename in *.dat
do
    head -n 2 $filename | tail -n 1
done

The shell starts by expanding *.dat to create the list of files it will process. The loop body then executes the command for each of those files.

We can redirect the output of a loop to a file like we did before with wc:

for filename in *.dat
do
    head -n 2 $filename | tail -n 1
done > classifications.txt

Measure Twice, Run Once

A loop is a way to do many things at once — or to make many mistakes at once if it does the wrong thing. One way to check what a loop would do is to echo the commands it would run instead of actually running them. For example, we could write our file copying loop like this:

for filename in *.dat
do
    echo cp $filename original-$filename
done

Instead of running cp, this loop runs echo, which prints out:

cp basilisk.dat original-basilisk.dat
cp unicorn.dat original-unicorn.dat

without actually running those commands. We can then use up-arrow to redisplay the loop, back-arrow to get to the word echo, delete it, and then press Enter to run the loop with the actual cp commands. This isn’t foolproof, but it’s a handy way to see what’s going to happen when you’re still learning how loops work.

Shell scripts

Learning Objectives

  • Write a shell script that runs a command or series of commands for a fixed set of files.
  • Run a shell script from the command line.
  • Write a shell script that operates on a set of files defined by the user on the command line.
  • Create pipelines that include shell scripts you, and others, have written.

We are finally ready to see what makes the shell such a powerful programming environment. We are going to take the commands we repeat frequently and save them in files so that we can re-run all those operations again later by typing a single command. For historical reasons, a bunch of commands saved in a file is usually called a shell script, but make no mistake: these are actually small programs.

Let’s start by going back to molecules/ and putting the following line into a new file, middle.sh:

$ cd molecules
$ nano middle.sh

The command nano middle.sh opens the file middle.sh within the text editor “nano” (which runs within the shell). If the file does not exist, it will be created. We can use the text editor to directly edit the file. We’ll simply insert the following line:

head -n 15 octane.pdb | tail -n 5

This is a variation on the pipe we constructed earlier: it selects lines 11-15 of the file octane.pdb. Remember, we are not running it as a command just yet: we are putting the commands in a file.

Then we save the file (using Ctrl-O), and exit the text editor (using Ctrl-X). Check that the directory molecules now contains a file called middle.sh.

Once we have saved the file, we can ask the shell to execute the commands it contains. Our shell is called bash, so we run the following command:

$ bash middle.sh
ATOM      9  H           1      -4.502   0.681   0.785  1.00  0.00
ATOM     10  H           1      -5.254  -0.243  -0.537  1.00  0.00
ATOM     11  H           1      -4.357   1.252  -0.895  1.00  0.00
ATOM     12  H           1      -3.009  -0.741  -1.467  1.00  0.00
ATOM     13  H           1      -3.172  -1.337   0.206  1.00  0.00

Sure enough, our script’s output is exactly what we would get if we ran that pipeline directly.

Text vs. Whatever

We usually call programs like Microsoft Word or LibreOffice Writer “text editors”, but we need to be a bit more careful when it comes to programming. By default, Microsoft Word uses .docx files to store not only text, but also formatting information about fonts, headings, and so on. This extra information isn’t stored as characters, and doesn’t mean anything to tools like head: they expect input files to contain nothing but the letters, digits, and punctuation on a standard computer keyboard. When editing programs, therefore, you must either use a plain text editor, or be careful to save files as plain text.

What if we want to select lines from an arbitrary file? We could edit middle.sh each time to change the filename, but that would probably take longer than just retyping the command. Instead, let’s edit middle.sh and replace octane.pdb with a special variable called $1:

$ nano middle.sh

Now, within “nano”, replace the text octane.pdb with the special variable called $1:

head -n 15 "$1" | tail -n 5

Inside a shell script, $1 means “the first filename (or other parameter) on the command line”. We can now run our script like this:

$ bash middle.sh octane.pdb
ATOM      9  H           1      -4.502   0.681   0.785  1.00  0.00
ATOM     10  H           1      -5.254  -0.243  -0.537  1.00  0.00
ATOM     11  H           1      -4.357   1.252  -0.895  1.00  0.00
ATOM     12  H           1      -3.009  -0.741  -1.467  1.00  0.00
ATOM     13  H           1      -3.172  -1.337   0.206  1.00  0.00

or on a different file like this:

$ bash middle.sh pentane.pdb
ATOM      9  H           1       1.324   0.350  -1.332  1.00  0.00
ATOM     10  H           1       1.271   1.378   0.122  1.00  0.00
ATOM     11  H           1      -0.074  -0.384   1.288  1.00  0.00
ATOM     12  H           1      -0.048  -1.362  -0.205  1.00  0.00
ATOM     13  H           1      -1.183   0.500  -1.412  1.00  0.00

Double-Quotes Around Arguments

We put the $1 inside of double-quotes in case the filename happens to contain any spaces. The shell uses whitespace to separate arguments, so we have to be careful when using arguments that might have whitespace in them. If we left out these quotes, and $1 expanded to a filename like methyl butane.pdb, the command in the script would effectively be:

head -n 15 methyl butane.pdb | tail -n 5

This would call head on two separate files, methyl and butane.pdb, which is probably not what we intended.

We still need to edit middle.sh each time we want to adjust the range of lines, though. Let’s fix that by using the special variables $2 and $3 for the number of lines to be passed to head and tail respectively:

$ nano middle.sh
head -n "$2" "$1" | tail -n "$3"

We can now run:

$ bash middle.sh pentane.pdb 15 5
ATOM      9  H           1       1.324   0.350  -1.332  1.00  0.00
ATOM     10  H           1       1.271   1.378   0.122  1.00  0.00
ATOM     11  H           1      -0.074  -0.384   1.288  1.00  0.00
ATOM     12  H           1      -0.048  -1.362  -0.205  1.00  0.00
ATOM     13  H           1      -1.183   0.500  -1.412  1.00  0.00

By changing the arguments to our command we can change our script’s behaviour:

$ bash middle.sh pentane.pdb 20 5
ATOM     14  H           1      -1.259   1.420   0.112  1.00  0.00
ATOM     15  H           1      -2.608  -0.407   1.130  1.00  0.00
ATOM     16  H           1      -2.540  -1.303  -0.404  1.00  0.00
ATOM     17  H           1      -3.393   0.254  -0.321  1.00  0.00
TER      18              1

This works, but it may take the next person who reads middle.sh a moment to figure out what it does. We can improve our script by adding some comments at the top:

$ nano middle.sh
# Select lines from the middle of a file.
# Usage: middle.sh filename end_line num_lines
head -n "$2" "$1" | tail -n "$3"

A comment starts with a # character and runs to the end of the line. The computer ignores comments, but they’re invaluable for helping people understand and use scripts.

Fun stuff!

Reproducible Science with Docker

I am not a (Docker) expert, but it’s something I want to get into. It brings reproducibility to another level by providing a machine independent environment and keeping track of ALL your dependencies. The (ROpenSci tutorial)[http://ropenscilabs.github.io/r-docker-tutorial/] is a great tutorial about using the hadleyverse which is a pre-built docker container that has all your favourite R packages and includes RStudio Server.

High Performance Computing

In most cases, if you are using a High Performance Computing server, you will need to use the command line to submit your work. You can read about how to use the one that I use at the University if McGill (here)[http://www.hpc.mcgill.ca/index.php/starthere]. However, it’s probably more useful to look up your own local version.

In general, logging in can be down with 1 line (once you have a username and password): ~ {.bash} $ ssh username@hpcname ~

Stitching multipanel figures

Sometimes you just can’t figure out how to stitch those panels together, or your co-author gives you 20 panels every time you edit the document. In these cases, you can save alot of time by scripting this with (ImageMagick)[http://www.imagemagick.org/Usage/montage/#geometry_size]!

Investigate the shell scripts in the fA7_panels folder which is in the data folder