Open and Reproducible Neuroscience

The layers of computing

Overview

Teaching: 30 min
Exercises: 0 min
Questions
  • Where, exactly, is my data? Where should it be?

  • Where is the computer analyzing my data? Can I make it go faster?

  • How do all of these places relate to each other?

  • What is the shell and why do I care?

  • What is Python? What is IPython? What is Jupyter?

  • What are the tools we’ll be using in the course?

Objectives
  • Understand how the components in your computing stack relate to one another

  • Understand the different ways of interacting with your computer and what the pros and cons are

  • Explain when and why command-line interfaces should be used instead of graphical interfaces.

  • Explain the advantages of remote computing including cloud and high performance computing.

  • Explain how to access the NIMH dedicated computing resources in the most convenient manner.

  • Get exposed to IPython and learn its strengths.

  • Learn shortcuts and hotkeys to make command line computing fast and efficient

Getting started

source /data/DSST/scripts/repro_env_setup.sh
jupyter qtconsole &

Your screen should now look something like this:

image_of_shell

Before we do anything we should all type the following command. For now we don’t need to know what it does but it will help to avoid confusion as we learn about our command line interface:

%automagic
Automagic is OFF, % prefix IS needed for line magics.

Know who (or what) you’re talking to

It’s easy and common to click on an icon, open a window, and start typing, clicking, and analyzing in that window without much thought about how or where those commands are being executed and where the data is being stored. It’s worth taking some time to understand exactly what your computing [stack](https://en.wikipedia.org/wiki/Solution_stack) looks like and exactly where everything is happening. Understanding this stack can help you make your analysis more efficient and secure.

Where am I?

The desktop you’re looking at inside the NoMachine window is from a computer that lives over in building 12B at NIH’s high performance computing (HPC) center. This computer (helix or felix) is separate from, but closely affiliated with, the large cluster of computers known as [Biowulf](https://hpc.nih.gov/systems/).

NoMachine is a remote desktop program. It allows you to interact with the remote computer (helix/felix) almost as if you were sitting right in front of it. ([VNC](https://en.wikipedia.org/wiki/Virtual_Network_Computing) is another common remote desktop program.)

The window on the left of your NX desktop is running an IPython shell. On the right side, there is a window running a Bash shell. A [shell](https://en.wikipedia.org/wiki/Shell_(computing)) is a user interface designed to let the user, you, interact with the computer in an efficient and convenient way. It’s named a shell because it stands between the user and the “guts” of the computer system, which is called the kernel.

The shell kernel metaphor

Bash and tcsh shell vs. IPython

This is the first of many expandable boxes you’ll see in these lessons. They provide a bit more information that is not essential to the flow of the course. Only read them if you’re interested.

We’ll be working almost exclusively in the IPython shell in this course. IPython provides an easy way to explore and interact with your data that leverages the power of the Python programming language. The Bash shell is an extremely versatile tool designed to allow you to control and interact with files and your computer more generally. You’ve likely had exposure to Bash (or its close cousin, tcsh) if you’ve used analysis packages like AFNI, FSL, and FreeSurfer. The Bash shell is a great thing to learn, but we want to focus on teaching you reproducible computing concepts without getting bogged down in the syntax of too many different languages.

Challenge: Draw your stack

Expandable orange boxes like this one contain challenges for you to try.

Draw a diagram to illustrate how the programs and computers listed below relate to one another.

Now get with your partner and compare your drawings. Discuss the differences and edit as needed.

The Jupyter qtconsole

A note on the IPython prompt

 In [1]

The IPython prompt, as depicted above, sits before our blinking cursor. It consists of the word “In” followed by a number. It informs us that we are currently entering input, in this case the first input of the IPython kernel session. Every time we execute a command this number will increment by one.

The Jupyter qtconsole is a development environment optimized for the interactive computing typical of scientific analyses. It provides lots of convenient commands and shortcuts for this kind of work. Different languages can be used with this interface. In our case we will be using IPython, which gives us a convenient interface to both the system shell and the Python interpreter. We will start by typing in the IPython command:

%pwd
/Users/this_user

This returns the present working directory, the file-system context for the commands that we execute. We can refer to the files in this directory by simply typing their names. We can list these files by using the next command:

%ls

The %ls command returns all the files in the present working directory. When we wish to perform actions on these files we can refer to them directly by their names and the shell will understand which files we are referring to. Later we will talk about how to add new files to the current directory but for now we will use the %mkdir command to add a directory into our present working directory:

%mkdir repro_course

IPython magics and system commands

The above commands can be entered into IPython in a slightly different way. Instead of the “%” sign we can use the “!” sign and we will still get the same results (although we will not be able to make the same directory again).

!pwd
!ls
!mkdir repro_course
/Users/this_user
... A list of files in the current directory...
mkdir: cannot create directory `repro_course': File exists

When we use the “%” sign it denotes that the command is an “IPython magic”. These are special commands available in IPython that make some common tasks easier on the command line. The “!” sign instead evaluates the command with the system shell. On a Linux or Mac OS X operating system this will likely be the Bash shell. All of these commands exist in Bash; however, on a Windows operating system the pwd system command does not exist. So we have to be careful about using system commands, because some are operating system dependent. Since the IPython magics will work regardless of the operating system used, we shall use those when we can.
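Magics only exist inside IPython. For scripts that must run anywhere, the Python standard library offers portable equivalents of the three commands used so far. The following is a minimal sketch, not part of the original lesson; the temporary scratch directory is an assumption made so the example leaves your own files untouched.

```python
import os
import tempfile

# Hypothetical scratch directory, used only so this sketch is self-contained.
scratch = tempfile.mkdtemp()

# Equivalent of pwd / %pwd: report the present working directory.
print(os.getcwd())

# Equivalent of mkdir / %mkdir. With exist_ok=True a second call does not
# fail with "File exists" the way the repeated !mkdir did above.
target = os.path.join(scratch, "repro_course")
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)

# Equivalent of ls / %ls: list the contents of a directory.
contents = sorted(os.listdir(scratch))
print(contents)  # ['repro_course']
```

These functions behave the same way on Linux, macOS, and Windows, which is the same portability argument made for the magics above.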

System commands start a new sub-process

Apart from the differences in system commands across platforms, there is another reason to prefer IPython magics. Each system command starts up a sub-process, executes the command, and then exits. The effect of this can be seen very easily when we use the “!cd” system command to change directories.

%pwd
!cd repro_course
%pwd
/Users/this_user
/Users/this_user

This is not the behavior we want from a command that changes our present working directory. It occurs because changes made to the environment in the sub-process are not propagated back to the IPython shell. If we use the IPython magic %cd we get the behavior we want.

%pwd
%cd repro_course
%pwd
/Users/this_user
/Users/this_user/repro_course
/Users/this_user/repro_course
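The same contrast can be reproduced in plain Python, which may make the sub-process explanation more concrete. This is an illustrative sketch under stated assumptions: the scratch directory stands in for /Users/this_user, and `subprocess.run` with `shell=True` plays the role of “!” while `os.chdir` plays the role of %cd.

```python
import os
import subprocess
import tempfile

# Hypothetical scratch directory standing in for /Users/this_user.
home = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(home, "repro_course"))
os.chdir(home)

# Like "!cd repro_course": cd runs in a child shell that exits right away,
# so the working directory of this Python process is untouched.
subprocess.run("cd repro_course", shell=True, check=True)
after_subprocess = os.getcwd()
print(after_subprocess == home)  # True

# Like "%cd repro_course": the change is made in the current process.
os.chdir("repro_course")
after_chdir = os.getcwd()
print(os.path.basename(after_chdir))  # repro_course
```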

Understanding the file folder metaphor

Working on the command line requires a fluid understanding of how a file system is organized. Most file systems use the file folder metaphor to provide a system that is familiar to most computer users. This is also sometimes referred to as an (inverted) directory “tree”. When someone asks you to “change up one directory” they are referencing this metaphor.
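To see the tree structure hidden inside a path, it can help to take one apart. The sketch below uses the example path from this lesson with Python's `pathlib`; `PurePosixPath` treats the path purely as text, so it never touches the real file system and the directory does not need to exist.

```python
from pathlib import PurePosixPath

# The example path from this lesson, handled as text only.
here = PurePosixPath("/Users/this_user/repro_course")

# Each component is one level of the tree, starting at the root "/".
print(here.parts)          # ('/', 'Users', 'this_user', 'repro_course')

# "Changing up one directory" means moving to the parent.
print(here.parent)         # /Users/this_user

# Going up twice moves two levels toward the root.
print(here.parent.parent)  # /Users
```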

If you’d like a bit more practice understanding and navigating a directory tree, check out this lesson at Software Carpentry. There are also some exercises available at Codecademy.

Convenient commands to remember

Key Points