introduction to linux and supercomputers · introduction to linux and supercomputers doug crabill...

58
Big Data Training for Translational Omics Research Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist Department of Statistics Purdue University [email protected]

Upload: others

Post on 17-Aug-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Introduction to Linux and Supercomputers

Doug CrabillSenior Academic IT Specialist

Department of StatisticsPurdue [email protected]

Page 2: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

What you will learn

• How to log into a Linux Supercomputer• Basics of a Linux shell, including

moving/editing/creating/deleting files• The basic filesystem structure for the data

files and scripts you will be using• How to launch programs, see their progress,

and terminate them early if necessary

Page 3: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

The rice.rcac.purdue.edu cluster

Page 4: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

The rice.rcac.purdue.edu cluster

Page 5: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

An individual node

Page 6: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Rice supercomputer stats

• 660 Nodes, 13,200 total CPU cores• Each nodes has 20 CPU cores, 64GB RAM• 1.7 Petabytes of scratch space for this cluster

alone• Several Petabytes of long term storage

shared among all clusters• Currently #319 on top500.org, Conte is #100.

Page 7: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Why Linux?

Page 8: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Why Linux?

Page 9: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Why Linux?

• Can be desktops, but tend to be larger servers in some remote, environmentally controlled data center

• Multiple CPU cores per server (~20)• Large amounts of RAM (64GB – 256GB is

common)• Multiple users can use the same computer

simultaneously

Page 10: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Why Linux? (cont.)

• Can interact with a graphical interface• More common to interact with a text based

interface• Servers tend to stay up for a long time

between reboots (months)• Commonly launch programs and walk away

for days, weeks, or months as they run• Computations can scale up as servers added

Page 11: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

But where are the keyboards, mice, and monitors?

Page 12: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

But where are the keyboards, mice, and monitors?

Page 13: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

We use four computer systems

• Windows Desktop: computer in front of you• Thinlinc: Linux server accessed in a web

browser to gain a graphical interface• Rice front-end: four of them, used to access

Rice nodes, submit large jobs to the cluster• Rice node, aka compute node, back-end

node: 660 of them, this is where we compute

Page 14: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Doug’s cute analogy

• Thinlinc: Like the bus you take from the airport to the hotel

• Rice front-end: Like the front desk at the hotel lobby where we make room reservations

• Rice node, aka compute node, back-end node: Like a private hotel room, where you are free to do your thing

Page 15: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Thinlinc Linux graphical interface

• We will use Thinlinc in a web browser (Firefox) to provide a graphical user interface for us to log into a Rice front-end, then a Rice node, where we will do the real computing

• Here is what it looks like!

Page 16: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Logging into Thinlinc

Page 17: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Thinlinc screenshot

Page 18: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Thinlinc sessions can persist!

• Programs/windows that are open and running can persist after closing browser

• Smile patiently while I demonstrate persistence

• If you explicitly click, you will be logged completely out and windows will not persist

Page 19: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Terminal Window

Page 20: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

What is a “shell”?

• A text-based user interface used to launch programs

• Allows arguments to be passed to programs, specifying input/output files, running multiple programs at once

• The shell we use is called “bash”• Terminal is one way of accessing a shell

Page 21: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Shell prompt

Page 22: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Use ssh to connect to Linux

• Implementations of ssh, available on Windows, Mac, Linux, Android, IOS, etc.

• Many implementations of ssh exist on each OS, any can be used. Similar to how many web browsers exist, and any can be used

• ssh alone provides a text-based interface• When used in conjunction with other

technologies, graphical interface can be used

Page 23: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Connect to Rice front-end

Page 24: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Rice front-end prompt

Page 25: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Connect to a Rice node

Page 26: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Connect to a Rice node

Page 27: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Page 28: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Page 29: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Let’s all log in!

Page 30: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Multiple Terminal windows

• Always check your prompt name/color to make sure where your commands will run

• To open an additional Terminal window on the same server as an existing Terminal, type:

Terminal &

• If you omit the &, the first Terminal cannot be used again until the second is closed

• Type exit to log out of a shell

Page 31: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Using copy/paste

• Using the Windows shortcuts Control-C and Control-V will generally not work, because those keys are special

• Either select the text and select Edit/Copy and Edit/Paste

• Or select the text which implicitly copies it, and press down on the mouse wheel to paste (don’t roll it, press down like it’s a button)

Page 32: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Filesystems

• Filesystems on Linux similar to network drives on Windows, but without drive letters

• Example directories on different filesystems: /home/train01, /depot/nihomics, /scratch/rice/t/train01

• Hierarchical. Directory names separated by “/”, not by “\” like with Windows. Avoid spaces in filenames and directory names.

Page 33: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Filesystems

• Filesystems on Linux similar to network drives on Windows, but without drive letters

• Example directories on different filesystems: /home/train01, /depot/nihomics, /scratch/rice/t/train01

• Hierarchical. Directory names separated by “/”, not by “\” like with Windows. Avoid spaces in filenames and directory names.

Page 34: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Filesystems on Rice

Page 35: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Filesystems visible from Windows

• You have an O: drive which corresponds to the /depot/nihomics course material folder on Rice. This area is read-only. You cannot create/delete/modify files there.

• You have an R: drive which corresponds to the scratch directory in your home directory on Rice (really /scratch/rice/t/trainNN). This is fully read-write for your accounts.

Page 36: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Shell features

• Shell environment variables used to control settings for how certain things work

• Thousands of potential commands can be executed

• Commands available varies from one Linux computer to the next, depending on what has been installed, and the value of your PATH environment variable

Page 37: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Shell features (cont.)

• Filename completion (using “Tab” key)• Command completion (using “Tab” key)• Command line editing using arrow keys (up-

arrow key to go to the previous command)

Page 38: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Let’s get dirty!

Page 39: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Listing files in Terminal

• Type ls to list files in the current directory• Type ls -l to list files with more detail• Type ll to list files with even more detail

Page 40: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Navigating directories in Terminal

• Type pwd to see full path to current directory• Type cd dirname to change directories• Type cd .. to go to the parent directory, or cd ../.. to go to the grandparent, etc.

• Type cd ~ to go to your home directory• cd /depot/nihomics/data

• Absolute paths start with /, relative paths are relative to the current directory

Page 41: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Special directories

• /home/trainNN – Your home directory• /depot/nihomics – Contains data files and

scripts used in the workshop (O: drive)• /home/trainNN/scratch – Scratch

directory where you will be saving output files, scripts (R: drive)

• /scratch/rice/t/trainNN – SAME PLACE AS ABOVE!

Page 42: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Editing, copying, moving files

Page 43: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Editing, copying, moving files

• gedit filename – Edits filename• mv oldname newname – Moves a file or

directory, possibly to a new directory, possibly renaming the file or directory in the process

• cp oldname newname – Copies files• cp –r olddir newdir – Copies olddir

and all files and subdirectories within

Page 44: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Create/Remove directories, files

• rm filename – removes filename• mkdir dirname – creates dirname• Rmdir dirname – removes dirname, but

only if dirname is empty• Let’s practice, and use filename completion

and command line editing while we are at it!

Page 45: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Can copy/move/delete with GUI

• You can always fall back to using the O: drive and R: drive on Windows to move/delete files

• On a rice node (green/bold prompt!), open a graphical file browser to move/delete files via: thunar &

• If you accidentally omit the &, your Terminal cannot be used again until thunar is closed

Page 46: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Terminating a program

• If you are running a program in a terminal window that you would like to terminate, press Control-C

• This won’t work if you started that program it with an &

Page 47: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

See what programs are running

• ps xuww - Show what programs we are running now

• PID column shows the Process ID of each program

• Can use top to see most CPU intensive programs currently running by everyone on this server. Press q or just control-c to exit top

Page 48: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Terminate or kill or program

• Must first know the process id number (PID) using either ps xuww or top

• kill NNNNN Will kill most programs • kill –HUP NNNNN Use if the previous

doesn’t work• kill -9 NNNNN Use if the previous

doesn’t work

Page 49: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Let’s practice starting/killing progs

• On a Rice node, type busy 1000 &• Type it again a few times (use the up-arrow!)• Type top to see the PIDs of all the jobs

running, press q to quit• Kill all of the busy jobs by typing something

like: kill 24933 24937 24939 24944• Type top again to confirm they are gone

Page 50: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Redirecting input/ouput

• Some programs write output to the Terminal/shell screen

• We can save it using output redirection• qstat -a > out1 Saves results of the

command qstat -a to the file out1• head < out1 See the first 10 lines of out1• head < out1 > out2 Save to out2

Page 51: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Redirecting input/ouput

• Can only save the text output that would have normally appeared on the screen. If a program wouldn’t normally generate any text output, nothing will be saved

• Terminal > out3 (Nothing is saved!)

Page 52: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Pipes

Page 53: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Pipes

• Pipes are a feature of the shell that permit the output of one program to read as the input of another program, without having to save an intermediate file

• All programs involved run simultaneously and process data as they receive it

Page 54: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Example without pipes• qstat -a > out1

• awk ’{print $2}’ < out1 > out2

• sort < out2 > out3

• uniq -c < out3 > out4

• sort -n < out4

Page 55: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Same example using pipes• qstat -a | awk ’{print $2}’ | sort | uniq -c | sort –n

Page 56: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Where to find these slides

• From Windows, in:O:\etc\doug\bigtap-comp1.pdf

• From a Rice node, in:/depot/nihomics/etc/doug/bigtap-comp1.pdf

Page 57: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research

Page 58: Introduction to Linux and Supercomputers · Introduction to Linux and Supercomputers Doug Crabill Senior Academic IT Specialist ... • The shell we use is called “bash” • Terminal

Big Data Training for Translational Omics Research