Data analysis on unix/linux systems

Barbera van Schaik

KEBB, Bioinformatics Laboratory

Academic Medical Center, Amsterdam

[email protected]

8 March 2018

Barbera van Schaik (AMC) GS Unix 8 March 2018 1 / 10

At the end of today you are able to...

Download and install a command line program

Download a dataset

Run the program

Investigate and change parameters

Make data summaries using common linux commands

Linux and bioinformatics

Powerful software development environment

Flexibility of command line tools

Ability to create pipelines

Much open source software and code is available for (re)use
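
As a minimal sketch of what "creating a pipeline" means on the command line, the commands below chain standard tools with the `|` operator so that the output of one becomes the input of the next. The sample labels are made up for illustration and are not part of the course dataset.

```bash
# Hypothetical input: one sample label per line (made up for this example).
samples=$(printf '%s\n' ice soil ice water ice soil)

# Pipeline: sort the labels, count identical neighbours,
# then order the counts numerically from high to low.
counts=$(printf '%s\n' "$samples" | sort | uniq -c | sort -rn)

printf '%s\n' "$counts"
```

Each tool does one small job; the pipeline combines them into a frequency table without writing intermediate files.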

Open source

Video

http://youtu.be/P_mS4CIXcLY

Stephen Fry for the 25th birthday of GNU

Free as in freedom

You can use, change, integrate, and review the code

Open source allows sharing and promotes collaboration

No vendor lock-in

Open source

Software

Databases

Journals

Standards

Hardware

Art

Money

Drinks

Medicine

Fashion

Education

https://en.wikipedia.org/wiki/Open_source

Work environment

http://surfsara.nl/

SurfSara HPC cloud

http://surfsara.nl/

Exercises

Dataset

Metagenomics sample from glacier ice in Germany

16S rRNA has been sequenced

Roche 454 dataset

Software

Linux tools to download data and software

Your own program to convert between data formats

Basic Local Alignment Search Tool (BLAST)

Use standard Linux tools for summarizing data
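
The conversion and summary steps could look roughly like the sketch below. The slides do not say which formats are involved, so this assumes a FASTQ to FASTA conversion with exactly four lines per record; the two reads and the file names are made up for illustration.

```bash
# Hypothetical two-read FASTQ file (made-up reads, four lines per record).
tmpdir=$(mktemp -d)
cat > "$tmpdir/reads.fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
GGCCTA
+
IIIIII
EOF

# Convert: line 1 of each record becomes the FASTA header ('@' replaced by '>'),
# line 2 is the sequence; the '+' line and the quality line are dropped.
awk 'NR % 4 == 1 { print ">" substr($0, 2) }
     NR % 4 == 2 { print }' "$tmpdir/reads.fastq" > "$tmpdir/reads.fasta"

# Summaries with standard tools: number of sequences and length of each read.
n_seqs=$(grep -c '^>' "$tmpdir/reads.fasta")
lengths=$(grep -v '^>' "$tmpdir/reads.fasta" | awk '{ print length($0) }')

echo "sequences: $n_seqs"
echo "lengths: $(echo $lengths)"

# Remove the temporary files again.
rm -rf "$tmpdir"
```

The same `grep`, `awk`, `sort`, and `uniq` building blocks apply to other line-oriented files, such as tabular search results.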

A few "warnings"

To take into account when you analyse data

Run time issues

Memory limits

Disk space

Long run times
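
A few standard commands help to keep an eye on these limits. The sketch below is one possible way to check them, not a prescribed procedure; the working directory is a temporary one created for the example, and `sleep 1` stands in for a real analysis step.

```bash
# Hypothetical working directory with a small file in it.
workdir=$(mktemp -d)
printf 'ACGT\n' > "$workdir/example.txt"

# Disk space: free space on the file system that holds the working directory.
df -h "$workdir"

# Disk usage of the working directory in kilobytes (first field of du's output).
used_kb=$(du -sk "$workdir" | cut -f1)
echo "working directory uses ${used_kb} KB"

# Memory: 'free' is Linux-specific, so only call it when it is installed.
if command -v free >/dev/null 2>&1; then
    free -h
fi

# Run time: measure a small test run before starting on the full dataset.
start=$(date +%s)
sleep 1   # placeholder for the real command
end=$(date +%s)
elapsed=$((end - start))
echo "test run took ${elapsed} s"

rm -rf "$workdir"
```

Timing a run on a small subset first gives a rough idea of how long the full dataset will take and whether the job should be left running in the background, for example with `nohup`.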

HELP

I don’t know how this program works

RTM

man command

command --help

command -h

command [enter]

My program gives an error

File names with spaces

Typo

And thousands of other reasons
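
The file name problem is easy to reproduce. In the sketch below, the file name `my sample.txt` and the helper `count_args` are made up for illustration; the point is that an unquoted name containing a space reaches the program as two separate arguments.

```bash
# Hypothetical file name containing a space (made up for this example).
tmpdir=$(mktemp -d)
printf 'line one\nline two\n' > "$tmpdir/my sample.txt"

# Quoting the name keeps it together as a single argument.
n_lines=$(wc -l < "$tmpdir/my sample.txt" | tr -d ' ')
echo "lines: $n_lines"

# Hypothetical helper: report how many arguments the shell passed in.
count_args() { echo "$#"; }

name="my sample.txt"
unquoted=$(count_args $name)     # split on the space into two arguments
quoted=$(count_args "$name")     # kept together as one argument
echo "unquoted: $unquoted arguments, quoted: $quoted argument"

rm -rf "$tmpdir"
```

Quoting variables and file names, or avoiding spaces in file names altogether, prevents this class of error.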

The internet can help you

DuckDuckGo is your friend - https://duckduckgo.com/

Stack Overflow - https://stackoverflow.com/

SEQanswers forum - http://seqanswers.com/
