2014-05-27 - opinion: computing for genomics sucks

Post on 17-Aug-2014

DESCRIPTION

Some thoughts on why genomics bioinformaticians need hardware that differs from what traditional HPC providers offer. With input from @bmpvieira, @yeban, @gawbul.

TRANSCRIPT

Why computing for genomics research sucks.

y.wurm@qmul.ac.uk

BaltiBio 2014-05-27

Example Genomics Tasks

Task                                           | Repetitiveness | “Disk” Input/Output | Memory  | Duration per task
Build 10,000 trees                             | 10,000x        | low                 | low     | short
Trim FASTQ files                               | 40-400x        | high                | low     | short
One de novo genome assembly                    | 1              | high                | high    | long
Many de novo genome assemblies                 | 20-1000x       | high                | high    | long
Determine which of 10 new tools that promise X | 1              | depends             | depends | depends
can actually do X (once). “genome hacking”     |                |                     |         |

Traditional High Performance Computing (HPC)

• Physics? Astronomy? Maths? Chemistry?

• Traditional HPC infrastructures are great at small tasks:

Task                                           | Repetitiveness | “Disk” Input/Output | Memory  | Duration per task
Build 10,000 trees                             | 10,000x        | low                 | low     | short

• And/or have mechanisms/tools that transform their challenges into many small tasks.
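To make the point concrete, the task table can be encoded as data and filtered for the "small task" profile that traditional HPC handles well. This is only an illustrative sketch: the `Task` class, its field names and the `fits_traditional_hpc` rule are assumptions introduced here, not something from the talk. The values are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    repetitions: str
    disk_io: str
    memory: str
    duration: str


# Rows copied from the "Example Genomics Tasks" table.
TASKS = [
    Task("Build 10,000 trees", "10,000x", "low", "low", "short"),
    Task("Trim FASTQ files", "40-400x", "high", "low", "short"),
    Task("One de novo genome assembly", "1", "high", "high", "long"),
    Task("Many de novo genome assemblies", "20-1000x", "high", "high", "long"),
    Task("Test which of 10 new tools can do X", "1", "depends", "depends", "depends"),
]


def fits_traditional_hpc(task):
    """Assumed rule: only low-I/O, low-memory, short tasks count as a good fit."""
    return (task.disk_io, task.memory, task.duration) == ("low", "low", "short")


hpc_friendly = [task.name for task in TASKS if fits_traditional_hpc(task)]
print(hpc_friendly)
```

Under that rule only the tree-building row qualifies. Every other row needs either fast I/O, lots of memory, long runtimes, or all three.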

“We have 9999 cores!” - central IT admin

but they are inadequate

Big Ass Servers

• e.g.: 1.5 TB RAM; 48 cores - SSH into it and do whatever you want.

Task                                           | Repetitiveness | “Disk” Input/Output | Memory  | Duration per task
Build 10,000 trees                             | 10,000x        | low                 | low     | short
Trim FASTQ files                               | 40-400x        | high                | low     | short
One de novo genome assembly                    | 1              | high                | high    | long
Many de novo genome assemblies                 | 20-1000x       | high                | high    | long
Determine which of 10 new tools that promise X | 1              | depends             | depends | depends
can actually do X (once).                      |                |                     |         |

Jeremy Leipzig

Additional challenges for biologists

• Datasets continue growing fast!

• Generally:

• We lack computational training.

• Bioinformatics tools suck (badly written, badly tested, hard to install).

So what do we need?

• access to machines of all shapes and sizes

• big and small machines

• direct access via ssh (for hacking & doing things few times)

• indirect access via queue (for doing things many times)

• fast I/O - cheap archival.

• single login: all files “feel” like they’re in one place
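"Machines of all shapes and sizes" only helps if each task lands on a machine that fits it. A minimal sketch of that matching step, assuming a made-up inventory: the `Machine` class, the `pick_machine` function and the two smaller nodes are hypothetical. Only the 48-core / 1.5 TB entry echoes a figure from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Machine:
    name: str
    cores: int
    ram_gb: int


# Hypothetical inventory. Only "big-server" mirrors the 1.5 TB RAM / 48 core example.
MACHINES = [
    Machine("small-node", cores=8, ram_gb=32),
    Machine("medium-node", cores=24, ram_gb=256),
    Machine("big-server", cores=48, ram_gb=1536),
]


def pick_machine(cores_needed: int, ram_gb_needed: int,
                 machines: Sequence[Machine] = MACHINES) -> Optional[Machine]:
    """Return the smallest machine that satisfies both requirements, or None."""
    candidates = [
        m for m in machines
        if m.cores >= cores_needed and m.ram_gb >= ram_gb_needed
    ]
    # "Smallest" is defined here as least RAM first, then fewest cores.
    return min(candidates, key=lambda m: (m.ram_gb, m.cores), default=None)


if __name__ == "__main__":
    print(pick_machine(4, 16))     # a FASTQ-trimming-sized job
    print(pick_machine(16, 900))   # an assembly-sized job
```

A real scheduler would also weigh I/O and current load. The point here is only that small and big jobs should not compete for the same kind of box.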

Swiss Institute of Bioinformatics: Vital-IT

So what do we need? (continued)

• easily changeable software & OS versions

Easily changeable OS & software versions

https://www.docker.io

> docker-switch bio-linux7
# do stuff
> docker-switch pacbio-assembly-vm
# do other stuff
> docker-switch antlab-ubuntu
# do more stuff

FAKE

@bmpvieira
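The slide marks `docker-switch` as fake, but something in that spirit can be approximated with the real `docker run` command. A minimal sketch, assuming Docker is installed and that images with these names are available (the image names come from the slide and are not known to be published). The function name `docker_switch_command` and the `/work` mount point are hypothetical, introduced only for illustration.

```python
import os
import shlex


def docker_switch_command(image, workdir=None, shell="bash"):
    """Build a `docker run` command that opens a shell in `image`.

    The working directory is mounted at /work (an assumed mount point)
    so files stay in one place while the OS and tools change.
    """
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run",
        "--rm",                    # remove the container on exit
        "-it",                     # interactive terminal
        "-v", f"{workdir}:/work",  # share the project directory
        "-w", "/work",             # start inside it
        image,
        shell,
    ]


if __name__ == "__main__":
    # Image names taken from the slide; they are placeholders here.
    for image in ("bio-linux7", "pacbio-assembly-vm", "antlab-ubuntu"):
        print(shlex.join(docker_switch_command(image, workdir="/data/project")))
```

Passing the returned list to `subprocess.run` would start the container. The sketch only prints the commands so that it runs without Docker present.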

What if Apple/Google made an idiot-proof cloud computing system for genomics?

• Always on - single place to connect to:

ssh mylab.awskiller.co.uk

• Dropbox-like shared directories & file checksumming.

• Easily switchable OS version / “VM”.

• Automagically & transparently migrates:

  • from small to huge machines (and back) as CPU and RAM demands change.

  • from one physical site (huge dataset) to another
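The file checksumming wish in the list above is the easiest piece to sketch: before and after a dataset moves between machines or sites, compare digests. This is a minimal sketch using SHA-256 from Python's standard library. The choice of algorithm and the names `sha256_of_file` and `is_intact` are assumptions, since the talk does not say how checksumming should be done.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so multi-gigabyte files never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """Return True if the file's digest matches the one recorded earlier."""
    return sha256_of_file(path) == expected_hex.lower()
```

A shared-directory system could record the digest when a file is written and call something like `is_intact` after each migration, so that silent corruption is caught before an analysis runs on it.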

Summary

• Broad range of needs:

  • some similar to traditional HPC.

  • some very different!

• Users are naive.

• Tools are experimental.

• Datasets are experimental.

• IT people have difficulty understanding this.

• Do not trust them when they say things will just work!

• A lot of potential to make things not suck.

Evolutionary Genetics group & Queen Mary U London

Bruno Vieira - @bmpvieira

Steve Moss - @gawbul

Anurag Priyam - @yeban

Richard Christie & ITS Research Support team @ Queen Mary U London

Ioannis Xenarios & Vital-IT team @ Swiss Institute of Bioinformatics

http://yannick.poulet.org

y.wurm@qmul.ac.uk
