kallio bosc2010 chipster-cloud

Aleksi Kallio, CSC – IT Center for Science, Finland: Connecting Chipster genome browser to the cloud

Upload: bosc-2010

Post on 11-Jun-2015


TRANSCRIPT

Page 1: Kallio bosc2010 chipster-cloud

Aleksi Kallio
CSC – IT Center for Science, Finland

Connecting Chipster genome browser to the cloud

Page 2: Kallio bosc2010 chipster-cloud

Architecture of Chipster platform

• Loosely coupled, independent components
• Message oriented communications
• Flexible, scalable, robust
• In other words, very cloud like

Clients

Authentication service

Management service

Computing services

Brokers

Message broker

File broker

Page 3: Kallio bosc2010 chipster-cloud
Page 4: Kallio bosc2010 chipster-cloud

Chipster in the cloud

1) Deploying compute nodes in the cloud
• Easy, because the architecture is already loosely coupled and based on message passing

2) Running large parallel jobs in the cloud
• Architecture allows this easily
• Cloud compatible tools can be integrated quickly

3) Using cloud as a back end for interactive visualisations
• Maybe not so obvious
• So let's dig into this further...

Page 5: Kallio bosc2010 chipster-cloud

Background: Chipster Genome Browser

• Interactive Swing-based GUI
• Shows reads and analysis results in genomic context
• Interactive zooming from chromosome down to nucleotide level
• Ensembl annotations for genes and transcripts
• Integrated with the rest of Chipster
• Parallel, distributed to some extent

Page 6: Kallio bosc2010 chipster-cloud
Page 7: Kallio bosc2010 chipster-cloud
Page 8: Kallio bosc2010 chipster-cloud
Page 9: Kallio bosc2010 chipster-cloud

Basic idea

Preprocess data with Hadoop / MapReduce

Generate powers-of-two summaries for the data, like in Google Earth
• Doubles the data size

Current genome browser samples data to produce summaries

Now summaries can be read directly
– Accurate results, significantly fewer disk seeks

Distribute data to scale to massive datasets
• Use messaging to query independent data providers

Aggregate results as/if they appear to the visualiser
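
The powers-of-two summaries can be illustrated with a small sketch. This is not the Chipster or Hadoop implementation; it is a plain Python illustration that assumes the data is a per-position coverage array and that a summary is a simple sum over a bin. The function names `build_summary_levels` and `pick_level` are hypothetical. Each level halves the number of bins, so all summary levels together take roughly as much space as the original data, which is where "doubles the data size" comes from.

```python
from typing import List


def build_summary_levels(coverage: List[int]) -> List[List[int]]:
    """Return levels[0] = original values, levels[k] = sums over bins 2**k wide.

    Assumption: a summary is a plain sum; a real preprocessing step
    could store other statistics per bin instead.
    """
    levels = [list(coverage)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Merge neighbouring bins pairwise; a trailing odd bin is carried over.
        merged = [sum(prev[i:i + 2]) for i in range(0, len(prev), 2)]
        levels.append(merged)
    return levels


def pick_level(levels: List[List[int]], region_length: int, pixels: int) -> int:
    """Pick the coarsest level that still gives at least one bin per pixel."""
    level = 0
    while level + 1 < len(levels) and region_length / 2 ** (level + 1) >= pixels:
        level += 1
    return level


coverage = [3, 5, 2, 8, 1, 0, 4, 7]
levels = build_summary_levels(coverage)

# Extra storage for all summary levels stays below the size of the original.
extra = sum(len(level) for level in levels[1:])
```

With summaries stored like this, a zoomed-out view reads one precomputed level directly instead of sampling the raw reads.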

Page 10: Kallio bosc2010 chipster-cloud

Work in progress...

Genome browser up and running

Hadoop based data processing at very early stages

Currently trying to get it to scale well

Page 11: Kallio bosc2010 chipster-cloud

What's the point?

Besides items (e.g., reads), visualiser can receive “superitems” (e.g., summaries of reads)
• Summarises coverage, quality, SNPs etc. of the original reads

All kinds of advanced information can be generated in the preprocessing step
– Such as features that combine a large number of genomes
– Generators should be pluggable

We spend resources on the server side to improve user experience on the client side
• On the server side, CPU, memory and disk space are required
• But only for a short time (like in large batch jobs)
• Cheap commodity servers can be used
• And the experiment has already been expensive
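
One way to picture pluggable generators for superitems is a registry of small functions, each computing one summary value for the reads in a bin. The sketch below is an assumption made for illustration only: the `Read` fields, the `register` decorator, and `make_superitem` are hypothetical names and do not come from Chipster.

```python
from typing import Callable, Dict, List, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record for the example.
    start: int
    length: int
    quality: float


SummaryGenerator = Callable[[List[Read]], float]
GENERATORS: Dict[str, SummaryGenerator] = {}


def register(name: str):
    """Register a summary generator under a name (the 'pluggable' part)."""
    def decorator(fn: SummaryGenerator) -> SummaryGenerator:
        GENERATORS[name] = fn
        return fn
    return decorator


@register("read_count")
def read_count(reads: List[Read]) -> float:
    return float(len(reads))


@register("mean_quality")
def mean_quality(reads: List[Read]) -> float:
    return sum(r.quality for r in reads) / len(reads) if reads else 0.0


def make_superitem(bin_start: int, bin_size: int, reads: List[Read]) -> dict:
    """Summarise the reads starting inside one bin with every registered generator."""
    in_bin = [r for r in reads if bin_start <= r.start < bin_start + bin_size]
    summary = {name: gen(in_bin) for name, gen in GENERATORS.items()}
    return {"start": bin_start, "size": bin_size, **summary}


reads = [Read(0, 36, 30.0), Read(10, 36, 20.0), Read(200, 36, 40.0)]
superitem = make_superitem(0, 100, reads)
```

Adding a new kind of summary then means registering one more function, without touching the code that builds superitems.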

Page 12: Kallio bosc2010 chipster-cloud

Summary

Use cheap server resources to enable better user experience

Goal: to make data analysis quicker (and more fun)

Tackle server side unreliability on the client side

Future development
– If this works out, it could also be used in other Chipster visualisers
– Integrating HBase queries into interactive visualisations
– Optimising data summarising for visual truthfulness
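
Querying independent data providers, aggregating results as they appear, and handling server-side failures on the client could look roughly like the sketch below. It uses Python threads in place of Chipster's message broker, which is a simplification and an assumption; `query_providers` and the `on_partial` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), assumed for the example


def query_providers(
    providers: Dict[str, Callable[[Region], list]],
    region: Region,
    on_partial: Callable[[str, list], None],
) -> List[str]:
    """Query all providers in parallel and hand over results as they arrive.

    Returns the names of providers that failed, so the client can carry on
    drawing with whatever data it did receive.
    """
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {pool.submit(fn, region): name for name, fn in providers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                items = future.result()
            except Exception:
                # An unreliable provider is recorded, not fatal.
                failed.append(name)
                continue
            on_partial(name, items)  # e.g. redraw the view with the new items
    return failed
```

The visualiser can redraw on every `on_partial` call, so one slow or broken provider only leaves a gap in the view instead of blocking it.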

For more info: [email protected],