Cloud Computing for Research Roger S Barga Dennis Gannon, Jared Jackson, Wei Lu, Jaliya Ekanayake, Mohamed Fathalla Extreme Computing Group, MSR

Page 1

Cloud Computing for Research

Roger S Barga, Dennis Gannon, Jared Jackson, Wei Lu, Jaliya Ekanayake, Mohamed Fathalla

Extreme Computing Group, MSR

Page 2

Three areas of focus for our team
• Highly scalable research services
  • Target high-value research applications that currently impede progress
  • Release as open source to the research community
• Lower barriers between clouds and research (impedance mismatch)
  • Do I really need to rewrite/port my application?
  • Do I really need to know that I am even using a cloud? (client + cloud)
• Services for data scientists to explore extremely large data sets
  • Data analytics as a service
  • Raise the level of abstraction for deploying and using analytics
• Provide technical support to NSF Computing in Cloud PIs (groups), part of an international program (US, Europe, Asia, …)

Page 3

Computational Resources in Research: Lack of Broad Access

[Figure: ~70M scientists & engineers broken down by access to high-performance, data-intensive capacity. Segments labeled 1M and 14M make up ~20%; the remaining 55M (~80%) have little to no access to high-performance, data-intensive capacity.]

Page 4

Data Explosion in Bioinformatics & Life Sciences

[Figure: exponential growth of sequence data from new sequencing technology (NCBI Trace Library).]

• The challenge: enable discovery
• Reference databases span multiple disciplines:
  • Metagenomics
  • Biological engineering
  • Genomics
  • Environmental engineering
  • Oceanography, climate research

Page 5

NCBI BLAST

• BLAST (Basic Local Alignment Search Tool)
  • One of the most important software tools in bioinformatics
  • Identifies similarity between biological sequences
• Computationally intensive
  • Large number of pairwise alignment operations
  • A typical large BLAST run can take 700 to 1,000 CPU hours
• For most biologists, there are two choices for running large jobs
  • Build a local cluster
  • Submit jobs to NCBI or EBI (long job-queue times)
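BLAST itself avoids full dynamic programming by using a seed-and-extend heuristic, so the sketch below is not BLAST. It is a plain Smith-Waterman local alignment, included only to show what a single pairwise alignment computes and why its O(m·n) cost adds up quickly over millions of pairs. The scoring scheme (match +2, mismatch -1, linear gap -2) is an illustrative assumption, not a BLAST substitution matrix.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Classic Smith-Waterman dynamic programming with a linear gap penalty.
    Runs in O(len(a) * len(b)) time and keeps only two DP rows in memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap inserted in b
            left = curr[j - 1] + gap   # gap inserted in a
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("HEAGAWGHEE", "PAWHEAE"))
```

Every query/database pair in a large run needs work of this order (BLAST prunes most of it heuristically), which is why a single job can reach hundreds of CPU hours.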

Page 6

NCBI BLAST on Windows Azure
• Parallel BLAST engine on Azure
• Query-segmentation, data-parallel pattern
  • Split the input sequences
  • Query partitions in parallel
  • Merge results together when done
• Follows the general suggested application model for Windows Azure
  • Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic cloud
  • Large data-set management
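A minimal local sketch of the query-segmentation pattern, assuming Python threads stand in for worker roles and queue.Queue stands in for the Azure queue. The names split, run_job, and the search callback are hypothetical; a real worker would invoke NCBI-BLAST on each partition instead of the fake search used here.

```python
import queue
import threading
from typing import Callable, Dict, List

Hits = Dict[str, str]  # query id -> result (placeholder for BLAST output)


def split(sequences: List[str], partition_size: int) -> List[List[str]]:
    """Splitting task: cut the input sequences into fixed-size partitions."""
    return [sequences[i:i + partition_size]
            for i in range(0, len(sequences), partition_size)]


def run_job(sequences: List[str], partition_size: int, num_workers: int,
            search: Callable[[List[str]], Hits]) -> Hits:
    tasks = queue.Queue()          # stands in for the Azure queue
    partial: List[Hits] = []
    lock = threading.Lock()

    # "Web role": enqueue one task message per partition.
    for part in split(sequences, partition_size):
        tasks.put(part)

    def worker() -> None:
        # "Worker role": keep pulling partitions until the queue is empty.
        while True:
            try:
                part = tasks.get_nowait()
            except queue.Empty:
                return
            hits = search(part)
            with lock:
                partial.append(hits)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merging task: combine per-partition results into one report.
    merged: Hits = {}
    for hits in partial:
        merged.update(hits)
    return merged


if __name__ == "__main__":
    seqs = [f"seq{i}" for i in range(10)]

    def fake_search(part: List[str]) -> Hits:
        return {s: f"hits for {s}" for s in part}

    report = run_job(seqs, partition_size=3, num_workers=4, search=fake_search)
    print(len(report))
```

Because workers pull from a shared queue rather than receiving fixed assignments, fast workers naturally take more partitions, which is what makes the pattern fit an elastic pool.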

Page 7

AzureBLAST Task-Flow

A simple split/join pattern
• Leverage the multiple cores of one instance
  • NCBI-BLAST argument "-num_threads"
• Task granularity
  • Large partitions: load imbalance
  • Small partitions: unnecessary overheads
    • NCBI-BLAST overhead
    • Data-transfer overhead
• Best practice: use test runs to profile and determine the optimal partition size

[Diagram: a splitting task fans out to parallel BLAST tasks, whose outputs feed a merging task.]
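The granularity trade-off can be captured by a toy cost model, assuming each task pays a fixed overhead (BLAST start-up plus data transfer) and tasks run in waves across the workers. The model, the function names, and the profile numbers in the usage line are hypothetical; the -num_threads, -query, -db and -out options are real NCBI BLAST+ flags, shown as the command a worker might run per partition.

```python
import math
from typing import Iterable, List


def blastp_argv(query_file: str, db: str, out_file: str, threads: int) -> List[str]:
    """NCBI BLAST+ command line for one partition; -num_threads uses all cores."""
    return ["blastp", "-query", query_file, "-db", db,
            "-out", out_file, "-num_threads", str(threads)]


def estimated_makespan(n_seqs: int, part_size: int, workers: int,
                       overhead_s: float, per_seq_s: float) -> float:
    """Toy model: tasks run in waves of `workers`; each task pays a fixed
    overhead plus per-sequence work."""
    n_tasks = math.ceil(n_seqs / part_size)
    waves = math.ceil(n_tasks / workers)   # a partly empty last wave = idle workers
    return waves * (overhead_s + part_size * per_seq_s)


def best_partition_size(candidates: Iterable[int], n_seqs: int, workers: int,
                        overhead_s: float, per_seq_s: float) -> int:
    """Pick the candidate size with the smallest estimated makespan."""
    return min(candidates, key=lambda s: estimated_makespan(
        n_seqs, s, workers, overhead_s, per_seq_s))


if __name__ == "__main__":
    # Hypothetical profile numbers, not AzureBLAST measurements.
    print(best_partition_size([1, 10, 100, 1000], n_seqs=1000, workers=10,
                              overhead_s=30.0, per_seq_s=1.0))
```

With these made-up numbers, size 1 pays the 30 s overhead a thousand times, size 1000 leaves nine of ten workers idle, and a middle size wins. In practice the overhead and per-sequence cost would come from the test runs the slide recommends.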

Page 8

Micro-Benchmarks Inform Design

• Task size vs. performance
  • Benefit of the warm-cache effect
  • 100 sequences per partition is the best choice
• Instance size vs. performance
  • Super-linear speedup with larger worker instances
  • Primarily due to the memory capacity
• Task size/instance size vs. cost
  • Extra-large instances generated the best and the most economical throughput
  • Fully utilize the resource
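The cost comparison reduces to dollars per unit of work, so a larger instance is cheaper overall whenever its throughput grows faster than its price (the super-linear case above). A minimal sketch; the function names and the numbers in the usage line are placeholders for your own micro-benchmark measurements, not the measured AzureBLAST results.

```python
from typing import Dict, Tuple


def cost_per_1000_sequences(price_per_hour: float, seqs_per_hour: float) -> float:
    """Dollars spent per 1,000 query sequences on one instance."""
    return 1000.0 * price_per_hour / seqs_per_hour


def most_economical(measurements: Dict[str, Tuple[float, float]]) -> str:
    """measurements maps instance size -> (hourly price, measured sequences/hour)."""
    return min(measurements,
               key=lambda size: cost_per_1000_sequences(*measurements[size]))


if __name__ == "__main__":
    # Placeholder numbers for illustration only: an instance at 8x the price
    # that delivers more than 8x the throughput wins on cost.
    runs = {"small": (0.12, 100.0), "extra-large": (0.96, 1000.0)}
    print(most_economical(runs))
```

If throughput only scaled linearly with price, both instance sizes would cost the same per sequence; the advantage comes entirely from the super-linear part.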

Page 9

R. palustris as a platform for H2 production
• Identify key drivers for producing hydrogen, a promising alternative fuel: understand R. palustris well enough to be able to improve its H2 production
• Characterize a population of strains and use integrative genomics approaches to dissect the molecular networks of H2 production
• Use BLAST to query 16 strains to sort out their genetic relationships
  • Each strain has an estimated ~5,000 proteins
  • Jobs were kicked off NCBI clusters before completion
  • Against NCBI non-redundant proteins: ~30 min
  • Against ~5,000 proteins from another strain: < 30 sec
  • Publishable result in one day for roughly $150

Eric Schadt, Pac Bio, and Sam Phattarasukol, Harwood Lab, UW

Page 10

All-Against-All Experiment: Discovering Homologs
• BLAST UniRef100, a non-redundant protein sequence database
• Discover the interrelationships of known protein sequences
• "All against all" query
  • The database is also the input query
  • The protein database is large (4.2 GB)
  • Total of 9,865,668 sequences to be queried
  • Theoretically, about 10^14 (100 trillion) sequence comparisons!
• Performance estimation
  • Estimated completion: 3,216,731 minutes (6.1 years) on one 8-core VM
• One of the largest BLAST jobs as far as we know
  • An experiment at this scale is usually infeasible for most researchers
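The scale figures on this slide can be checked with simple arithmetic: an all-against-all run over n sequences implies n × n candidate comparisons (about 9.7 × 10^13 for n = 9,865,668), and 3,216,731 minutes is about 6.1 years. The helper names below are illustrative only.

```python
def all_against_all_comparisons(n: int) -> int:
    """Every query sequence against every database sequence (n x n)."""
    return n * n


def minutes_to_years(minutes: float) -> float:
    return minutes / (60 * 24 * 365)


if __name__ == "__main__":
    n = 9_865_668              # sequences queried (from the slide)
    est_minutes = 3_216_731    # single 8-core VM estimate (from the slide)
    print(f"{all_against_all_comparisons(n):.2e} comparisons")
    print(f"{minutes_to_years(est_minutes):.1f} years on one VM")
```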

Page 11

Our Approach
• Allocated a total of ~4,000 cores
  • 475 extra-large VMs (8 cores per VM)
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divided 10 million sequences into multiple segments
  • Each segment was submitted to one deployment as one job for execution
  • 300,000 tasks on ~4,000 cores on Azure (70,000 bp, or 35 sequences, per task)
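A quick consistency check of the allocation, under the assumption of perfect linear scaling: 475 VMs × 8 cores is 3,800 cores; 9,865,668 sequences at 35 per task is about 282,000 tasks (in line with the ~300,000 quoted); and spreading the 6.1-year single-VM estimate over 475 VMs gives an ideal lower bound of roughly 4.7 days. The bound ignores failures and platform upgrades, which the following slides show do occur.

```python
import math

VMS = 475                      # extra-large VMs allocated (from the slide)
CORES_PER_VM = 8
SEQUENCES = 9_865_668          # sequences in the all-against-all query
SEQS_PER_TASK = 35
MINUTES_ON_ONE_VM = 3_216_731  # single 8-core VM estimate (previous slide)


def ideal_wall_clock_days(minutes_one_vm: float, vms: int) -> float:
    """Lower bound: perfect linear scaling, no failures, no stragglers."""
    return minutes_one_vm / vms / (60 * 24)


total_cores = VMS * CORES_PER_VM                 # 3,800 cores, i.e. "~4000"
n_tasks = math.ceil(SEQUENCES / SEQS_PER_TASK)   # roughly 282,000 tasks

if __name__ == "__main__":
    print(total_cores, n_tasks,
          round(ideal_wall_clock_days(MINUTES_ON_ONE_VM, VMS), 1))
```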

Page 12

Cloud System Upgrades

[Figure: task-execution timeline for the North Europe data center; 34,256 tasks processed in total.]

• All 62 nodes lost tasks and then came back, together in groups of ~6 nodes; each group is an update domain
• Annotated duration: ~30 mins

Page 13

Failures Happen

[Figure: task-execution timeline for the West Europe data center; 30,976 tasks completed before the job was killed.]

• 35 nodes experienced blob-write failures at the same time
• Reasonable guess: the fault domain is working
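One mechanism that lets a Web Role + Queue + Worker design absorb losses like these is the queue's visibility timeout: in Azure queue storage, a dequeued message is hidden for a timeout rather than removed, and it reappears unless the consumer explicitly deletes it after finishing. The sketch is a simplified in-memory model of that behaviour; VisibilityQueue and simulate are hypothetical names, not the Azure Storage SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class VisibilityQueue:
    """Simplified in-memory queue with a visibility timeout.

    get() hides a message for `timeout` ticks instead of removing it;
    only delete() removes it. If a worker dies before deleting, the
    message becomes visible again and another worker retries it.
    """
    timeout: int
    visible_at: Dict[str, int] = field(default_factory=dict)

    def put(self, msg: str, now: int) -> None:
        self.visible_at[msg] = now

    def get(self, now: int) -> Optional[str]:
        for msg, t in self.visible_at.items():
            if t <= now:
                self.visible_at[msg] = now + self.timeout  # hide, don't remove
                return msg
        return None

    def delete(self, msg: str) -> None:
        self.visible_at.pop(msg, None)


def simulate(tasks: List[str], crash_once: Set[str], timeout: int = 3) -> List[str]:
    """Process one message per tick; the worker handling a task in
    `crash_once` dies the first time (no delete), so that task finishes
    on a later retry."""
    q = VisibilityQueue(timeout)
    for task in tasks:
        q.put(task, now=0)
    done: List[str] = []
    crashed: Set[str] = set()
    now = 0
    while q.visible_at:
        msg = q.get(now)
        if msg is not None:
            if msg in crash_once and msg not in crashed:
                crashed.add(msg)          # worker lost: message reappears later
            else:
                done.append(msg)
                q.delete(msg)             # success: remove for good
        now += 1
    return done


if __name__ == "__main__":
    print(simulate(["t1", "t2", "t3"], crash_once={"t2"}))
```

Because a retried task can run twice, the tasks themselves should be idempotent (for example, writing results under a deterministic name).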

Page 14

Impedance mismatch – Azure is designed to manage long-running services in a highly available, cost-effective manner. Researchers operate quite differently…
• Business: “develop, deploy and forget”
• Researchers: “constantly changing codebase, tasks, dependencies”

Anthill – Making Azure easier to use for researchers…

> AHill myCalc.exe              mycalc will run on Azure
> AHill myCalc.exe d1 d2 d3…    parameter sweep on Azure
> AHill myCalc1 …
> AHill myCalc2 …
> AHill myCalc3 …               concurrent execution using a VM pool
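Anthill is shown here only through its command lines, so the following is a rough local analogy rather than Anthill's implementation: a parameter sweep runs the same program once per input, fanned out over a fixed pool of workers. Python's concurrent.futures thread pool stands in for the VM pool, and my_calc is a hypothetical stand-in for myCalc.exe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def parameter_sweep(fn: Callable[[Any], Any], params: Iterable[Any],
                    pool_size: int = 4) -> Dict[Any, Any]:
    """Run fn once per (hashable) parameter on a fixed pool of workers."""
    params = list(params)
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(params, pool.map(fn, params)))


def my_calc(d: int) -> int:
    """Hypothetical stand-in for myCalc.exe."""
    return d * d


if __name__ == "__main__":
    print(parameter_sweep(my_calc, [1, 2, 3]))
```

Submitting different programs (myCalc1, myCalc2, myCalc3) to the same pool works the same way with pool.submit, which is the "concurrent execution using a VM pool" case.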

Page 15

Completed
• Support for application parametric sweeps (various patterns)
• Support for complex data types (any ISerializable type)
• Support for scheduler fault tolerance (no single point of failure)

Ongoing
• Complex schedules (workflows), in progress
• Prepare for an open release

Page 16

Lessons Learned
• A master scheduling work into a pool of slaves is highly efficient
• Use a lightweight workflow to coordinate task flow
• Fault tolerance and data movement between tasks (don’t always write results to long-term storage; wait to see whether future tasks reuse them)
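One way to read the last lesson, sketched under stated assumptions: a lightweight workflow runner passes intermediate results between tasks in memory and writes to long-term storage only the outputs that no later task consumes. run_workflow and the persist callback are hypothetical names; graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_workflow(tasks: Dict[str, Callable[..., Any]],
                 deps: Dict[str, List[str]],
                 persist: Callable[[str, Any], None]) -> Dict[str, Any]:
    """Run tasks in dependency order, handing outputs along in memory.

    Only sink outputs (results that no later task consumes) go to `persist`,
    which stands in for a write to long-term storage. Intermediate results
    stay in an in-memory cache and are dropped after their last consumer.
    """
    graph = {name: deps.get(name, []) for name in tasks}
    remaining = {name: 0 for name in tasks}   # pending consumers per task
    for inputs in graph.values():
        for src in inputs:
            remaining[src] += 1

    cache: Dict[str, Any] = {}
    sinks: Dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = graph[name]
        out = tasks[name](*[cache[src] for src in inputs])
        for src in inputs:
            remaining[src] -= 1
            if remaining[src] == 0:
                del cache[src]                # no future task reuses it
        if remaining[name] == 0:
            persist(name, out)                # final result: write it out
            sinks[name] = out
        else:
            cache[name] = out                 # keep for downstream reuse
    return sinks


if __name__ == "__main__":
    stored: Dict[str, Any] = {}
    result = run_workflow(
        tasks={"split": lambda: [1, 2, 3, 4],
               "square": lambda xs: [x * x for x in xs],
               "total": lambda ys: sum(ys)},
        deps={"square": ["split"], "total": ["square"]},
        persist=stored.__setitem__)
    print(result, stored)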

Page 17

Excel DataScope

Offer data analytics as a service on Windows Azure that enables users to upload data and extract patterns from it, identify hidden associations, discover similarities, forecast time series...

The project includes an extensible collection of data analytics and machine learning algorithms, plus a runtime service on Azure that scales out the execution of these algorithms. Analysts can submit, sample, and analyze data from Excel through a customizable data analytics ribbon.

Page 18

Excel DataScope: so what are we building…
• A common framework for implementing analytics and machine-learning algorithms that can efficiently scale out to handle jobs of varying size
  • Highly efficient MapReduce framework, from batch to streaming/iteration
  • In-memory processing algorithms, whenever possible
  • Minimize I/O overhead
  • Incremental processing
• Efficient scheduling of MapReduce tasks across a shared pool of Azure VMs
• Data services, from partitioning data across VMs to shared read-only working sets
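To make "in-memory" and "iteration" concrete, here is a generic single-process sketch of iterative MapReduce, not the DataScope framework itself: the data partitions are read once and reused read-only across iterations, and only the small model (here, 1-D k-means centroids, chosen as an example analytics algorithm) changes between rounds. All names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[int, Tuple[float, int]]  # (cluster id, (value, count))


def map_reduce(partitions: List[List[float]],
               mapper: Callable[[float], Iterable[KV]],
               reducer: Callable[[int, List[Tuple[float, int]]], float]) -> Dict[int, float]:
    """Single-process MapReduce over in-memory partitions."""
    groups: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for part in partitions:                # each partition would live on one VM
        for record in part:
            for key, value in mapper(record):
                groups[key].append(value)  # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(partitions: List[List[float]], centroids: List[float],
              iterations: int = 10) -> List[float]:
    """Iterative MapReduce: the data stays cached; only centroids change."""
    for _ in range(iterations):
        current = list(centroids)

        def mapper(x: float) -> Iterable[KV]:
            nearest = min(range(len(current)), key=lambda i: abs(x - current[i]))
            yield nearest, (x, 1)          # (partial sum, count)

        def reducer(_key: int, values: List[Tuple[float, int]]) -> float:
            total = sum(v for v, _ in values)
            count = sum(c for _, c in values)
            return total / count           # new centroid = mean of members

        updated = map_reduce(partitions, mapper, reducer)
        # Clusters that received no points keep their previous centroid.
        centroids = [updated.get(i, c) for i, c in enumerate(current)]
    return centroids


if __name__ == "__main__":
    data = [[1.0, 1.2, 0.8], [10.0, 10.5, 9.5]]   # two in-memory partitions
    print(kmeans_1d(data, [0.0, 5.0]))
```

A batch-oriented runtime would reload the input from storage on every iteration; keeping partitions resident is what removes most of the I/O overhead for iterative algorithms like this one.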

Page 19

Excel DataScope

Page 20

Observations and Experience
• Clouds are the largest-scale compute centers ever constructed and have the potential to be important to both large- and small-scale research.
• There is an impedance mismatch between clouds and research workloads.
• Equally important, clouds can increase participation in research, providing much-needed resources to users and communities that lack ready access.
• They provide valuable fault-tolerance and scalability abstractions.
• Select the best-fit VM for the job (CPU / memory / network).
• Guidance, recommendations, and examples are just hints.
  • Always measure if in doubt…

Page 21

Resources: AzureScope, http://azurescope.cloudapp.net
• Simple benchmarks illustrating basic performance for compute and storage services
• Benchmarks for reference algorithms
• Best-practice tips
• Code samples

Email us with questions at [email protected]

Page 23

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.