![Page 1: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/1.jpg)
SALSA HPC Group http://salsahpc.indiana.edu
School of Informatics and Computing, Indiana University
Judy Qiu
CAREER Award
![Page 2: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/2.jpg)
6/30/2012 Bill Howe, eScience Institute
"... computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry."
‐‐ John McCarthy, 1961
Emeritus at Stanford
Inventor of LISP
![Page 3: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/3.jpg)
Joseph L. Hellerstein, Google
![Page 4: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/4.jpg)
Challenges and Opportunities
• Iterative MapReduce
– A programming model instantiating the paradigm of bringing computation to data
– Support for data mining and data analysis
• Interoperability
– Using the same computational tools on HPC and Cloud
– Enabling scientists to focus on science, not on programming distributed systems
• Reproducibility
– Using Cloud Computing for Scalable, Reproducible Experimentation
– Sharing results, data, and software
![Page 5: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/5.jpg)
Intel’s Application Stack
![Page 6: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/6.jpg)
[Figure: layered architecture stack]
• Applications: Support Scientific Simulations (Data Mining and Data Analysis)
– Kernels, Genomics, Proteomics, Information Retrieval, Polar Science, Scientific Simulation Data Analysis and Management, Dissimilarity Computation, Clustering, Multidimensional Scaling, Generative Topological Mapping
• Services and Workflow
• Security, Provenance, Portal
• Programming Model: High Level Language
• Runtime: Cross Platform Iterative MapReduce (Collectives, Fault Tolerance, Scheduling)
• Storage: Distributed File Systems, Object Store, Data Parallel File System
• Infrastructure: Linux HPC Bare‐system, Amazon Cloud, Windows Server HPC Bare‐system, Virtualization, Azure Cloud, Grid Appliance
• Hardware: CPU Nodes, GPU Nodes, Virtualization
![Page 7: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/7.jpg)
Ideal for data intensive loosely coupled (pleasingly parallel) applications
![Page 8: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/8.jpg)
MapReduce in Heterogeneous Environment
MICROSOFT
![Page 9: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/9.jpg)
Iterative MapReduce Frameworks
• Twister[1]
– Map‐>Reduce‐>Combine‐>Broadcast
– Long running map tasks (data in memory)
– Centralized driver based, statically scheduled
• Daytona[3]
– Iterative MapReduce on Azure using cloud services
– Architecture similar to Twister
• HaLoop[4]
– On disk caching, Map/reduce input caching, reduce output caching
• Spark[5]
– Iterative MapReduce using Resilient Distributed Datasets to ensure fault tolerance
• Pregel[6]
– Graph processing from Google
![Page 10: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/10.jpg)
Others
• Mate‐EC2[6]
– Local reduction object
• Network Levitated Merge[7]
– RDMA/InfiniBand based shuffle & merge
• Asynchronous Algorithms in MapReduce[8]
– Local & global reduce
• MapReduce online[9]
– Online aggregation and continuous queries
– Push data from Map to Reduce
• Orchestra[10]
– Data transfer improvements for MR
• iMapReduce[11]
– Async iterations, one to one map & reduce mapping, automatically joins loop‐variant and invariant data
• CloudMapReduce[12] & Google AppEngine MapReduce[13]
– MapReduce frameworks utilizing cloud infrastructure services
![Page 11: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/11.jpg)
![Page 12: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/12.jpg)
Distinction between static and variable data
Configurable long running (cacheable) map/reduce tasks
Pub/sub messaging based communication/data transfers
Broker Network for facilitating communication
![Page 13: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/13.jpg)
Main program’s process space:
configureMaps(..)
configureReduce(..)
while(condition){
    runMapReduce(..)
    updateCondition()
} //end while
close()
Worker Nodes (with Local Disk) run cacheable map/reduce tasks: Map(), Reduce(), and the Combine() operation.
• Iterations: runMapReduce(..) is invoked inside the while loop
• Communications/data transfers via the pub‐sub broker network & direct TCP
• May send <Key,Value> pairs directly
• Main program may contain many MapReduce invocations or iterative MapReduce invocations
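The driver loop on this slide (configure once, then repeat runMapReduce until the condition changes) can be sketched in plain Python. This is only an illustration of the control flow, not the Twister API: the function names `map_task`, `reduce_task`, `combine`, and `run_iterative_kmeans`, the one-dimensional K-means workload, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def map_task(cached_points, centroids):
    """Map: assign each cached (static) point to its nearest centroid."""
    partial = defaultdict(lambda: [0.0, 0])  # centroid index -> [sum, count]
    for p in cached_points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        partial[nearest][0] += p
        partial[nearest][1] += 1
    return dict(partial)


def reduce_task(partials):
    """Reduce: add up the partial sums and counts from all map tasks."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for idx, (s, n) in partial.items():
            total[idx][0] += s
            total[idx][1] += n
    return dict(total)


def combine(reduced, old_centroids):
    """Combine: hand the new centroids back to the main program."""
    return [reduced[i][0] / reduced[i][1] if i in reduced else c
            for i, c in enumerate(old_centroids)]


def run_iterative_kmeans(partitions, centroids, tol=1e-9, max_iter=100):
    # "configureMaps": static partitions are loaded once and stay cached.
    cached = [list(p) for p in partitions]
    iterations = 0
    for iterations in range(1, max_iter + 1):                 # while(condition)
        partials = [map_task(p, centroids) for p in cached]   # runMapReduce
        new_centroids = combine(reduce_task(partials), centroids)
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids          # variable data for the next round
        if shift < tol:                    # updateCondition
            break
    return centroids, iterations


final_centroids, n_iter = run_iterative_kmeans(
    partitions=[[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]],
    centroids=[0.0, 5.0],
)
```

Only the small centroid list changes between iterations; the partitions are read once, which is the static/variable data distinction the slides describe.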
![Page 14: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/14.jpg)
• Master Node: Main Program and Twister Driver
• Pub/sub Broker Network: brokers (B); one broker serves several Twister daemons
• Worker Node: Twister Daemon, Worker Pool running cacheable map and reduce tasks, Local Disk
• Scripts perform: data distribution, data collection, and partition file creation
![Page 15: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/15.jpg)
Applications of Twister4Azure
• Implemented
– Multi Dimensional Scaling
– KMeans Clustering
– PageRank
– Smith‐Waterman‐GOTOH sequence alignment
– WordCount
– Cap3 sequence assembly
– Blast sequence search
– GTM & MDS interpolation
• Under Development
– Latent Dirichlet Allocation
![Page 16: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/16.jpg)
Twister4Azure Architecture
Azure Queues for scheduling, Tables to store meta‐data and monitoring data, Blobs for input/output/intermediate data storage.
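The division of labour among queues, tables, and blobs can be illustrated with in-memory stand-ins. The sketch below does not use the Azure SDK; `blobs`, `task_table`, `task_queue`, `submit_job`, and `worker_loop` are hypothetical names chosen for the example, and the word-count map function is an arbitrary workload.

```python
import queue

blobs = {}                   # stand-in for blob storage: name -> data
task_table = {}              # stand-in for the metadata/monitoring table
task_queue = queue.Queue()   # stand-in for the scheduling queue


def submit_job(inputs):
    """Upload inputs, record their status, and enqueue one task per input."""
    for name, data in inputs.items():
        blobs[name] = data
        task_table[name] = "queued"
        task_queue.put(name)


def worker_loop(map_fn):
    """A worker polls the queue, reads its input blob, and writes an output blob."""
    while True:
        try:
            name = task_queue.get_nowait()
        except queue.Empty:
            return
        task_table[name] = "running"
        blobs["out/" + name] = map_fn(blobs[name])
        task_table[name] = "done"


submit_job({"part-0": "a b a", "part-1": "b c"})
worker_loop(lambda text: len(text.split()))
```

Because workers pull tasks from the queue rather than being assigned them by a master, any number of `worker_loop` instances could drain the same queue in this model.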
![Page 17: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/17.jpg)
Data Intensive Iterative Applications
• Growing class of applications
– Clustering, data mining, machine learning & dimension reduction applications
– Driven by data deluge & emerging computation fields
[Figure: Compute → Communication → Reduce/barrier → New Iteration; labels: Larger Loop‐Invariant Data, Smaller Loop‐Variant Data, Broadcast]
![Page 18: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/18.jpg)
Iterative MapReduce for Azure Cloud
http://salsahpc.indiana.edu/twister4azure
• Merge step
• Extensions to support broadcast data
• Multi‐level caching of static data
• Hybrid intermediate data transfer
• Cache‐aware Hybrid Task Scheduling
• Collective Communication Primitives
[Figure: Job Start → Map/Combine tasks (with Data Cache) → Reduce → Merge → Add Iteration? Yes: hybrid scheduling of the new iteration; No: Job Finish]
Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure, Thilina Gunarathne, BingJing Zhang, Tak‐Lon Wu and Judy Qiu, (UCC 2011), Melbourne, Australia.
![Page 19: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/19.jpg)
Performance of Pleasingly Parallel Applications on Azure
[Figure: BLAST Sequence Search. Parallel Efficiency (0% to 100%) vs. Number of Query Files (128 to 728); series: Twister4Azure, Hadoop‐Blast, DryadLINQ‐Blast]
[Figure: Cap3 Sequence Assembly. Parallel Efficiency (50% to 100%) vs. Num. of Cores * Num. of Files; series: Twister4Azure, Amazon EMR, Apache Hadoop]
[Figure: Smith Waterman Sequence Alignment]
MapReduce in the Clouds for Science, Thilina Gunarathne, et al. CloudCom 2010, Indianapolis, IN.
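The plots on this slide report parallel efficiency. The slides do not state the formula used, so the sketch below assumes the common definition, sequential time divided by (number of workers × parallel time). The function name and the timings in the example are hypothetical and are not taken from the plots.

```python
def parallel_efficiency(t_sequential, t_parallel, workers):
    """Efficiency = T(1) / (p * T(p)); 1.0 means perfect linear speedup."""
    if workers <= 0 or t_parallel <= 0:
        raise ValueError("workers and t_parallel must be positive")
    return t_sequential / (workers * t_parallel)


# Hypothetical timings: 1000 s sequentially, 10 s on 128 workers.
example = parallel_efficiency(t_sequential=1000.0, t_parallel=10.0, workers=128)
```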
![Page 20: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/20.jpg)
[Figure panels: Task Execution Time Histogram; Number of Executing Map Task Histogram; Strong Scaling with 128M Data Points; Weak Scaling]
• First iteration performs the initial data fetch
• Overhead between iterations
• Scales better than Hadoop on bare metal
![Page 21: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/21.jpg)
[Figure panels: Weak Scaling; Data Size Scaling. Performance adjusted for sequential performance difference]
Each iteration chains three Map → Reduce → Merge jobs before starting a new iteration:
• BC: Calculate BX
• X: Calculate invV (BX)
• Calculate Stress
Scalable Parallel Scientific Computing Using Twister4Azure. Thilina Gunarathne, BingJing Zhang, Tak‐Lon Wu and Judy Qiu. Submitted to Journal of Future Generation Computer Systems. (Invited as one of the best 6 papers of UCC 2011)
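The three steps named on this slide (calculate BX, calculate invV (BX), calculate stress) match the structure of a SMACOF-style MDS iteration. The sequential sketch below assumes unit weights, in which case the invV multiplication reduces to dividing by the number of points; that simplification, the function names, and the three-point test data are assumptions for illustration and are not taken from the Twister4Azure implementation.

```python
import math


def distances(X):
    """Pairwise Euclidean distances between the rows of X."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]


def stress(delta, X):
    """Stress: sum of squared differences between target and mapped distances."""
    D = distances(X)
    n = len(X)
    return sum((D[i][j] - delta[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof_step(delta, X):
    """One iteration: build B(X), then X_new = (1/n) * B(X) * X (unit weights)."""
    n, dim = len(X), len(X[0])
    D = distances(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and D[i][j] > 0.0:
                B[i][j] = -delta[i][j] / D[i][j]
        B[i][i] = -sum(B[i])  # diagonal makes each row sum to zero
    return [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)]
            for i in range(n)]


# Target dissimilarities of a right triangle; the start layout is arbitrary.
delta = [[0.0, 1.0, 1.0],
         [1.0, 0.0, math.sqrt(2.0)],
         [1.0, math.sqrt(2.0), 0.0]]
X = [[0.0, 0.0], [2.0, 0.1], [0.3, 1.5]]
history = [stress(delta, X)]
for _ in range(30):
    X = smacof_step(delta, X)
    history.append(stress(delta, X))
```

The stress recorded in `history` should never increase from one iteration to the next, which is why it serves as the convergence check at the end of each loop.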
![Page 22: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/22.jpg)
Parallel Data Analysis using Twister
• MDS (Multi Dimensional Scaling)
• Clustering (Kmeans)
• SVM (Support Vector Machine)
• Indexing
Xiaoming Gao, Vaibhav Nachankar and Judy Qiu, Experimenting Lucene Index on HBase in an HPC Environment, position paper in the proceedings of ACM High Performance Computing meets Databases workshop (HPCDB'11) at SuperComputing 11, December 6, 2011
![Page 23: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/23.jpg)
MDS projection of 100,000 protein sequences showing a few experimentally identified clusters in preliminary work with Seattle Children’s Research Institute
Application #1
![Page 24: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/24.jpg)
Data Intensive Kmeans Clustering
Image Classification: 1.5 TB; 500 features per image; 10k clusters
1000 Map tasks; 1GB data transfer per Map task
Application #2
![Page 25: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/25.jpg)
Twister Communications
• Broadcasting
– Data could be large
– Chain & MST
• Map Collectives
– Local merge
• Reduce Collectives
– Collect but no merge
• Combine
– Direct download or Gather
[Figure: Broadcast → Map Tasks → Map Collective → Reduce Tasks → Reduce Collective → Gather]
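The two broadcast options named above differ in how many nodes forward data at once. The sketch below generates the send schedule for each, reading "MST" as a spanning-tree broadcast and modelling it as a binomial tree rooted at node 0. That reading, the node numbering, and the function names are assumptions for illustration.

```python
def chain_broadcast(n):
    """Chain: node i forwards to node i + 1, giving n - 1 sequential hops."""
    return [(i, i + 1) for i in range(n - 1)]


def tree_broadcast(n):
    """Binomial tree from node 0: every holder sends once per round."""
    rounds = []
    have = 1  # nodes 0 .. have-1 already hold the data
    while have < n:
        sends = [(s, s + have) for s in range(have) if s + have < n]
        rounds.append(sends)
        have *= 2
    return rounds


chain_hops = chain_broadcast(8)
tree_rounds = tree_broadcast(8)
```

With 8 nodes the tree finishes in 3 rounds while the chain needs 7 hops. A chain can still be competitive for large data when the payload is split into chunks and pipelined, which may be why the slide lists both.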
![Page 26: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/26.jpg)
Improving Performance of Map Collectives
Scatter and Allgather
Full Mesh Broker Network
![Page 27: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/27.jpg)
Polymorphic Scatter‐Allgather in Twister
![Page 28: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/28.jpg)
Twister Performance on Kmeans Clustering
![Page 29: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/29.jpg)
Twister on InfiniBand
• InfiniBand successes in HPC community
– More than 42% of Top500 clusters use InfiniBand
– Extremely high throughput and low latency
  • Up to 40Gb/s between servers and 1μsec latency
– Reduce CPU overhead up to 90%
• Cloud community can benefit from InfiniBand
– Accelerated Hadoop (SC11)
– HDFS benchmark tests
• RDMA can make Twister faster
– Accelerate static data distribution
– Accelerate data shuffling between mappers and reducers
• In collaboration with ORNL on a large InfiniBand cluster
![Page 30: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/30.jpg)
Using RDMA for Twister on InfiniBand
![Page 31: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/31.jpg)
Twister Broadcast Comparison: Ethernet vs. InfiniBand
![Page 32: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/32.jpg)
Building Virtual Clusters Towards Reproducible eScience in the Cloud
Separation of concerns between two layers
• Infrastructure Layer
– interactions with the Cloud API
• Software Layer
– interactions with the running VM
![Page 33: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/33.jpg)
Separation Leads to Reuse
Infrastructure Layer = (*)
Software Layer = (#)
By separating layers, one can reuse software layer artifacts in separate clouds
![Page 34: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/34.jpg)
Design and Implementation
Equivalent machine images (MI) built in separate clouds
• Common underpinning in separate clouds for software installations and configurations
• Configuration management used for software automation
Extend to Azure
![Page 35: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/35.jpg)
Cloud Image Proliferation
![Page 36: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/36.jpg)
Changes of Hadoop Versions
![Page 37: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/37.jpg)
Implementation ‐ Hadoop Cluster
Hadoop cluster commands
• knife hadoop launch {name} {slave count}
• knife hadoop terminate {name}
![Page 38: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/38.jpg)
Running CloudBurst on Hadoop
Running CloudBurst on a 10 node Hadoop Cluster
• knife hadoop launch cloudburst 9
• echo '{"run_list": "recipe[cloudburst]"}' > cloudburst.json
• chef-client -j cloudburst.json
CloudBurst on a 10, 20, and 50 node Hadoop Cluster
![Page 39: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/39.jpg)
![Page 40: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/40.jpg)
![Page 41: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/41.jpg)
Applications & Different Interconnection Patterns
• Map Only (Input → map → Output)
– CAP3 Analysis; Document conversion (PDF ‐> HTML); Brute force searches in cryptography; Parametric sweeps
– Examples: CAP3 Gene Assembly; PolarGrid Matlab data analysis
• Classic MapReduce (Input → map → reduce)
– High Energy Physics (HEP) Histograms; SWG gene alignment; Distributed search; Distributed sorting; Information retrieval
– Examples: Information Retrieval; HEP Data Analysis; Calculation of Pairwise Distances for ALU Sequences
• Iterative MapReduce, Twister (Input → map → reduce, with iterations)
– Expectation maximization algorithms; Clustering; Linear Algebra
– Examples: Kmeans; Deterministic Annealing Clustering; Multidimensional Scaling MDS
• Loosely Synchronous (Pij)
– Many MPI scientific applications utilizing wide variety of communication constructs including local interactions
– Examples: Solving Differential Equations; particle dynamics with short range forces
The first three patterns form the domain of MapReduce and its iterative extensions; the last is the domain of MPI.
![Page 42: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/42.jpg)
![Page 43: Judy Qiu - Unical › hpc2012 › pdfs › qiu.pdf · Zang, Tak‐Lon Wu and Judy Qiu, (UCC 2011) , Melbourne, Australia. Performance of Pleasingly Parallel Applications ... Improving](https://reader034.vdocuments.net/reader034/viewer/2022042400/5f0f63f17e708231d443ec7d/html5/thumbnails/43.jpg)
SALSA HPC Group http://salsahpc.indiana.edu
School of Informatics and Computing, Indiana University