Copyright 2012 ET International, Inc.
IDC-Panel 04-2012
HPC User Forum 2012 Panel on Potential Disruptive Technologies
Emerging Parallel Programming Approaches
Guang R. Gao, Founder
ET International Inc
Newark, Delaware
USA
Who is ETI?
From “Cool Vendors” Report – By Gartner (April 17, 2012):
[ET International, Newark, Delaware (www.etinternational.com)
Analysis by Carl Claunch
Why Cool: ET International delivers its dataflow-oriented ETI Swarm environment for garnering high efficiency from highly parallel software, based on the alternative ParalleX execution model. As highly parallel execution becomes essential to addressing the more substantial computing tasks that HPC users face today, progress is increasingly being stymied by the application's inability to keep all the parallel strands working productively.…]
Motivation
• Many-core is coming
  Current paradigms don't have the expressive power to harness concurrency
• Hardware is getting more heterogeneous
  Current hybrid programming techniques (OpenMP+MPI+OpenCL) are not maintainable: too complicated
• Caches are disappearing or becoming non-coherent
  Distributed memory is everywhere, and at different levels
• Fine-grained power management
  Use what you need and turn off/down the rest
• Failure is the norm
  Resilience must be baked into the whole stack (application, compiler, runtime, hardware)
• Increasing application computation/data irregularity
  Static scheduling can no longer properly load balance
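The point about static scheduling and irregular workloads can be illustrated with a small simulation. This is only a sketch, not SWARM code: the task costs are invented, and the helper names `static_makespan` and `dynamic_makespan` are hypothetical. It compares a static block partition with a greedy dynamic assignment in which the next task goes to whichever worker becomes idle first.

```python
import heapq

def static_makespan(costs, workers):
    """Split tasks into contiguous equal-sized blocks, one block per worker."""
    block = -(-len(costs) // workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)

def dynamic_makespan(costs, workers):
    """Hand each task, in order, to the worker that becomes idle first."""
    finish_times = [0] * workers  # min-heap of per-worker finish times
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

# Hypothetical irregular workload: a few expensive tasks clustered at the front.
costs = [8, 7, 6, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(static_makespan(costs, 4))   # 21: the first worker receives 8 + 7 + 6
print(dynamic_makespan(costs, 4))  # 8
```

With this made-up workload the static partition leaves three workers idle for most of the run, while the dynamic assignment finishes close to the average load per worker (30 / 4 = 7.5).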
ETI Vision
• We need new “Execution Models”!
• Leverage ETI’s deep and growing IP position based on 25+ years of applied R&D expertise and $20M+ in R&D software engineering and development (e.g. extensive system software base for Cyclops, CELL, SCC, Intel Runnemede, Intel x86-based machines, Adapteva, etc.)
• Provide high-performance SWARM software solutions to our OEMs, partners and direct customers
• Advance SWARM solutions to address optimization opportunities driven by heterogeneous multi-/many-core processing, including:
  Big Compute (Private HPC Cloud) systems
  Big Data HPC systems
  HPC embedded appliances
  etc.
Execution Paradigm Comparisons
MPI, OpenMP, OpenCL: Communicating Sequential Processes; Bulk Synchronous; Message Passing
SWARM: Asynchronous; Event-Driven Tasks; Dependencies; Resources; Active Messages; Control Migration
[Figure: thread activity over time for each paradigm; axis: Time; legend: Active threads, Waiting]
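The contrast between bulk-synchronous and event-driven execution can be sketched with a toy timing model. Everything here is an assumption made for illustration: the task graph, the durations, an unlimited number of workers, and the function names. It is not SWARM or MPI code. The bulk-synchronous variant places a barrier after each level of the graph, while the event-driven variant starts each task as soon as its own prerequisites have finished.

```python
from functools import lru_cache

# Hypothetical task graph: name -> (duration, list of prerequisite tasks).
TASKS = {
    "a": (1, []),
    "b": (5, []),
    "c": (1, ["a"]),
    "d": (1, ["b"]),
    "e": (4, ["c"]),
    "f": (1, ["d"]),
}

@lru_cache(maxsize=None)
def level(name):
    """Depth of a task in the graph (0 for tasks with no prerequisites)."""
    deps = TASKS[name][1]
    return 0 if not deps else 1 + max(level(d) for d in deps)

def bulk_synchronous_time():
    """Run the graph level by level, with a barrier after every level."""
    slowest = {}
    for name, (duration, _) in TASKS.items():
        lvl = level(name)
        slowest[lvl] = max(slowest.get(lvl, 0), duration)
    return sum(slowest.values())

@lru_cache(maxsize=None)
def finish(name):
    """Earliest finish time when a task starts as soon as its inputs are ready."""
    duration, deps = TASKS[name]
    return duration + max((finish(d) for d in deps), default=0)

def event_driven_time():
    """Total time is the latest finish time over all tasks (the critical path)."""
    return max(finish(name) for name in TASKS)

print(bulk_synchronous_time())  # 10: levels cost 5 + 1 + 4
print(event_driven_time())      # 7: critical path b -> d -> f
```

In this toy graph the barriers force every level to wait for its slowest task, which is the waiting time the figure labels; removing them lets the short chain a, c, e overlap with the long task b.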
SWARM Execution Overview
[Figure: task lifecycle under SWARM. Tasks with Unsatisfied Dependencies become Enabled Tasks once their dependencies are satisfied. Resources (CPU, GPU) are allocated from the Available Resources pool to enabled tasks, giving Tasks mapped to resources and Resources in Use; resources are then released back to the pool.]
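The lifecycle in this overview (dependencies satisfied, task enabled, resource allocated, resource released) can be mimicked in a few lines. The sketch below is a simplified stand-in, not the SWARM runtime or its API: the function `run`, the round-based timing in which every task takes one round, and the example task names are all hypothetical.

```python
from collections import deque

def run(tasks, resources):
    """tasks: name -> (resource kind, prerequisites); resources: kind -> count.

    Returns the task names grouped by the round in which they executed.
    """
    pending = {name: len(deps) for name, (_, deps) in tasks.items()}
    dependents = {name: [] for name in tasks}
    for name, (_, deps) in tasks.items():
        for dep in deps:
            dependents[dep].append(name)

    enabled = deque(name for name, count in pending.items() if count == 0)
    rounds = []
    while enabled:
        available = dict(resources)      # every resource is free at round start
        running, deferred = [], deque()
        while enabled:
            name = enabled.popleft()
            kind = tasks[name][0]
            if available.get(kind, 0) > 0:
                available[kind] -= 1     # resource allocated
                running.append(name)
            else:
                deferred.append(name)    # stays enabled, waits for a resource
        if not running:
            raise RuntimeError("no resource can run: " + ", ".join(deferred))
        rounds.append(running)
        enabled = deferred
        for name in running:             # task finished: resource released
            for child in dependents[name]:
                pending[child] -= 1
                if pending[child] == 0:  # dependencies satisfied
                    enabled.append(child)
    return rounds

# Hypothetical workload on two CPUs and one GPU.
tasks = {
    "load": ("CPU", []),
    "fft": ("GPU", ["load"]),
    "filter": ("GPU", ["load"]),
    "stats": ("CPU", ["load"]),
    "merge": ("CPU", ["fft", "filter", "stats"]),
}
print(run(tasks, {"CPU": 2, "GPU": 1}))
# [['load'], ['fft', 'stats'], ['filter'], ['merge']]
```

Here `filter` is enabled in the same round as `fft` but has to wait because the single GPU is in use, which separates "enabled" from "mapped to a resource" in the same way the figure does.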
Case Studies of Fine-Grain Execution Models
• Static Dataflow Model (1970s - )
• EARTH Model (1988 - )
• TNT Model and Cyclops-64 (2003 - )
• Codelet Model under Intel-led DARPA/UHPC
DARPA/Intel Runnemede Program
[Diagram, courtesy of The Intel DARPA UHPC Team]
Diagram areas: CPU; Memory; Execution Model; Productivity; Resiliency; Interconnect Fabric; HW/SW Co-Design
Diagram callouts:
• Event-driven codelets; self-aware introspection; code and data motion
• Model-based; goal-oriented; self-morphing
• 1000X energy reduction; overhauled DRAM mArch; resilient memory
• <10% overhead; checkpoint with Flash/CPM
• Security through sandboxing
• Heterogeneous & tapered; large local memory
• 1000X energy reduction; heterogeneous, tightly-coupled; simple architecture
Diagram themes: Application Efficiency; System Management & Concurrency; Data Movement; Assured Operation
ET International, Inc.
University of Illinois
Our Collaborators
Progress & Proof Points To-Date
Barnes-Hut: SWARM vs OpenMP
[Figure: Barnes-Hut speedup over serial (y-axis, 0 to 12) vs. number of threads (x-axis, 1 to 12); series: Ideal, SWARM, OpenMP]
SWARM/MPI Performance Comparison
[Figure: SWARM speedup (y-axis, 0% to 1500%) vs. number of nodes (x-axis, 4 to 512); labels: Lonestar, Redsky, Endeavor, Jaguar, MPI]
Consistent speed-up from 2X to 14.5X
Cholesky Decomposition (SWARM vs MKL/ScaLAPACK)
[Figure, panel 1: Cholesky % Peak; % peak (y-axis, 0 to 100) vs. # nodes (x-axis, 2 to 64); series: MKL/ScaLAPACK % Peak, SWARM % Peak]
[Figure, panel 2: Cholesky Weak Scaling; GFLOPS (y-axis, 10 to 100000) vs. # cores (x-axis, 1 to 1024); series: SWARM, MKL_GFLOPS, Ideal]
Summary and Acknowledgements
• Summary (productivity observation)
  N-Body: 1 man-day, 3X
  G-500: 1 man-month, up to 14X
  Cholesky: 2 man-weeks, 1.5X
  NOTE: the base is performance of optimized code
• Acknowledgements
  Our Sponsors
  Our Collaborators and Colleagues
  My Host
  Others
Cholesky Profiles
[Figure: Cholesky execution profiles; panels: SWARM, OpenMP]