bridging the assessment modeling gap the perfect...

11
August 8, 2014 1 Bridging the Assessment Modeling Gap the PERFECT Way KEVIN J BARKER, DARREN J KERBYSON, ADOLFY HOISIE, ANDRES MARQUEZ, JOSEPH MANZANO, ROBERTO GIOIOSA, ANTONINO TUMEO, NATHAN TALLENT, GOKCEN KESTOR, LEON SONG Pacific Northwest National Laboratory Modeling and Simulation Workshop, Seattle 2014

Upload: others

Post on 14-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

August 8, 2014 1

Bridging the Assessment Modeling Gap the PERFECT Way KEVIN J BARKER, DARREN J KERBYSON, ADOLFY HOISIE, ANDRES MARQUEZ, JOSEPH MANZANO, ROBERTO GIOIOSA, ANTONINO TUMEO, NATHAN TALLENT, GOKCEN KESTOR, LEON SONG

Pacific Northwest National Laboratory Modeling and Simulation Workshop, Seattle 2014

Page 2: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

Test and Validation (TAV) in DARPA PERFECT

August 10, 2014 2

!   DARPA’s PERFECT (Power Efficiency Revolution for Embedded Computing Technologies) !   Spearheading R&D into a multitude of diverse technologies !   75 Gflops/W for general-purpose embedding computing !   Envisioning 7nm technologies in 2018 – 2020 timeframe

!   Program split across 3 phases, beginning at the end of 2012

!   17 initial Performer teams !   Device technology, Architecture, Systems, Software, and Optimization

!   Test and Validation (TAV) team !   Quantitatively assess PERFECT technologies individually with respect to

the overall system & overall performance and power goals !   Use a combination of benchmarking, modeling and simulation

Page 3: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

PERFECT Assessment: Challenges and Strategy

August 8, 2014 3

!   PERFECT Challenge !   Goal: Embedded system delivering “75

GFLOPS/W” !   Performers contribute only part of a

system (architecture to algorithms) !   TAV must assess Performer’s

contribution w.r.t. entire system

!   Three pillars to assessment strategy !   Baseline Architectures: quantifying

today’s state of the art !   PERFECT Suite: defining a workload !   Proxy Architecture: modeling

framework

NTV  (PVT)      MUGFETS  

Memory  Cells  

Compute  Cores    General   Accelerators  

Mem

ory  

 EC

C  

Register  Files    

ALUs    

Processing  Stages    Pipelining   Caches  

Concurrency  Management    

Power  M

anagement  

 

CommunicaCon  Avoidance    

Locality  

 

Resilience    

Uncertainty    

V  /F  GaC

ng/Scalin

g  

Overfetch  

Throughput  

Models    

Simulator  

 

Benchmarks    

Benchmarks    

Emulator  

 

 xRA

M  

SEE  Hardening  

Metrics    

3D  Stacking  

Signaling  

Models    

Emulator  

 Emulator  

 

Models    

Simulator  

  Simulator  

 

Benchmarks    

Legend  Circuit  Design  

Circuit  Modeling  &  Simula5on  

Architecture  Modeling  &  Sim  

System  So9ware  

Architecture  Design  

Algorithm  Modeling  &  Sim  

Algorithms  

Page 4: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

Modeling and Simulation Challenges

August 11, 2014 4

!   Integrating Performance and Power Prediction !   Overall PERFECT goals stipulate efficiency !   PERFECT TAV effort has defined methodologies for Performance and

Power prediction thrusts !   Defining a suitable interface with Performer teams

!   What information is required to parameterize the models? !   What is the appropriate level of architectural abstraction?

!   Although not the goal of PERFECT, how can these methodologies be extended to large-scale systems? !   Potential for combining with existing scalability modeling methodologies

with PERFECT TAV tools

Page 5: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

TAV Overall Approach: 3 Pillars

August 10, 2014 5

Performance  Analysis  

Proxy  Archite

cture  Fram

ework  

Architecture  Probe  

Performance  Kernel  Trace  

Technology  Model  Circuit  Model  

     Architecture  Model  

Tran

slat

or  

Power  Analysis  

Proxy  Architecture  

InterpolaCon  1  InterpolaCon  2  Back  ProjecCon  

System  Solver  Inner  Product  Outer  Product  

Image  RegistraCon  

Change  DetecCon  Debayer  

DWT  Hist.  EqualizaCon  

2D  ConvoluCon  

Architectural  Probes  

SAR  

WAMI  

STAP  

PFAP-­‐1  

SORT

 2D

   FFT  

1D    FFT  

Benchmark  Suite  

Performer  Technology  Developments  

Evalua

Con  of  Future  Im

pact    

nCore  BrownDwarf    Y-­‐Class  TI  Keystone  II  

NVIDIA  Kayla  

IBM  POWER7  

Intel  x86  Haswell  

Baseline  Architectures  

Modeling  and  Simula5on  focus  

Page 6: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

PERFECT TAV Pillar 1: Baseline Architectures !   Architectures that reflect state-of-

the-art systems in Phase 1 !   Real world data points for power/

performance model validation !   Calibration of Performer’s modeling

and simulation environments !   Validation of TAV-developed

models on existing platforms

Pla^orm   #  Cores  (Threads)  

Peak  Perf  

(Gflops)  

Clock  (GHz)  

Peak  Power  (Waas)  

Gflops    per  Waa  

Memory  (GB)  

nCore  BD-­‐Y  TI  Keystone  II  

16+96  (ARM  +  DSP)  

614.4  (SP)  

1.2  +  1.4   36-­‐56   17.1  –  

11  (SP)   56  

Nvidia  Kayla   4+2(384)  (ARM+SMX)  

300  (SP)   1.2  /  1.05   27   11.1   2/1  

IBM  Power7   8(32)   265  (DP)   4.2   240   1.1   16  

Intel  X86  Haswell   4(8)   295  (SP)   2.3   45   6.5   16  

!   Power Instrumentation: !   Level 1: Watts-up power meters !   Level 2: Internal architecture-supplied counters (e.g., RAPL, Ameester, etc.) !   Level 3: High-fidelity DAQ

!   Baseline Architectures in place in EHPC lab at PNNL and are being used

nCore   Kayla   Power7   Haswell  

Page 7: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

PERFECT TAV Pillar 2: PERFECT Suite

August 10, 2014 7

!   Kernels and applications that represent application domains

!   Benchmark-specific models will be validated on Baseline Architectures and used to predict performance and power consumption of Performer architectures

!   Selection criteria !   PERFECT’s domain of interest !   Alignment of app/kernels to Performer’s projects !   Reasonable input data set sizes selected

1.  Synthetic Aperture Radar (SAR) 2.  Wide Area Motion Imaging (WAMI) 3.  Space Time Adaptive Processing (STAP) 4.  PERFECT APPLICATION 1 (PFAP-1) 5.  3 “Core” Kernels (Sort, 1D and 2D FFTs)

!   All serial and CUDA reference kernels available !   http://hpc.pnnl.gov/projects/PERFECT

InterpolaCon  1  InterpolaCon  2  Back  ProjecCon  

System  Solver  Inner  Product  Outer  Product  

Image  RegistraCon  

Change  DetecCon  Debayer  

DWT  Hist.  EqualizaCon  

2D  ConvoluCon  

Architectural  Probes  

SAR  

WAMI  

STAP  

PFAP-­‐1  

SORT

 2D

   FFT  

1D    FFT  

Benchmark  Suite  

Page 8: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

PERFECT TAV Pillar 3: Proxy Architecture Modeling Framework

8

!   An integrated approach to assist in the assessment of component technologies in the context of a complete architecture

!   The Proxy Architecture is a mechanism that accommodates solutions from diverse technology research (architectures to algorithms)

Two thrusts: !   Performance Analysis

!   Benchmark-specific models parameterized in terms of architecture performance capabilities to derive predicted performance ranges

!   Solution providers define architectural operations, latencies, and throughputs !   Power Analysis

!   Utilizes McPAT, an open-source Power, Area, and Timing modeling framework !   Input from Performer teams to define technology, circuit, and architecture

parameters

Performance  Analysis  

Proxy  Archite

cture  Fram

ework  

Architecture  Probe  

Performance  Kernel  Trace  

Technology  Model  Circuit  Model  

     Architecture  Model  

Tran

slat

or  

Power  Analysis  

Proxy  Architecture  

Page 9: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

Current Status of PNNL TAV Activities

August 10, 2014 9

!   Baseline Architectures installed in the EHPC lab at PNNL and are being remotely accessed by Performers

!   Benchmark Suite in use by Performers and available to the community !   Kernels available now; full scale applications available shortly

!   Calibration of Performer’s simulation environments in progress !   Calibration against Baseline Architectures !   Provides confidence in each simulation environment

!   Proxy Architecture !   Power thrust: P-McPAT Maintenance and Infrastructure

!   TAV P-McPAT Architectural Validation !   Performance thrust: Internal validation of modeling methodology utilizing

kernels from the TAV Benchmark Suite and Baseline Architectures !   Initial decomposition of selected kernel into low-level operations !   Architectural micro-kernel framework to measure latencies and throughputs

Page 10: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

PERFECT TAV Summary

August 10, 2014 10

!   Key challenge being tackled by TAV in DARPA PERFECT is: !   To quantitatively assess PERFECT technologies individually and with

respect to the programs overall performance and power goals !   Approach is to use a combination of benchmarking, modeling, and

simulation in 3 pillars: !   Baseline Architectures !   Benchmark Suite !   Proxy Architecture

!   Modeling and simulation work is in the Proxy Architecture pillar, and is the primary focus of current research

!   Approach is not restricted to DARPA PERFECT, but is generally applicable for assessing the potential of future technologies !   Unified approach capturing performance and power impacts !   Current effort on integrating resilience modeling as well

Page 11: Bridging the Assessment Modeling Gap the PERFECT Wayhpc.pnl.gov/modsim/2014/Presentations/Barker.pdfSynthetic Aperture Radar (SAR) 2. Wide Area Motion Imaging (WAMI) 3. Space Time

Gaps and Opportunities

August 10, 2014 11

!   PERFECT TAV strategy defines a consistent evaluation methodology !   Targeting diverse embedded computing technologies and architectures !   However, we are not limited to embedded systems; strategy can be

applied to systems across scales

!   What are the gaps? !   Architectural specification and benchmarking (e.g., metrics) !   Third component of PPR – Resilience

!   What are the opportunities? !   Opportunity for large-scale modeling tool kit from “first principles” !   Need well-defined interfaces between modeling layers

!   “Bag of tools” approach will allow different capabilities to be “plugged in” !   Modeling tools selected based on desired levels of abstraction !   Encourage interaction between researchers, designers, and vendors