Best Practices: Large Scale Multiphysics

TRANSCRIPT

Page 1: Best Practices: Large Scale Multiphysics

Best Practices: Large Scale Multiphysics. Frank Ham & Sanjeeb Bose, Cascade Technologies; L. Shunn, G. Bres, D. Philips, M. Emory, A. Saghafian, D. Kim, B. Hejazi, P. Quang [Cascade]

Acknowledging financial, technical, or computational support from: NASA, NAVAIR, Dept. of Energy Office of Science, Oak Ridge National Lab, Argonne National Lab, ERDC (DoD), GE

Page 2: Best Practices: Large Scale Multiphysics

• In 2014, GE observed combustion dynamics during a full-scale test of a gas turbine
• The particular frequency had not been predicted by lower-fidelity models, and had not been observed in single-can combustor testing
• Amplitudes were not a concern, and could be managed with fueling adjustments, but important questions remained:
  • What caused the observed dynamics?
  • Would it manifest in a new engine configuration with new technology scheduled for testing in late 2015?

Motivation: a multiphysics problem. Gas Turbine Self-Excited Dynamics (SED)

Page 3: Best Practices: Large Scale Multiphysics

2015 Press Release: Super Collaboration. GE and Cascade Apply New Technologies and Super-Computer Capabilities to Improve Gas Turbine Combustion

“The enhanced simulation and visualization capabilities enabled by our collaboration with Cascade can help us deliver even higher efficiency and lower emissions in the next generation of gas turbines.”
John Lammas, vice president, power generation engineering at GE Power and Water

Source: GE press release, June 22, 2015

Page 4: Best Practices: Large Scale Multiphysics

The timeline: simulating Gas Turbine Self-Excited Dynamics (SED)

• 2014: full-speed, full-load test uncovers an unexpected instability
• 2015: Joint Development Agreement announced between GE and Cascade to develop simulation to improve gas turbine combustion
• 2015: test planned for a new engine configuration including new technology

Page 5: Best Practices: Large Scale Multiphysics

The timeline: simulating Gas Turbine Self-Excited Dynamics (SED)

• 2014: full-speed, full-load test uncovers an unexpected instability
• 2015: Joint Development Agreement announced between GE and Cascade to develop simulation to improve gas turbine combustion
• 2015: test planned for a new engine configuration including new technology
• In between: lots of hard work
• Pre-test prediction delivered one month prior to the scheduled full-speed, full-load testing

Page 6: Best Practices: Large Scale Multiphysics

HPC partnerships: critical for success stories (and the likely model for early industrial use of exascale)

- GE: owner of the problem/challenge; business urgency (read $); deep domain-specific knowledge; compelling energy and US competitive arguments
- Oak Ridge (OLCF): owner of the hardware (Titan); aligned with the energy and US competitive argument
- Cascade: owner of the software; an HPC software / multiphysics / numerics development team able to move fast and touch all parts of the simulation workflow

Page 7: Best Practices: Large Scale Multiphysics

The full story: https://www.olcf.ornl.gov/2016/05/31/better-combustion-for-power-generation/

Page 8: Best Practices: Large Scale Multiphysics

Revolutionary Computational Aerosciences: 5 revolutions required…

• Efficient deployment on next-generation computer hardware, and algorithmic innovation
• Complex geometries: mesh generation and adaptation remain a bottleneck
• Wall modeling, and the current inability to predict separated turbulent flows
• The need for more than a single prediction: sensitivity analysis, design optimization, UQ, and the consequent data management issues
• Multiphysics modeling: a push towards constitutive models that reflect the underlying physics more closely than ever before

Page 10: Best Practices: Large Scale Multiphysics

Starting point: Cascade’s CharLES solver (2015)

• Finite volume method
• Explicit RK3 time advancement (a minimal time-stepping sketch follows below)
• Compressible reacting flow
• Flamelet-based tabulation of the EOS
• Arbitrary unstructured grids with ParMETIS-based domain decomposition
• Significant verification and validation

Pressure and temperature field of a screeching supersonic jet. Image from “Stanford researchers break million-core supercomputing barrier”.
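
The deck does not show the time-integration scheme itself. As a hedged illustration of explicit RK3 time advancement for a semi-discrete finite-volume system du/dt = R(u), here is a minimal Python sketch using the common SSP-RK3 (Shu-Osher) variant; the upwind residual and the advection test case are illustrative assumptions, not CharLES code.

```python
import numpy as np

def ssp_rk3_step(u, residual, dt):
    """One explicit SSP-RK3 step for du/dt = residual(u).

    `u` holds the conserved variables; `residual` plays the role of the
    finite-volume flux divergence (an assumption for this sketch).
    """
    u1 = u + dt * residual(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * residual(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * residual(u2))

# Usage sketch: linear advection dq/dt = -c dq/dx on a periodic 1-D grid.
n, c = 200, 1.0
dx = 1.0 / n
x = np.arange(n) * dx
q = np.exp(-200.0 * (x - 0.5) ** 2)
R = lambda q: -c * (q - np.roll(q, 1)) / dx   # first-order upwind residual
dt = 0.4 * dx / c                             # CFL-limited explicit step
for _ in range(100):
    q = ssp_rk3_step(q, R, dt)
```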

Page 11: Best Practices: Large Scale Multiphysics

Can we do grid generation on the HPC resource?

• Generating grids is still a bottleneck for most solvers working with geometries of engineering relevance
• Grids are not going away for turbulent flows: problems are non-smooth and require energy estimates to ensure stability
• Grid generation/optimization should ultimately be part of the simulation process (to minimize grid-related errors only known during the simulation)
• What about Voronoi diagrams?

Page 12: Best Practices: Large Scale Multiphysics

Clipped Voronoi Diagrams

• Key technology: robust direct generation of the clipped Voronoi diagram inside an arbitrary hull

• Fully and uniquely defines the mesh from a set of points (connectivity, volumes, normals)
• To modify the mesh, move, add, or remove points
  • Mesh changes are continuous and differentiable with point motion
  • The Voronoi diagram has compact support, and can thus be essentially “rendered” from the given point set
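
Cascade's clipped-Voronoi generator is not shown in the deck. Purely to illustrate how a point set by itself determines face connectivity, face areas, and normals, here is a 2-D sketch built on scipy.spatial.Voronoi; the robust clipping against an arbitrary hull that the slide highlights is the hard part and is deliberately omitted here.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
points = rng.random((50, 2))        # Voronoi generating points ("sites")

vor = Voronoi(points)

# Each ridge (a cell face in 2-D) separates exactly two sites, so the face
# data a finite-volume solver needs falls straight out of the diagram.
faces = []
for (p, q), verts in zip(vor.ridge_points, vor.ridge_vertices):
    if -1 in verts:
        continue                     # unbounded ridge: this is where clipping would act
    v0, v1 = vor.vertices[verts]
    area = np.linalg.norm(v1 - v0)   # face "area" (length in 2-D)
    normal = points[q] - points[p]
    normal = normal / np.linalg.norm(normal)   # face normal lies along the site-to-site vector
    faces.append((int(p), int(q), area, normal))

print(len(faces), "bounded faces recovered from", len(points), "sites")
```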

Page 13: Best Practices: Large Scale Multiphysics

Voronoi Generating Points

• HCP = hexagonal close-packed lattice
  • Leads to centroidal Voronoi meshes of self-similar truncated octahedra
  • User specifies regions/windows and desired point density
• HCP mesh inserted using a fast scanning approach (a point-generation sketch follows below)
  • Similar to pixel “rendering”
  • Parallel, scalable

Optimal sphere packing is the motivation for HCP-lattice-based Voronoi diagrams
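
As a rough sketch of how HCP generating points might be laid down (standard HCP lattice coordinates in a box at a single user-specified spacing; the windowed, variable-density insertion and the fast parallel scanning described on the slide are not reproduced):

```python
import numpy as np

def hcp_points(lx, ly, lz, d):
    """Fill the box [0,lx] x [0,ly] x [0,lz] with HCP lattice points at nominal spacing d."""
    r = 0.5 * d
    ni = int(lx / (2.0 * r)) + 1
    nj = int(ly / (np.sqrt(3.0) * r)) + 1
    nk = int(lz / (2.0 * np.sqrt(6.0) / 3.0 * r)) + 1
    pts = []
    for k in range(nk):
        for j in range(nj):
            for i in range(ni):
                # Standard HCP coordinates: alternate rows and layers are offset.
                x = (2 * i + (j + k) % 2) * r
                y = np.sqrt(3.0) * (j + (k % 2) / 3.0) * r
                z = (2.0 * np.sqrt(6.0) / 3.0) * k * r
                if x <= lx and y <= ly and z <= lz:
                    pts.append((x, y, z))
    return np.array(pts)

# These sites would then be handed to the (clipped) Voronoi generator.
sites = hcp_points(1.0, 1.0, 1.0, 0.05)
```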

Page 14: Best Practices: Large Scale Multiphysics

Boundary Recovery using Lloyd Iteration

• Lloyd iteration is a robust and powerful mesh-smoothing capability for Voronoi diagrams (see the sketch below) that can be used to:
  • Improve the mesh quality either locally or globally
  • Introduce boundary alignment

Starting mesh: Voronoi sites and mesh centroids very different
Iteration 1: regenerate the Voronoi diagram from the centroids of the previous mesh
Iteration 2: regenerate the Voronoi diagram from the centroids of the previous mesh
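
A minimal 2-D sketch of the Lloyd-iteration idea (regenerate the Voronoi diagram from the centroids of the previous cells). The centroid here is approximated as the average of a bounded cell's vertices, and the boundary clipping/alignment step is omitted.

```python
import numpy as np
from scipy.spatial import Voronoi

def lloyd_iterations(points, n_iter=2):
    """Repeatedly replace each Voronoi site by (an approximation of) its cell centroid."""
    pts = np.asarray(points, dtype=float).copy()
    for _ in range(n_iter):
        vor = Voronoi(pts)
        new_pts = pts.copy()
        for i, region_index in enumerate(vor.point_region):
            region = vor.regions[region_index]
            if len(region) == 0 or -1 in region:
                continue             # unbounded cell: would need clipping against the hull
            # Vertex average as a stand-in for the true cell centroid.
            new_pts[i] = vor.vertices[region].mean(axis=0)
        pts = new_pts
    return pts

smoothed = lloyd_iterations(np.random.default_rng(1).random((100, 2)), n_iter=2)
```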

Page 15: Best Practices: Large Scale Multiphysics

Example of a Voronoi mesh around an airfoil

Stretched Cartesian mesh point distributions near important boundaries
Uniform point distributions in the far field: Cartesian or hexagonal close-packed (HCP)
Interface volumes and connectivity uniquely determined by the Voronoi diagram: fully conformal

Page 16: Best Practices: Large Scale Multiphysics

CPU-side solver optimizations: 1/2

• Restructuring data layout for increased cache efficiency
• Aggressive overlapping of computation and communication (latency hiding; see the sketch below)
• Vectorizing instructions, aggressive inlining
• Improved partitioning algorithm (native, no longer ParMETIS)
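
As an illustration of the latency-hiding pattern only (not CharLES's implementation), the mpi4py sketch below posts non-blocking halo exchanges, does interior work while messages are in flight, and finishes the halo-dependent work after the wait. The neighbor layout, buffer sizes, and placeholder kernels are all hypothetical.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_halo = 1024                                   # hypothetical halo size per neighbor
left, right = (rank - 1) % size, (rank + 1) % size

send_l = np.zeros(n_halo)                       # packed halo data (placeholders)
send_r = np.zeros(n_halo)
recv_l = np.empty(n_halo)
recv_r = np.empty(n_halo)

# 1) Post non-blocking receives and sends for the halo exchange.
reqs = [comm.Irecv(recv_l, source=left,  tag=0),
        comm.Irecv(recv_r, source=right, tag=1),
        comm.Isend(send_r, dest=right,   tag=0),
        comm.Isend(send_l, dest=left,    tag=1)]

# 2) Overlap: update interior cells that need no remote data while the
#    messages are in flight (placeholder for the real flux/update kernels).
interior_work = np.sin(np.arange(100_000, dtype=float)).sum()

# 3) Wait for the exchange to complete, then finish the halo-dependent cells.
MPI.Request.Waitall(reqs)
boundary_work = recv_l.sum() + recv_r.sum()
```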

Page 17: Best Practices: Large Scale Multiphysics

CPU-side solver optimizations: 2/2

• LLVM bytecode: load operations for the internal flux calculation [Dec 2015]
• Reordering instructions, algebraic factorizations, and storing intermediate variables in the internal flux routines to reduce repeated load instructions

Page 18: Best Practices: Large Scale Multiphysics

Execution time breakdown: CPU-optimized compressible premixed solver

• The most time-consuming operations are associated with tabulated lookups (cubic interpolation on a Cartesian grid); move execution of these kernels to the GPU (CUDA for Titan’s K20X Keplers). A 1-D lookup sketch follows below.
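
The flamelet tables themselves are multi-dimensional; as a hedged, 1-D stand-in for what one “cubic interpolation on a Cartesian grid” lookup involves per query, here is a Catmull-Rom interpolation of a uniformly tabulated function (the table contents and query are invented for illustration).

```python
import numpy as np

def cubic_lookup(table, x0, dx, x):
    """Catmull-Rom cubic interpolation of a uniformly tabulated 1-D function.

    `table[i]` holds f(x0 + i*dx).  Real chemistry tables are multi-dimensional;
    this 1-D version only illustrates the per-query arithmetic of a cubic lookup.
    """
    s = (x - x0) / dx
    i = int(np.clip(np.floor(s), 1, len(table) - 3))   # keep a full 4-point stencil
    t = s - i
    p0, p1, p2, p3 = table[i - 1:i + 3]
    return 0.5 * (2.0 * p1
                  + (-p0 + p2) * t
                  + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                  + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t * t * t)

# Usage: tabulate sin(x) on a uniform grid and query between grid points.
xs = np.linspace(0.0, np.pi, 64)
tab = np.sin(xs)
print(cubic_lookup(tab, xs[0], xs[1] - xs[0], 0.3), np.sin(0.3))
```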

Page 19: Best Practices: Large Scale Multiphysics

Speedup from CPU-side optimizations, including flux simplifications associated with Voronoi diagrams

• Normalized timing: core-sec / M cells / step, at 2 levels of strong scaling (the arithmetic is spelled out below)
• A speedup of approximately 5X for the CPU-only execution
• Slightly better strong scaling due to latency hiding
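
For reference, the normalization used in these timing plots can be computed as below; the run parameters are invented purely to show the arithmetic.

```python
# Hypothetical run: 8192 cores advance a 120M-cell case 1000 steps in 600 s of wall time.
n_cores, n_cells, n_steps, wall_s = 8192, 120e6, 1000, 600.0

core_sec_per_Mcells_per_step = (n_cores * wall_s) / (n_cells / 1e6) / n_steps
print(core_sec_per_Mcells_per_step)   # ~41 core-seconds per million cells per step
```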

Page 20: Best Practices: Large Scale Multiphysics

Execution time breakdown: CPU-optimized + GPU port of tabulation

• The most time-consuming operations are associated with tabulated lookups (cubic interpolation on a Cartesian grid); move execution of these kernels to the GPU (CUDA for Titan’s K20X Keplers)

Page 22: Best Practices: Large Scale Multiphysics

Speedup from CPU+GPU optimizations: GPU chemistry tabulation

• Normalized timing: core-sec / M cells / step, at 2 levels of strong scaling
• A speedup of approximately 9X for the CPU+GPU execution

Page 23: Best Practices: Large Scale Multiphysics

Great: the simulations are running fast. Now how do we look at them and learn from them?

• Our ability to produce data is outpacing our ability to move it by 5x every 7 years

[Plot: history of supercomputer performance (slope: 100x / 7 years) vs. history of network bandwidth (slope: 20x / 7 years)]

Page 24: Best Practices: Large Scale Multiphysics

Solution: Images + metadata

Simulation result file size: 20-200+ GB. Image size: ~1 MB (4-5 orders of magnitude smaller)

• Instrument solvers to directly write images + metadata (a writing sketch follows below)
• Images become Cartesian arrays of probes
• Dimensional reduction + lossless compression at the source
• Intrinsically valuable
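
The instrumentation code itself is not shown in the deck. As a sketch of the idea, assuming a simple 8-bit grayscale quantization, the snippet below writes a 2-D field slice as a PNG and stores the value range plus arbitrary solver metadata in PNG text chunks via Pillow's PngInfo; the function name and metadata keys are made up for illustration.

```python
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def write_field_png(field, path, **metadata):
    """Quantize a 2-D float field to 8-bit grayscale and save it as a PNG,
    recording the value range (and any solver metadata) in PNG text chunks
    so pixel values can later be mapped back to physical units."""
    fmin, fmax = float(field.min()), float(field.max())
    img = np.round(255.0 * (field - fmin) / max(fmax - fmin, 1e-30)).astype(np.uint8)

    info = PngInfo()
    info.add_text("range_min", repr(fmin))
    info.add_text("range_max", repr(fmax))
    for key, val in metadata.items():           # e.g. variable name, time, cut-plane definition
        info.add_text(key, str(val))
    Image.fromarray(img).save(path, pnginfo=info)

# Usage with a made-up velocity slice:
u = np.random.default_rng(0).normal(size=(512, 1024))
write_field_png(u, "u_slice.png", var="u/c0", step=1000, time=0.0123)
```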

Page 25: Best Practices: Large Scale Multiphysics

Contours of streamwise velocity, u/c0. Re_j = 418,000, St_samp = 2, data size = 50 MB

Treat each pixel in the image as a data point (excluding geometry), analogous to a PIV analysis (see the read-back sketch below)

Supported by a NAVAIR STTR project with FSU, under the supervision of Dr. John T. Spyropoulos. Computer time provided by DoD HPC facilities at the ERDC DSRC

Leveraging the PNG standard: images + metadata as Cartesian arrays of probes
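
The read-back half of the same sketch (assuming the hypothetical write_field_png above): pixel values are mapped back to physical units using the stored range, so any pixel, row, or patch of the image can be treated as a probe.

```python
import numpy as np
from PIL import Image

def read_field_png(path):
    """Recover the field written by write_field_png by undoing the 8-bit
    quantization with the range stored in the PNG text chunks."""
    im = Image.open(path)
    pix = np.asarray(im, dtype=float)
    meta = dict(im.info)                         # PNG text chunks land in im.info
    fmin, fmax = float(meta["range_min"]), float(meta["range_max"])
    return fmin + (fmax - fmin) * pix / 255.0, meta

field, meta = read_field_png("u_slice.png")
profile = field[field.shape[0] // 2, :]          # one pixel row used as a rake of probes
print(meta.get("var"), profile.mean())
```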

Page 26: Best Practices: Large Scale Multiphysics

[DMD spectrum: relative amplitude vs. Strouhal number (St), with labeled peaks for the “separation” and “near wall/shear layer” modes]

Identify salient flow structures and frequencies directly from images (similar to Schmid 2011 and Schmid et al. 2012, applied to experimental data)

Quantitative data analysis from images: dynamic mode decomposition
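
As a hedged sketch of how frequencies and relative amplitudes like those in this spectrum can be extracted from a stack of image snapshots, here is a minimal SVD-based dynamic mode decomposition; it is a generic textbook DMD, not Cascade's analysis pipeline.

```python
import numpy as np

def dmd_spectrum(snapshots, dt, rank=20):
    """Minimal dynamic mode decomposition of a snapshot sequence.

    `snapshots` is an (n_pixels, n_snapshots) matrix, e.g. flattened image
    frames sampled every `dt`.  Returns mode frequencies [Hz] and amplitudes.
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = min(rank, len(s))
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]

    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)   # reduced operator
    eigvals, W = np.linalg.eig(A_tilde)

    modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W              # exact DMD modes
    freqs = np.angle(eigvals) / (2.0 * np.pi * dt)              # mode frequencies
    amps = np.abs(np.linalg.lstsq(modes, X[:, 0].astype(complex), rcond=None)[0])
    return freqs, amps

# Usage sketch: stack flattened image frames as columns, call dmd_spectrum,
# then plot amplitude vs. frequency (non-dimensionalized to a Strouhal number).
```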

Page 27: Best Practices: Large Scale Multiphysics

[DMD spectrum: relative amplitude vs. Strouhal number (St), with labeled peaks for the “separation” and “exit boundary layer/shear layer” modes]

Identify salient flow structures and frequencies directly from images (similar to Schmid 2011 and Schmid et al. 2012, applied to experimental data)

Quantitative data analysis from images: dynamic mode decomposition

Page 28: Best Practices: Large Scale Multiphysics

Summary

• Fast development + tight coupling with the internal design team AND the hardware team allowed Titan-based simulations to contribute understanding and mitigate new-technology risk for GE
• Solvers realized a speedup of 9X relative to the starting point over 2 quarters
• The 3-way partnership, with the physics software developer in the loop, was critical for this success
• The speedup has allowed GE to bring many more impactful simulations in-house onto internal HPC
