Page 1: S4599: An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling

Robin M. Weiss and Jeffery Shragge

March 26, 2014

GPU Technology Conference

Page 2: Seismic Data Modeling

• Elastic modeling
  – Anisotropic and heterogeneous materials
• Seismic imaging with elastic models
  – Increasing interest in this area
  – Solved as an inverse problem
  – Requires accurate, fast forward solvers
• Computational issues
  – Large model sizes
  – Generally requires a ton of compute power

Page 3: Open-Source 3D Elastic Modeling

• Open-source computational geophysics package
  – Madagascar
  – www.reproducibility.org (RSFSRC/user/rweiss)
  – Targets reproducibility, transparency, and clarity
• Madagascar's CPU-based (w/ OpenMP) EWE solver implementation:
  – Paul Sava, Colorado School of Mines
  – Anisotropic elastic wave equation
  – Stress-stiffness formulation
  – 3D finite-difference solver, centered derivatives, regular grid
    • 8th order spatial, 2nd order time

Weiss and Shragge, 2013, Solving 3D anisotropic elastic wave equations on parallel GPU devices, Geophysics
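
To make "8th order spatial, centered derivatives" concrete, here is a minimal Python sketch of an 8th-order centered first derivative on a regular 1D grid. The coefficients are the standard ones for this stencil; whether the Madagascar solver uses exactly these values is an assumption, and the function name `ddx8` and the sine test signal are illustrative only.

```python
import math

# Standard 8th-order centered coefficients for a first derivative
# (assumed here; the talk does not list the solver's coefficients).
COEFFS = (4.0 / 5.0, -1.0 / 5.0, 4.0 / 105.0, -1.0 / 280.0)

def ddx8(f, h):
    """8th-order centered first derivative of samples f with spacing h.

    The 4 points nearest each edge are left at zero (no boundary treatment).
    """
    n = len(f)
    out = [0.0] * n
    for i in range(4, n - 4):
        acc = 0.0
        for k, c in enumerate(COEFFS, start=1):
            # Each output point reads 4 neighbors on either side.
            acc += c * (f[i + k] - f[i - k])
        out[i] = acc / h
    return out

h = 0.1
xs = [i * h for i in range(64)]
approx = ddx8([math.sin(x) for x in xs], h)
# Largest interior error against the exact derivative cos(x).
err = max(abs(approx[i] - math.cos(xs[i])) for i in range(4, 60))
```

Each output point reads four neighbors on either side, which is why a grid split across devices needs a four-plane halo from each neighbor.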

Page 4: Finite-Difference and GPUs

• Represents the bulk of the compute work
• Compact FD stencils map nicely to GPU
  – Regular memory access pattern
  – Some data reuse (use shared memory)
  – But still memory bound
• GTC 2013 - S3176

Micikevicius, 2009, 3D Finite Difference Computation on GPUs using CUDA, NVIDIA

Page 5: [Flowchart: program flow, split between CPU and GPU]

CPU:
• Start
• Read source wavelet, stiffness, and density fields
• Initialize GPU data structures and copy model data to GPU

GPU, repeated for each time step:
1. Displacement to Strain
2. Strain to Stress
3. Near Surface BC (only if "Free Surface?" is Yes)
4. Inject Stress Source (only if "Stress Source?" is Yes)
5. Stress to Acceleration
6. Inject Accel Source (only if "Accel Source?" is Yes)
7. Advance Time
8. Boundary Conditions
9. Extract Receiver Data

• Maximum # of timesteps completed? No: run steps 1 to 9 again. Yes: write data to disk, then done.
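
The per-time-step sequence in the flowchart can be sketched as a 1D Python analog. This is an illustration under stated assumptions, not the authors' kernels: it uses one scalar stiffness and density, 2nd-order centered differences instead of 8th order, a hypothetical Ricker source, and fixed (zero-displacement) ends. Steps 3 and 4 (free surface and stress source) are left out. All names and parameter values are made up for the example.

```python
import math

def ricker(t, f0=10.0, t0=0.1):
    """Ricker wavelet, a hypothetical source time function."""
    arg = (math.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * math.exp(-arg)

def ddx(f, dx):
    """2nd-order centered first derivative (the talk's solver is 8th order)."""
    out = [0.0] * len(f)
    for i in range(1, len(f) - 1):
        out[i] = (f[i + 1] - f[i - 1]) / (2.0 * dx)
    return out

def run(nx=201, nt=400, dx=10.0, dt=1.0e-3,
        stiffness=9.0e9, density=1000.0, src=100, rec=150):
    u_prev = [0.0] * nx          # displacement at time step n-1
    u_curr = [0.0] * nx          # displacement at time step n
    trace = []
    for it in range(nt):
        strain = ddx(u_curr, dx)                          # 1: displacement to strain
        stress = [stiffness * e for e in strain]          # 2: strain to stress
        accel = [s / density for s in ddx(stress, dx)]    # 5: stress to acceleration
        accel[src] += ricker(it * dt)                     # 6: inject acceleration source
        u_next = [2.0 * uc - up + dt * dt * a             # 7: advance time (2nd order)
                  for uc, up, a in zip(u_curr, u_prev, accel)]
        u_next[0] = u_next[-1] = 0.0                      # 8: boundary conditions
        u_prev, u_curr = u_curr, u_next
        trace.append(u_curr[rec])                         # 9: extract receiver data
    return trace

trace = run()
```

Three displacement arrays are live at once (`u_prev`, `u_curr`, `u_next`), matching the "x3 time steps" storage the talk counts per grid point later on.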

Page 6: Optimize: The usual suspects…

• Stash data in shared memory and/or registers
• Find good block dimensions
• Repeated expression replacement
• Kernel fusing
  – Keep an eye on register usage (-Xptxas -v)
• CUDA Visual Profiler is your friend

Page 7: GPU-CPU Displacement Comparison

[Figure: depth-component of displacement field overlaid on velocity model]

Page 8: Off to the races…

• All experiments run on N^3 data sets for 2000 timesteps
• CPU results computed with Intel Sandy Bridge E5-2670 2.60GHz
  – 2x 8-core, IBM iDataPlex dx360 M4
  – University of Chicago's Midway Compute Cluster
• GPU results computed with Nvidia K40
  – 4x GPUs per node
  – Nvidia PSG cluster (thanks, Nvidia!)

Page 9: Off to the races…

[Plot: Time (seconds), log-scale ticks from 1 to 65536, versus Cube Size (64, 128, 256, 512). Series: GPU, 16-CPU]

Page 10: Off to the races…

Throughput (Mpoints/sec)

Cube Size   Disp. to Strain   Strain to Stress   Stress to Accel.   Advance Disp.
100         1405              1657               1064               2976
148         1333              1702               1053               3077
196         1467              1714               1164               3051
244         1449              1662               1054               3051
292         1404              1703               993                2950
340         1282              1721               1021               3170
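
As a reading aid for the throughput table, the Python sketch below converts a per-kernel throughput into time per kernel call and combines the four kernel rates for cube size 100. It assumes that Mpoints/sec means grid points processed per second by one kernel call; the talk does not state the definition. The combined figure ignores source injection, boundary conditions, and receiver extraction, so treat it as a rough ceiling, not a measured result.

```python
def seconds_per_call(cube_size, mpoints_per_sec):
    """Time for one kernel pass over an N^3 grid at the given throughput."""
    return cube_size ** 3 / (mpoints_per_sec * 1.0e6)

def combined_throughput(rates):
    """Throughput if the kernels run back to back over the same grid.

    Per-point times add, so the rates combine like parallel resistors.
    """
    return 1.0 / sum(1.0 / r for r in rates)

# Row for cube size 100, taken from the table (Mpoints/sec).
row_100 = (1405, 1657, 1064, 2976)

t_strain = seconds_per_call(100, 1405)    # time of one Disp. to Strain pass
combined = combined_throughput(row_100)   # Mpoints/sec for all four kernels
```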

Page 11: Overall Throughput

[Plot: Mpoints/sec (0 to 350) versus Cube Size (0 to 450)]

Page 12: Speedup

[Plot: Speedup Factor (0 to 6) versus Cube Size (100 to 388). Series: GPU vs. 16-CPU]

Page 13: Speedup

[Plot: Speedup Factor (0 to 80) versus Cube Size (100 to 388). Series: GPU vs. 16-CPU, GPU vs. 1-CPU]

Page 14: While this is all well and good…

[Plot: Time (seconds), log-scale ticks from 1 to 32768, versus Cube Size (64, 128, 256, 512, 1024). Series: GPU, 16-CPU]

Page 15: …CPU implementation has us beat

[Plot: Time (seconds), log-scale ticks from 1 to 32768, versus Cube Size (64, 128, 256, 512, 1024). Series: GPU, 16-CPU]

Page 16: GPU Memory Wall

• Limited GPU memory becomes an issue
• In our 3D implementation, each grid point requires:
  – 1 density value
  – 9 stiffness coefficients
  – 3 displacement components (x3 time steps)
  – 6 strain/stress field components
  – 3 acceleration components
• GlobalMemory ≥ 28 * gridPoints * sizeof(float)
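
The memory bound can be worked through with a short Python sketch. The per-point count (1 + 9 + 3×3 + 6 + 3 = 28 floats) comes from the slide. The 12 GiB figure for a K40 and the assumption that all of it is usable are simplifications, and the function names are illustrative.

```python
# Floats stored per grid point, following the slide's breakdown:
# density + stiffness + displacement (3 components x 3 time steps)
# + strain/stress + acceleration.
FLOATS_PER_POINT = 1 + 9 + 3 * 3 + 6 + 3
SIZEOF_FLOAT = 4  # bytes, single precision

def min_global_memory_bytes(n):
    """Lower bound on device memory for an n x n x n grid."""
    return FLOATS_PER_POINT * n ** 3 * SIZEOF_FLOAT

def max_cube_size(mem_bytes):
    """Largest n whose n^3 grid fits within mem_bytes under that bound."""
    n = 0
    while min_global_memory_bytes(n + 1) <= mem_bytes:
        n += 1
    return n

# Assumption: a 12 GiB device with every byte available to the solver.
K40_BYTES = 12 * 1024 ** 3
largest = max_cube_size(K40_BYTES)
```

Under these assumptions a single device tops out a little below a 500^3 grid. In practice the limit is lower because the runtime and any scratch buffers also need device memory.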

Page 17: Multiple Devices

• Increase GPU memory (in aggregate) by distributing grid across multiple devices
• Not quite embarrassingly parallel
  – FD stencil needs halo cells from neighbor device(s)
  – Halo dance happens each iteration
  – Communication can become costly
• Slice on the slowest varying axis
  – Easy to compute stride
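
A minimal Python sketch of slicing on the slowest varying axis, assuming a flat array in which one plane of the two faster axes is contiguous and a halo of 4 planes (the half-width of an 8th-order centered stencil). The function names and the even-split policy are illustrative assumptions, not taken from the authors' code.

```python
HALO = 4  # planes needed from each neighbor for an 8th-order centered stencil

def slab_ranges(n_slow, n_dev):
    """Split n_slow planes into n_dev contiguous slabs of near-equal size."""
    base, extra = divmod(n_slow, n_dev)
    ranges, start = [], 0
    for d in range(n_dev):
        size = base + (1 if d < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def halo_offsets(start, stop, plane_points):
    """Flat-array offsets of the planes a slab sends to its neighbors.

    Slicing the slowest axis keeps each plane contiguous, so the stride
    between planes is simply plane_points.
    """
    send_low = start * plane_points            # first HALO planes of the slab
    send_high = (stop - HALO) * plane_points   # last HALO planes of the slab
    count = HALO * plane_points                # elements per halo message
    return send_low, send_high, count

def halo_bytes_per_field(n_fast, n_mid, sizeof_float=4):
    """Bytes sent to one neighbor, per field, per exchange."""
    return HALO * n_fast * n_mid * sizeof_float

slabs = slab_ranges(512, 4)
low, high, count = halo_offsets(128, 256, 512 * 512)
```

For a 512^3 grid on four devices each slab holds 128 planes, and one halo message for a single field is 4 MiB. Each halo is one contiguous block, which is what makes the stride easy to compute.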

Page 18: Multiple Devices

[Diagram labels: GPU 0, GPU 1, GPU 2, GPU 3]

Page 19: Multiple Devices: Take 1

• One host-process per device
• Communicate through MPI

[Diagram: Node 0 with Rank 0 and Rank 1 linked by MPI; GPU 0 and GPU 1 attached over PCIe]

Page 20: Multiple Devices: Take 1

• Generalizes nicely, scales to multiple nodes
• Handle arbitrary* number of devices & nodes

[Diagram: Node 0, Node 1, ..., Node N, each with Rank 0 and Rank 1 linked by MPI and GPU 0 and GPU 1 attached over PCIe; nodes linked to each other by MPI]

Page 21: Multiple Devices: Take 1

• But: the data dance is awkward (read: slow)

[Diagram: on Node 0, gpu0_buf (GPU 0) and send_buf (Rank 0) are linked by cudaMemcpy; send_buf and recv_buf (Rank 1) are linked by MPI_Send / MPI_Recv; recv_buf and gpu1_buf (GPU 1) are linked by cudaMemcpy]

Page 22: Multiple Devices: Take 1

[Plot: Time (seconds), log-scale ticks from 16 to 2048, versus Cube Size (64, 128, 256, 512, 1024). Series: MPI-1, MPI-2, MPI-4, MPI-8]

Page 23: Multiple Devices: Take 1

• Can we fix this?
  – CUDA-aware MPI…?
  – GPU Direct…?
  – RDMA…?
  – Black magic…?
• Open-source research code considerations
  – Keep it simple!

Page 24: Multiple Devices: Take 2

• One host-process per node
• Communicate through peer-to-peer memcpy

[Diagram: Node 0 with a single host process; GPU 0, GPU 1, GPU 2, GPU 3 linked by P2P]

Page 25: Multiple Devices: Take 2

[Plot: Time (seconds), log-scale ticks from 16 to 1024, versus Cube Size (64, 128, 256, 512, 1024). Series: P2P-1, P2P-2, P2P-4]

Page 26: Overall Throughput

[Plot: Mpoints/sec (0 to 1400) versus Cube Size (0 to 1000). Series: P2P-1, P2P-2, P2P-4]

Page 27: Multiple Devices: Take 2

• But we're still limited to one node…

[Diagram: Node 0 with a single host process; GPU 0, GPU 1, GPU 2, GPU 3 linked by P2P]

Page 28: Multiple Devices: Take 2.1

• Scale to multiple nodes with MPI
• Handle arbitrary* number of devices & nodes

[Diagram: Node 0, Node 1, ..., Node N, each with one rank (Rank 0, Rank 1, ..., Rank N) and GPU 0, GPU 1, GPU 2, GPU 3 linked by P2P; nodes linked to each other by MPI]

Page 29: Multiple Devices: Take 2.1

[Plot: Time (seconds), log-scale ticks from 16 to 2048, versus Cube Size (64, 128, 256, 512, 1024). Series: P2P-1, P2P-2, P2P-4, P2P-4(2)]

Page 30: Multiple Devices: Take 2.1

[Plot: Time (seconds), log-scale ticks from 16 to 2048, versus Cube Size (64, 128, 256, 512, 1024). Series: P2P-4(2), MPI-8]

Page 31: Overall Throughput

[Plot: Mpoints/sec (0 to 1400) versus Cube Size (0 to 1000). Series: P2P-1, P2P-2, P2P-4, P2P-4(2)]

Page 32: Speedup

[Plot: Speedup Factor (0 to 30) versus Cube Size (100 to 868). Series: GPU vs 16-CPU, P2P-4 vs 16-CPU, P2P-4(2) vs 16-CPU]

Page 33: Wrapping up

• Elastic wave modeling represents a complex computation problem
  – But, GPU is a promising route for acceleration
  – Flexible multi-GPU implementation offers considerable speedup on large grids
• Represents a seed for further research, modifications, and extensions
• Check out our code: www.reproducibility.org
  – Exec: sfewefd2d_gpu, sfewefd3d_gpu_p2p, sfewefd3d_multiNode
  – Source: user/rweiss
  – Examples: book/uwa/geo2013ElasticModellingGPU/
  – Improvements and extensions welcomed... ...bug reports welcome as well
