
Page 1: Simon McIntosh-Smith simonm@cs.bris.ac.uk Head of ...simonm/publications/sms_bude_gpu_devcon_2011.pdfComputer Journal, September 12th 2011. DOI: 10.1093/comjnl/bxr091 • N. Gibbs,

Exploiting OpenCL for heterogeneous computing: a case study

Simon McIntosh-Smith, simonm@cs.bris.ac.uk
Head of Microelectronics Research


!  Collaborators
•  Richard B. Sessions, Amaurys Avila Ibarra (University of Bristol, Biochemistry)
•  James Price (University of Bristol, Computer Science)
•  Tsuyoshi Hamada, Felipe Cruz (University of Nagasaki, Japan)


!  Molecular docking

•  Proteins: typically O(1000) atoms
•  Ligands: typically O(100) atoms
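To give a feel for the cost these sizes imply, here is a minimal Python sketch that visits every protein-ligand atom pair once. The coordinates, the 8.0 cutoff and the `count_pairs` helper are illustrative assumptions and not BUDE's scoring function; the point is only that the work per pose grows as the product of the two atom counts.

```python
import math

def count_pairs(protein, ligand, cutoff=8.0):
    """Visit every protein-ligand atom pair; count those closer than cutoff.

    This is a toy stand-in for a scoring function (an assumption for
    illustration), not the energy model used by BUDE.
    """
    evaluated = 0
    close = 0
    for p in protein:
        for l in ligand:
            evaluated += 1
            if math.dist(p, l) < cutoff:
                close += 1
    return evaluated, close

# Hypothetical coordinates: 1000 protein atoms and 100 ligand atoms.
protein = [(float(i), 0.0, 0.0) for i in range(1000)]
ligand = [(float(j), 5.0, 0.0) for j in range(100)]

evaluated, close = count_pairs(protein, ligand)
print(evaluated, "pairs visited for one pose")
```

With O(1000) and O(100) atoms, one pose already needs on the order of 100,000 pair evaluations, which is why the workload suits highly parallel devices.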


!  BUDE: Bristol University Docking Engine

Property                    Typical docking      BUDE (empirical free    Free energy calculations
                            scoring functions    energy forcefield)      (MM [1,2], QM/MM [3])
Entropy: solvation          No                   Yes                     Yes
Entropy: configurational    Approx               Approx                  Yes
Electrostatics              ?                    Approx                  Yes
All atom                    No                   Yes                     Yes
Explicit solvent            No                   No                      Yes

Accuracy

Speed

1. MD Tyka, AR Clarke, RB Sessions, J. Phys. Chem. B 110 17212-20 (2006)

2. MD Tyka, RB Sessions, AR Clarke, J. Phys. Chem. B 111 9571-80 (2007)

3. CJ Woods, FR Manby, AJ Mulholland, J. Chem. Phys. 128 014109 (2008)


!  How BUDE’s EMC works


!  Experimental results


!  Redocking into X-ray Structure

1CIL (Human carbonic anhydrase II)


!  Another example

1EZQ (Human Factor Xa)


!  OpenCL for heterogeneous computing

A modern computer includes:
– One or more CPUs
– One or more GPUs

OpenCL (Open Computing Language) lets programmers write a single portable program that uses ALL resources in the heterogeneous platform.


!  BUDE’s heterogeneous approach

1.  Discover all OpenCL platforms/devices, inc. both CPUs and GPUs

2.  Run a micro benchmark on each device, ideally a short piece of real work

3.  Load balance using micro benchmark results

4.  Re-run micro benchmark at regular intervals in case load changes
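One way step 3 could work is to give each device a share of the work in proportion to its measured micro-benchmark throughput. The sketch below is an assumption about the general technique, not BUDE's actual scheduler; `split_work`, the device names and the timings are hypothetical.

```python
def split_work(total_items, bench_seconds):
    """Share total_items across devices in proportion to measured speed.

    bench_seconds maps a device name to the time it took to run the same
    micro benchmark; a shorter time earns a larger share.
    """
    speeds = {dev: 1.0 / t for dev, t in bench_seconds.items()}
    total_speed = sum(speeds.values())
    shares = {dev: int(total_items * s / total_speed)
              for dev, s in speeds.items()}
    # Give any items lost to rounding down to the fastest device.
    fastest = max(speeds, key=speeds.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares

# Hypothetical timings (seconds) for the same micro benchmark.
timings = {"gpu0": 0.5, "cpu0": 4.5}
shares = split_work(1000, timings)
print(shares)
```

Under this scheme, step 4 amounts to calling `split_work` again with fresh timings, so a device that becomes busy with other work receives a smaller share of the next batch.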


!  Benchmark results

[Figure: bar chart of speedup (higher is better), with bar values from 0.3 to 13.6]


!  Selected performance results

GPUs: speedup (higher is better)

M2050 (x2)    13.4
GTX-580       11.7
M2070          8.1
M2050          7.3
GTX-285        6.5
HD5870         5.3


!  Selected performance results

CPUs: speedup (higher is better)

E5462 (Fortran) x2    1.0
E5620 x2              0.9
i7-2600K              0.9
i5-2500T              0.7
A8-3850 CPU           0.3


!  Selected performance results

Heterogeneous (CPUs+GPUs): speedup (higher is better)

M2050 (x2) & E5620 (x2)    13.6
GTX-580 & E8500            11.7
HD5870 & i5-2500T           5.9
A8-3850 GPU & CPU           1.2
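A rough way to read these figures is to compare each heterogeneous speedup with the sum of the speedups its GPU and CPU achieved alone on the GPU and CPU slides. Treating that sum as an ideal upper bound is an assumption made here for illustration, and the `combined_efficiency` helper is hypothetical. The E8500 and the A8-3850 GPU have no standalone figure in this transcript, so those two configurations are left out.

```python
def combined_efficiency(gpu_speedup, cpu_speedup, hetero_speedup):
    """Measured heterogeneous speedup as a fraction of the ideal sum."""
    return hetero_speedup / (gpu_speedup + cpu_speedup)

# Speedups copied from the GPU, CPU and heterogeneous slides.
tesla = combined_efficiency(13.4, 0.9, 13.6)   # M2050 (x2) & E5620 (x2)
radeon = combined_efficiency(5.3, 0.7, 5.9)    # HD5870 & i5-2500T

print(f"M2050 (x2) & E5620 (x2): {tesla:.1%} of the ideal sum")
print(f"HD5870 & i5-2500T: {radeon:.1%} of the ideal sum")
```

Both configurations land within a few percent of the ideal sum, which is what good load balancing should produce, although the CPUs contribute little in absolute terms.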


!  Selected performance results

Speedup (higher is better)

M2050 x2              13.4
GTX-580               11.7
E5462 (Fortran) x2     1.0

Slide annotations: "Where we started", "Where we finished", "Where we could go", "£340"


!  Relative energy and run-time

Measurements are for a constant amount of work. Energy measurements are “at the wall” and include any idle components.

88% reduction in energy
93% reduction in time

Configuration              Relative performance per watt    Relative performance
GTX-580                    16.2                             13.2
M2050 (x2)                 11.4                             15.2
M2050 (x2) + E5620 (x2)    10.6                             15.4
HD5870                      9.1                              5.9
HD5870 + i5-2500T           9.3                              6.6
i5-2500T                    2.4                              0.7
E5620 (x2)                  1.0                              1.0
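Because the measurements are for a constant amount of work, a relative figure converts directly into a percentage saving: a relative performance of s cuts run time by 1 - 1/s, and a relative performance per watt of p cuts energy by 1 - 1/p. The sketch below applies that conversion to two chart values picked only as examples. Which bars the slide's 88% and 93% annotations refer to cannot be recovered from this transcript, and the `reduction_percent` helper is hypothetical.

```python
def reduction_percent(relative_gain):
    """Percentage saved when a fixed job runs relative_gain times better."""
    return 100.0 * (1.0 - 1.0 / relative_gain)

# Example chart values (chosen for illustration only).
time_cut = reduction_percent(15.4)    # relative performance, M2050 (x2) + E5620 (x2)
energy_cut = reduction_percent(9.3)   # relative performance per watt, HD5870 + i5-2500T

print(f"time reduced by {time_cut:.1f}%")
print(f"energy reduced by {energy_cut:.1f}%")
```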


!  What does this let us do?


!  Potentially save lives

NDM-1 is responsible for antibiotic resistance, giving rise to “superbugs”


!  GPU-system DEGIMA

•  Used 144 GPUs in parallel for drug docking simulations
•  ATI Radeon HD5870 & Intel i5-2500T
•  ~300 TFLOPS single precision
•  Courtesy of Tsuyoshi Hamada and Felipe Cruz, Nagasaki


!  NDM-1 experiment
•  1 million candidate drug molecules times 20 conformers each → 20M dockings
•  1.23 × 10^17 atom-atom energies calculated
•  267 days of GPU compute time and 224 days of CPU compute time
•  ~55 hours actual wall-time
•  A second run with 8 million molecules, 160M conformers on >200 GPUs is running right now!
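The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes that the quoted compute times are summed over all devices and that "~55 hours" can be taken as exactly 55, so the derived averages are only approximate; the variable names are illustrative.

```python
molecules = 1_000_000
conformers_per_molecule = 20
dockings = molecules * conformers_per_molecule

atom_atom_energies = 1.23e17
energies_per_docking = atom_atom_energies / dockings

wall_hours = 55.0            # "~55 hours", treated as exact here
gpu_hours = 267 * 24         # 267 days of GPU compute time
cpu_hours = 224 * 24         # 224 days of CPU compute time

# Average number of devices kept busy over the whole run.
avg_gpus_busy = gpu_hours / wall_hours
avg_cpus_busy = cpu_hours / wall_hours

print(f"{dockings:,} dockings")
print(f"{energies_per_docking:.2e} atom-atom energies per docking")
print(f"about {avg_gpus_busy:.0f} GPUs and {avg_cpus_busy:.0f} CPUs busy on average")
```

Under these assumptions the run averaged roughly 117 busy GPUs, which is consistent with a 144-GPU system that was not fully occupied for the entire wall-time.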


!  Conclusions
•  OpenCL enables truly heterogeneous computing, harnessing all hardware resources in a system
•  GPUs can yield significant savings in energy costs (and equipment costs)
•  OpenCL can work just as well for multi-core CPUs as it does for GPUs

It’s possible to screen libraries of millions of molecules against complex targets using highly accurate, computationally-expensive methods in one weekend using equipment costing O(£100K)


!  For an introduction to GPUs
The GPU Computing Revolution – a Knowledge Transfer Report from the London Mathematical Society and the KTN for Industrial Mathematics
•  https://ktn.innovateuk.org/web/mathsktn/articles/-/blogs/the-gpu-computing-revolution


!  References
•  S. McIntosh-Smith, T. Wilson, A.A. Ibarra, J. Crisp and R.B. Sessions, "Benchmarking energy efficiency, power costs and carbon emissions on heterogeneous systems", The Computer Journal, September 12th 2011. DOI: 10.1093/comjnl/bxr091
•  N. Gibbs, A.R. Clarke & R.B. Sessions, "Ab-initio Protein Folding using Physicochemical Potentials and a Simplified Off-Lattice Model", Proteins 43:186-202,200