
High Performance Computing with Python (4 hour tutorial)

Ian@IanOzsvald.com - EuroPython 2011

Source: sethc23.github.io/wiki/Python/Euro_HPC_2013.pdf

Goal

• Get you writing faster code for CPU-bound problems using Python
• Your task is probably in pure Python, is CPU-bound and can be parallelised (right?)
• We're not looking at network-bound problems
• Profiling + Tools == Speed

Get the source please!

• http://tinyurl.com/europyhpc
• (original: http://ianozsvald.com/wp-content/hpc_tutorial_code_europython2011_v1.zip)
• Google: “github ianozsvald”, get HPC full source (but you can do this after!)

About me (Ian Ozsvald)

• A.I. researcher in industry for 12 years
• C, C++, (some) Java, Python for 8 years
• Demo'd pyCUDA and Headroid last year
• Lecturer on A.I. at Sussex Uni (a bit)
• ShowMeDo.com co-founder
• Python teacher, BrightonPy co-founder
• IanOzsvald.com - MorConsulting.com

Overview (pre-requisites)

• cProfile, line_profiler, runsnake
• numpy
• Cython and ShedSkin
• multiprocessing
• ParallelPython
• PyPy
• pyCUDA

We won't be looking at...

• Algorithmic choices, clusters or cloud
• Gnumpy (numpy->GPU)
• Theano (numpy(ish)->CPU/GPU)
• CopperHead (numpy(ish)->GPU)
• BottleNeck (Cython'd numpy)
• Map/Reduce
• pyOpenCL

Something to consider

• “Proebsting's Law”
• http://research.microsoft.com/en-us/um/people/toddpro/papers/law.htm
• Compiler advances (generally) unhelpful (sort-of – consider auto vectorisation!)
• Multi-core common
• Very-parallel (CUDA, OpenCL, MS AMP, APUs) should be considered

What can we expect?

• Close to C speeds (shootout):
  – http://attractivechaos.github.com/plb/
  – http://shootout.alioth.debian.org/u32/which-programming-languages-are-fastest.php
• Depends on how much work you put in
• nbody JavaScript much faster than Python but we can catch it/beat it (and get close to C speed)

Practical result - PANalytical

[Bar chart, y-axis: seconds (0 to 250). X-axis labels as extracted: Numpy+Py +More Numpy +'if' added! Numpy+Py+Cy All +Multiprocessing. Bar values in extraction order: 234, 167, 126, 90, 81, 20.]

Mandelbrot results (Desktop i3)

[Bar chart, y-axis: seconds (0 to 40). Bar labels in extraction order: Py 2.7, PyPy 1.5, Numpy v., NumExpr, Cython np., ShedSkin, pyCUDA py, pyCUDA C, ParallelPython. Bar values in extraction order: 35, 10, 36, 10, 0.3, 0.3, 3.5, 0.07, 9.]

Our code

• pure_python.py
• numpy_vector.py
• pure_python.py 1000 1000 # RUN
• Our two building blocks
• Google “github ianozsvald” -> EuroPython2011_HighPerformanceComputing
• https://github.com/ianozsvald/EuroPython2011_HighPerformanceComputing

Profiling bottlenecks

• python -m cProfile -o rep.prof pure_python.py 1000 1000
• import pstats
• p = pstats.Stats('rep.prof')
• p.sort_stats('cumulative').print_stats(10)
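
The same steps can also be driven from inside Python instead of the command line. This is a minimal sketch, not the tutorial's code: `busy_work` is a hypothetical stand-in for the real workload, and a temporary file takes the place of rep.prof.

```python
import cProfile
import io
import os
import pstats
import tempfile


def busy_work(n):
    """Hypothetical CPU-bound stand-in for the script being profiled."""
    total = 0
    for i in range(n):
        total += abs(i - n // 2)
    return total


def profile_to_text(n, top=10):
    """Profile busy_work(n), save the stats to a file, return the report text."""
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work(n)
    profiler.disable()

    # Equivalent of "-o rep.prof": write the raw stats to disk.
    fd, path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    try:
        profiler.dump_stats(path)
        # Equivalent of the pstats lines on the slide, captured into a string.
        buffer = io.StringIO()
        stats = pstats.Stats(path, stream=buffer)
        stats.sort_stats("cumulative").print_stats(top)
        return buffer.getvalue()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(profile_to_text(200_000))
```

Sorting by 'cumulative' puts the functions that contain the most total time at the top, which is usually the quickest way to find where to look next.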

cProfile output

51923594 function calls (51923523 primitive calls) in 74.301 seconds

    ncalls      tottime  percall  cumtime  percall  filename:lineno(function)
    1             0.034    0.034   74.303   74.303  pure_python.py:1(<module>)
    1             0.273    0.273   74.268   74.268  pure_python.py:23(calc_pure_python)
    1            57.168   57.168   73.580   73.580  pure_python.py:9(calculate_z_serial_purepython)
    51,414,419   12.465    0.000   12.465    0.000  {abs}
    ...

RunSnakeRun

Let's profile python.py

• python -m cProfile -o res.prof pure_python.py 1000 1000
• runsnake res.prof
• Let's look at the result

What's the problem?

• What's really slow?
• Useful from a high level...
• We want a line profiler!

line_profiler.py

• kernprof.py -l -v pure_python_lineprofiler.py 1000 1000
• Warning... slow! We might want to use 300 100
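
When kernprof runs a script with -l it makes a `profile` decorator available as a builtin, so a file decorated with @profile raises a NameError under plain python. A common workaround is to fall back to a no-op decorator, sketched below. `count_escapes` is a hypothetical function written for this illustration and is not from the tutorial's source.

```python
import builtins

# Use kernprof's injected decorator when present; otherwise a no-op,
# so the same file runs with or without line profiling.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def count_escapes(points, maxiter):
    """Count how many complex points escape |z| > 2 within maxiter steps."""
    escaped = 0
    for q in points:
        z = 0j
        for _ in range(maxiter):
            z = z * z + q
            if abs(z) > 2.0:
                escaped += 1
                break
    return escaped


if __name__ == "__main__":
    grid = [complex(x / 10.0, y / 10.0)
            for x in range(-20, 6) for y in range(-10, 11)]
    print(count_escapes(grid, 100))
```

Only functions carrying the decorator are timed line by line, so decorate just the hot function that cProfile pointed at.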

kernprof.py output

    ...
    % Time  Line Contents
    =====================
            @profile
            def calculate_z_serial_purepython(q, maxiter, z):
       0.0      output = [0] * len(q)
       1.1      for i in range(len(q)):
      27.8          for iteration in range(maxiter):
      35.8              z[i] = z[i]*z[i] + q[i]
      31.9              if abs(z[i]) > 2.0:

Dereferencing is slow

• Dereferencing involves lookups – slow
• Our 'i' changes slowly
• zi = z[i]; qi = q[i] # DO IT
• Change all z[i] and q[i] references
• Run kernprof again
• Is it cheaper?
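
A sketch of the before and after, for comparison. The transcript only shows the function up to the `if abs(z[i]) > 2.0:` line, so the body of that branch (`output[i] = iteration` followed by `break`) is an assumption, as are the function names used here.

```python
def calculate_z_indexed(q, maxiter, z):
    """Baseline: index z[i] and q[i] on every inner iteration."""
    output = [0] * len(q)
    for i in range(len(q)):
        for iteration in range(maxiter):
            z[i] = z[i] * z[i] + q[i]
            if abs(z[i]) > 2.0:
                output[i] = iteration  # assumed branch body
                break
    return output


def calculate_z_local(q, maxiter, z):
    """Same calculation with z[i] and q[i] hoisted into local names."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration  # assumed branch body
                break
    return output


if __name__ == "__main__":
    import timeit

    q = [complex(x / 50.0 - 2.0, y / 50.0 - 1.0)
         for x in range(100) for y in range(100)]
    for fn in (calculate_z_indexed, calculate_z_local):
        seconds = timeit.timeit(lambda: fn(q, 50, [0j] * len(q)), number=3)
        print(fn.__name__, round(seconds, 3))
```

The two functions return the same output list; the local version simply avoids repeated list indexing inside the inner loop. Note that it no longer writes the final values back into z.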

We have faster code

• pure_python_2.py is faster, we'll use this as the basis for the next steps
• There are tricks:
  – sets over lists if possible
  – use dict[] rather than dict.get()
  – built-in sort is fast
  – list comprehensions
  – map rather than loops
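
These tricks are easy to measure with timeit. The sketch below compares list and set membership and three ways of building a list; all names are invented for the illustration. Timings vary by interpreter and version, so treat the output as something to measure rather than assume.

```python
import timeit

VALUES = list(range(10_000))
VALUES_SET = set(VALUES)


def squares_loop(values):
    """Explicit loop with append."""
    out = []
    for v in values:
        out.append(v * v)
    return out


def squares_comprehension(values):
    """List comprehension."""
    return [v * v for v in values]


def squares_map(values):
    """map() with a function; list() forces evaluation on Python 3."""
    return list(map(lambda v: v * v, values))


if __name__ == "__main__":
    target = VALUES[-1]  # worst case for the list: a full scan
    print("list membership",
          timeit.timeit(lambda: target in VALUES, number=1000))
    print("set membership ",
          timeit.timeit(lambda: target in VALUES_SET, number=1000))
    for fn in (squares_loop, squares_comprehension, squares_map):
        print(fn.__name__, timeit.timeit(lambda: fn(VALUES), number=100))
```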

PyPy 1.5

• Confession – I'm a newbie
• Probably cool tricks to learn
• pypy pure_python_2.py 1000 1000
• PIL is supported, numpy isn't
• My (bad) code needs numpy for display (maybe you can fix that?)
• pypy -m cProfile -o runpypy.prof pure_python_2.py 1000 1000 # abs but no range

Cython

• Manually add types, converts to C
• .pyx files (built on Pyrex)
• Win/Mac/Lin with gcc, msvc etc
• 10-100x speed-up
• numpy integration
• http://cython.org/

Cython on pure_python_2.py

• # ./cython_pure_python
• Make calculate_z.py, test it works
• Turn calculate_z.py to .pyx
• Add setup.py (see Getting Started doc)
• python setup.py build_ext --inplace
• cython -a calculate_z.pyx to get profiling feedback (.html)

Cython types

• Help Cython by adding annotations:
  – list q z
  – int
  – unsigned int # hint no negative indices with for loop
  – complex and complex double
• How much faster?

Compiler directives

• http://wiki.cython.org/enhancements/compilerdirectives
• We can go faster (maybe):
  – #cython: boundscheck=False
  – #cython: wraparound=False
• Profiling:
  – #cython: profile=True
• Check profiling works
• Show _2_bettermath # FAST!

ShedSkin

• http://code.google.com/p/shedskin/
• Auto-converts Python to C++ (auto type inference)
• Can only import modules that have been implemented
• No numpy, PIL etc but great for writing new fast modules
• 3000 SLOC 'limit', always improving

Easy to use

• # ./shedskin/
• shedskin shedskin1.py
• make
• ./shedskin1 1000 1000
• shedskin shedskin2.py; make
• ./shedskin2 1000 1000 # FAST!
• No easy profiling, complex is slow (for now)

numpy vectors

• http://numpy.scipy.org/
• Vectors not brilliantly suited to Mandelbrot (but we'll ignore that...)
• numpy is very-parallel for CPUs
• a = numpy.array([1,2,3,4])
• a *= 3 -> numpy.array([3,6,9,12])

Vector outline...

    # ./numpy_vector/numpy_vector.py
    for iteration...
        z = z*z + q
        done = np.greater(abs(z), 2.0)
        q = np.where(done, 0+0j, q)
        z = np.where(done, 0+0j, z)
        output = np.where(done, iteration, output)
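
The outline can be filled in as a complete function. This is a sketch under assumptions: the loop bound `maxiter`, the array setup and the function name are not shown on the slide and are chosen here for illustration.

```python
import numpy as np


def calculate_z_numpy(q, maxiter):
    """Vectorised escape-iteration calculation following the slide's outline."""
    q = np.array(q, dtype=np.complex128)  # copy, because the loop overwrites q
    z = np.zeros_like(q)
    output = np.zeros(q.shape, dtype=np.int32)
    for iteration in range(maxiter):
        z = z * z + q
        done = np.greater(abs(z), 2.0)
        # Zero the escaped points so they stop growing and never trigger again.
        q = np.where(done, 0 + 0j, q)
        z = np.where(done, 0 + 0j, z)
        output = np.where(done, iteration, output)
    return output


if __name__ == "__main__":
    points = [complex(x / 10.0, 0.0) for x in range(-20, 21)]
    print(calculate_z_numpy(points, 100))
```

Every iteration touches every element, including points that escaped long ago, which is why vectors are not an ideal fit for this problem.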

Profiling some more

• python numpy_vector.py 1000 1000
• kernprof.py -l -v numpy_vector.py 300 100
• How could we break out early?
• How big is 250,000 complex numbers?
• # .nbytes, .size
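
The size question can be answered directly from the array attributes. A small sketch, assuming the array uses numpy's complex128 dtype (two 64-bit floats per element):

```python
import numpy as np

z = np.zeros(250_000, dtype=np.complex128)

print(z.size)      # number of elements
print(z.itemsize)  # bytes per element
print(z.nbytes)    # total bytes, equal to size * itemsize
```

At 16 bytes per element that is 4,000,000 bytes per array, and the vector code keeps several such arrays (q, z, output, done) alive at once.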

Cache sizes

• Modern CPUs have 2-6MB caches
• Tuning is hard (and may not be worthwhile)
• Heuristic: Either keep it tiny (<64KB) or worry about really big data sets (>20MB)
• # numpy_vector_2.py
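
One way to experiment with working-set size is to run the vector update on fixed-size blocks of the input rather than on the whole array. numpy_vector_2.py is not reproduced in this transcript, so the sketch below is an assumption about the general technique, not the tutorial's implementation; `calculate_z_blocked` and `block_size` are illustrative names.

```python
import numpy as np


def calculate_z_blocked(q, maxiter, block_size):
    """Run the vectorised update on 1-D slices of at most block_size points."""
    q = np.asarray(q, dtype=np.complex128)
    output = np.zeros(q.shape, dtype=np.int32)
    for start in range(0, q.size, block_size):
        stop = start + block_size
        q_block = q[start:stop].copy()   # private copy, safe to overwrite
        z_block = np.zeros_like(q_block)
        out_block = output[start:stop]   # a view: writes land in output
        for iteration in range(maxiter):
            z_block = z_block * z_block + q_block
            done = np.abs(z_block) > 2.0
            q_block[done] = 0
            z_block[done] = 0
            out_block[done] = iteration
    return output


if __name__ == "__main__":
    points = [complex(x / 100.0 - 2.0, 0.25) for x in range(300)]
    print(calculate_z_blocked(points, 100, 50)[:10])
```

Timing the same input with different block_size values is a simple way to see whether cache effects matter on a given machine.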

Speed vs cache size (Core2/i3)

[Bar chart, y-axis: seconds (0 to 200). X-axis labels in extraction order: 250k, 90k, 50k, 45k, 20k, 10k, 5k, 1k, 100. Bar values in extraction order: 54, 52, 45, 45, 42, 43, 45, 62, 180.]

NumExpr

• http://code.google.com/p/numexpr/
• This is magic
• With Intel MKL it goes even faster
• # ./numpy_vector_numexpr/
• python numpy_vector_numexpr.py 1000 1000
• Now convert your numpy_vector.py

numpy and iteration

• Normally there's no point using numpy if we aren't using vector operations
• python numpy_loop.py 1000 1000
• Is it any faster?
• Let's run kernprof.py on this and the earlier pure_python_2.py
• Any significant differences?

Cython on numpy_loop.py

• Can low-level C give us a speed-up over vectorised C?
• # ./cython_numpy_loop/
• http://docs.cython.org/src/tutorial/numpy.html
• Your task – make .pyx, start without types, make it work from numpy_loop.py
• Add basic types, use cython -a

multiprocessing

• Using all our CPUs is cool, 4 are common, 8 will be common
• Global Interpreter Lock (isn't our enemy)
• Silo'd processes are easiest to parallelise
• http://docs.python.org/library/multiprocessing.html

multiprocessing Pool

• # ./multiprocessing/multi.py
• p = multiprocessing.Pool()
• po = p.map_async(fn, args)
• result = po.get() # for all po objects
• join the result items to make full result
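
A minimal sketch of the Pool calls listed above. The worker function `escape_iteration` and the wrapper `run_pool` are hypothetical names for this illustration; multi.py itself is not reproduced in the transcript.

```python
import multiprocessing


def escape_iteration(q, maxiter=100):
    """Return the iteration at which q escapes |z| > 2, or 0 if it never does."""
    z = 0j
    for iteration in range(maxiter):
        z = z * z + q
        if abs(z) > 2.0:
            return iteration
    return 0


def run_pool(points):
    """Map escape_iteration over points using one worker process per CPU."""
    with multiprocessing.Pool() as pool:      # defaults to the CPU count
        async_result = pool.map_async(escape_iteration, points)
        return async_result.get()             # blocks until all items are done
```

Call `run_pool(points)` from under an `if __name__ == "__main__":` guard. On platforms that start workers by re-importing the main module, unguarded top-level code would otherwise run again in every worker. The worker function must also live at module level so that it can be pickled.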

Making chunks of work

• Split the work into chunks (follow my code)
• Splitting by number of CPUs is good
• Submit the jobs with map_async
• Get the results back, join the lists

Code outline

• Copy my chunk code

    output = []
    for chunk in chunks:
        out = calc...(chunk)
        output += out
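
The tutorial's chunk code is not included in this transcript, so the following is a hedged sketch of one way to build the chunks and join the results serially. `split_into_chunks`, `calculate_chunk` and `calculate_all` are illustrative names, and `calculate_chunk` is only a placeholder for the elided calc...(chunk) call.

```python
def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous lists of near-equal length."""
    chunk_size, remainder = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for index in range(n_chunks):
        # The first `remainder` chunks take one extra item each.
        stop = start + chunk_size + (1 if index < remainder else 0)
        if stop > start:  # skip empty chunks when items are scarce
            chunks.append(items[start:stop])
        start = stop
    return chunks


def calculate_chunk(chunk):
    """Placeholder for the real per-chunk calculation."""
    return [abs(value) for value in chunk]


def calculate_all(items, n_chunks):
    """Serial version of the outline: process each chunk, join the lists."""
    output = []
    for chunk in split_into_chunks(items, n_chunks):
        out = calculate_chunk(chunk)
        output += out
    return output


if __name__ == "__main__":
    print(split_into_chunks(list(range(10)), 4))
    print(calculate_all([-1, 2, -3], 2))
```

Getting the serial loop right first makes it easy to check that a parallel version, which hands the same chunk list to map_async, returns results in the same order.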

ParallelPython

• Same principle as multiprocessing but allows >1 machine with >1 CPU
• http://www.parallelpython.com/
• Seems to work poorly with lots of data (e.g. 8MB split into 4 lists...!)
• We can run it locally, run it locally via ppserver.py and run it remotely too
• Can we demo it to another machine?

ParallelPython + binaries

• We can ask it to use modules, other functions and our own compiled modules
• Works for Cython and ShedSkin
• Modules have to be in PYTHONPATH (or current directory for ppserver.py)
• parallelpython_cython_pure_python

Challenge...

• Can we send binaries (.so/.pyd) automatically?
• It looks like we could
• We'd then avoid having to deploy to remote machines ahead of time...
• Anybody want to help me?

pyCUDA

• NVIDIA's CUDA -> Python wrapper
• http://mathema.tician.de/software/pycuda
• Can be a pain to install...
• Has numpy-like interface and two lower level C interfaces

pyCUDA demos

• # ./pyCUDA/
• I'm using float32/complex64 as my CUDA card is too old :-( (Compute 1.3)
• numpy-like interface is easy but slow
• elementwise requires C thinking
• sourcemodule gives you complete control
• Great for prototyping and moving to C

Birds of Feather?

• numpy is cool but CPU bound
• pyCUDA is cool and is numpy-like
• Could we monkey patch numpy to auto-run CUDA(/openCL) if a card is present?
• Anyone want to chat about this?

Future trends

• multi-core is obvious
• CUDA-like systems are inevitable
• write-once, deploy to many targets – that would be lovely
• Cython+ShedSkin could be cool
• Parallel Cython could be cool
• Refactoring with rope is definitely cool

Bits to consider

• Cython being wired into Python (GSoC)
• CorePy assembly -> numpy http://numcorepy.blogspot.com/
• PyPy advancing nicely
• GPUs being interwoven with CPUs (APU)
• numpy+NumExpr->GPU/CPU mix?
• Learning how to massively parallelise is the key

Feedback

• I plan to write this up
• I want feedback (and maybe a testimonial if you found this helpful?)
• Ian@IanOzsvald.com
• Thank you :-)