PyStokes: A case study of accelerating Python using Cython

Rajesh Singh

Jan 31, 2015

Contributors

Outline

• Stokes law

• Rigid body motion of active colloids

• Python

• Cython

• Python and Cython

• PyStokes

• Benchmarks

Stokes law

$$\mathbf{V}_n = \frac{\mathbf{F}_n}{6\pi\eta a}, \qquad \boldsymbol{\Omega}_n = \frac{\mathbf{T}_n}{8\pi\eta a^{3}}$$
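As a minimal sketch of Stokes law for a single isolated sphere, the two relations can be evaluated directly with NumPy. The function names and the unit values of the viscosity `eta` and radius `a` are illustrative assumptions, not part of PyStokes.

```python
import numpy as np


def stokes_velocity(force, eta, a):
    """Translational velocity V = F / (6 pi eta a) of an isolated sphere."""
    return np.asarray(force, dtype=float) / (6.0 * np.pi * eta * a)


def stokes_angular_velocity(torque, eta, a):
    """Angular velocity Omega = T / (8 pi eta a^3) of an isolated sphere."""
    return np.asarray(torque, dtype=float) / (8.0 * np.pi * eta * a**3)


# Illustrative inputs (assumed values): unit viscosity and unit radius.
F = np.array([0.0, 0.0, -1.0])   # force on the sphere
T = np.array([0.0, 1.0, 0.0])    # torque on the sphere
V = stokes_velocity(F, eta=1.0, a=1.0)
Omega = stokes_angular_velocity(T, eta=1.0, a=1.0)
print("V =", V)
print("Omega =", Omega)
```

Doubling the radius halves the translational velocity but reduces the angular velocity by a factor of eight, which follows from the a and a^3 dependence in the two denominators.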

Rigid body motion of active colloids

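$$\mathbf{V}_n = \sum_{m=1}^{N}\left[\boldsymbol{\mu}^{TT}_{nm}\cdot\mathbf{F}^{B}_{m} + \boldsymbol{\mu}^{TR}_{nm}\cdot\mathbf{T}^{B}_{m}\right] + \sum_{l\sigma,\,m}\boldsymbol{\pi}^{(T,\,l\sigma)}_{nm}\cdot\mathbf{V}^{(l\sigma)}_{m},$$

$$\boldsymbol{\Omega}_n = \sum_{m=1}^{N}\left[\boldsymbol{\mu}^{RT}_{nm}\cdot\mathbf{F}^{B}_{m} + \boldsymbol{\mu}^{RR}_{nm}\cdot\mathbf{T}^{B}_{m}\right] + \sum_{l\sigma,\,m}\boldsymbol{\pi}^{(R,\,l\sigma)}_{nm}\cdot\mathbf{V}^{(l\sigma)}_{m}.$$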

J. Stat. Mech. (2015) P06017
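The structure of the sum over particles can be sketched with NumPy. This is only an illustration under stated assumptions: the mobility blocks are stored densely as arrays of shape (N, N, 3, 3), only the passive force and torque terms are included (the active slip terms are omitted), and the mobilities are filled with the one-body Stokes value so that the result can be checked against Stokes law. The function `rigid_body_velocity` and all array names are hypothetical and are not the PyStokes API.

```python
import numpy as np


def rigid_body_velocity(muTT, muTR, F, T):
    """V_n = sum_m [muTT_nm . F_m + muTR_nm . T_m] (passive terms only).

    muTT, muTR : arrays of shape (N, N, 3, 3), one 3x3 block per pair (n, m)
    F, T       : arrays of shape (N, 3), body forces and torques
    """
    return (np.einsum("nmij,mj->ni", muTT, F)
            + np.einsum("nmij,mj->ni", muTR, T))


# Assumed test case: N non-interacting spheres, so only the n == m blocks
# are non-zero and equal to the one-body mobility 1 / (6 pi eta a).
N, eta, a = 4, 1.0, 1.0
muTT = np.zeros((N, N, 3, 3))
muTR = np.zeros((N, N, 3, 3))
for n in range(N):
    muTT[n, n] = np.eye(3) / (6.0 * np.pi * eta * a)

F = np.tile([0.0, 0.0, -1.0], (N, 1))   # the same force on every sphere
T = np.zeros((N, 3))                    # no torques
V = rigid_body_velocity(muTT, muTR, F, T)
print(V)
```

With dense storage every particle n sums over every particle m, so the work grows as N squared.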

Python

1 Free and open source

2 High-level and interpreted

3 Interactive environment

4 Object-oriented

5 Speed: pure Python is slow for numerical loops

6 Dictionary lookups

7 Function-call overheads

8 GIL (global interpreter lock)
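The cost of the interpreter can be made visible with a small timing sketch. It compares an explicit Python loop with a single NumPy call on the same data. The function name, problem size and repeat count are arbitrary choices for illustration, and the measured times depend on the machine.

```python
import timeit

import numpy as np


def py_sum_of_squares(values):
    """Sum of squares with an explicit Python loop (interpreted per element)."""
    total = 0.0
    for x in values:
        total += x * x
    return total


data = np.arange(10_000, dtype=float)
values = data.tolist()   # plain Python floats for the pure-Python loop

# Time 20 repetitions of each variant; both compute the same number.
t_loop = timeit.timeit(lambda: py_sum_of_squares(values), number=20)
t_numpy = timeit.timeit(lambda: float(np.dot(data, data)), number=20)

print(f"Python loop: {t_loop:.4f} s, NumPy dot: {t_numpy:.4f} s")
```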

Cython

• An attempt to make a superset of Python

• High-level coolness of Python along with the speed of C

• Compiled

• cdef variables, attributes, functions

• Supports parallelism (OpenMP) by releasing the GIL

• Memoryviews allow efficient access to memory buffers, like NumPy arrays, without Python overhead

• Steps involved in building Cython code

1 A .pyx file is compiled by Cython to .c

2 The .c file is then compiled by the C compiler

3 Building can be done in a single step using a setup.py
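The build steps above can be sketched as a small Python script that writes a hypothetical `scale.pyx` and a matching `setup.py` to disk. The module name `scale`, its contents and the `-fopenmp` flag (a GCC-style option) are assumptions for illustration and are not taken from PyStokes. The .pyx source shows a cdef-typed index, a memoryview argument and an OpenMP loop run with the GIL released.

```python
import tempfile
from pathlib import Path

# Hypothetical Cython module: typed memoryview, cdef variables, prange.
PYX_SOURCE = '''\
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

def scale(double[:] x, double factor):
    """Multiply every element of x by factor, in place."""
    cdef Py_ssize_t i                # C-typed loop index
    cdef Py_ssize_t n = x.shape[0]   # length of the underlying buffer
    for i in prange(n, nogil=True):  # OpenMP loop with the GIL released
        x[i] = x[i] * factor
'''

# Build script: Cython translates scale.pyx to C, then the C compiler
# builds the extension module.
SETUP_SOURCE = '''\
from setuptools import Extension, setup
from Cython.Build import cythonize

ext = Extension(
    "scale",
    sources=["scale.pyx"],
    extra_compile_args=["-fopenmp"],  # assumed GCC-style OpenMP flag
    extra_link_args=["-fopenmp"],
)
setup(name="scale", ext_modules=cythonize([ext]))
'''


def write_project(root):
    """Write scale.pyx and setup.py into the directory root."""
    root = Path(root)
    pyx, setup_py = root / "scale.pyx", root / "setup.py"
    pyx.write_text(PYX_SOURCE)
    setup_py.write_text(SETUP_SOURCE)
    return pyx, setup_py


project_dir = Path(tempfile.mkdtemp())
write_project(project_dir)
print(f"cd {project_dir} && python setup.py build_ext --inplace")
```

Running the printed command in an environment with Cython and a C compiler installed performs both compilation steps at once. After that, `import scale` would work from the same directory.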

Python and Cython

• In the benchmark presented, Cython is three orders of magnitude faster than Python

• The Cython code is as fast as C code

PyStokes: README.md

• Cython library for computing Stokes flows produced by spheres

• The library computes flow and rigid body motion (RBM)

• Geometries supported
  • unbounded
  • wall-bounded
  • periodic

• Free and open source

• Planned developments
  • linear solvers
  • fast multipole accelerations
  • data layout

• References
  • R. Singh, A. Laskar, and R. Adhikari. PyStokes: Hampi, Nov 2014.
  • S. Ghose and R. Adhikari. Phys. Rev. Lett., 112(11):118102, 2014.
  • R. Singh, S. Ghose, and R. Adhikari. J. Stat. Mech., P06017, 2015.

PyStokes: Usage

import pystokes, pyforces
import numpy as np

a, Np = 1, 100        # particle radius, number of particles
L, dim = 128, 3       # box size, spatial dimension
dt, T = 0.01, 100     # time step, number of time steps
# allocate separate arrays so that v, r and F do not alias one another
v = np.zeros(dim*Np); r = np.zeros(dim*Np); F = np.zeros(dim*Np)

pRbm = pystokes.periodic.Rbm(a, Np, L)
ff = pyforces.forceFields.Forces(Np)

for tt in range(T):
    ff.sedimentation(F, g=-10)
    pRbm.stokesletV(v, r, F)
    r = (r + (F/(0.75*a) + v)*dt)%L

PyStokes: Example

Figure: Crowley instability in a sedimenting lattice.

PyStokes: Benchmarks

Figure: Propulsion matrix calculation using the PyStokes library

Present implementation scales

• linearly with the number of cores

• quadratically with the number of particles

Summary

• Free and open source library

• Efficient and fast evaluation of Stokes flow

• A python front end for the user

• Present implementation scales
  • linearly with the number of cores
  • quadratically with the number of particles
