
Numerical estimation of the area of the Mandelbrot set

Thorsten Förstemann

October 18, 2012

1 INTRODUCTION

1.1 PROBLEM STATEMENT

An analytical expression for the area of the Mandelbrot set [1] is unknown. There are numerical estimations of the area [2], e.g. by Munafo (2012) [3]. The aim of this approach is to improve the accuracy of the numerical area estimation.

At the moment the most accurate estimation, by Munafo, uses a Monte Carlo or pixel-counting integration method [4]. To increase the precision of the area estimate by a factor of 10 (i.e. one digit) the number of counted pixels must be increased by a factor of 100 = 10^2. CPU time grows approximately proportionally to the number of counted pixels. Thus, Monte Carlo estimations become inefficient at higher precisions.
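To make the scaling argument concrete, here is a minimal pixel-counting sketch in C. It is not the implementation described in this paper: the grid size N and the iteration limit maxIter are placeholder values, there is no orbit detection, and every pixel that survives maxIter iterations is simply counted as inside.

#include <stdio.h>

/* Minimal pixel-counting sketch (illustrative only): iterate every point
   of an N x N grid over the square [-2.05, 0.55] x [-1.3, 1.3] and count
   the points that have not escaped after maxIter Mandelbrot iterations. */
int main(void) {
    const int N = 4096;        /* grid size (placeholder, small) */
    const int maxIter = 1000;  /* fixed iteration limit (placeholder) */
    const double w = 2.6 / N;  /* grid width */
    long hits = 0;
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            double x0 = -2.05 + w * i, y0 = -1.3 + w * j;
            double x = 0.0, y = 0.0;
            int iter = 0;
            while (x*x + y*y <= 4.0 && iter < maxIter) {
                double tmp = x*x - y*y + x0;
                y = 2.0*x*y + y0;
                x = tmp;
                iter++;
            }
            if (iter == maxIter) hits++; /* crudely treated as "inside" */
        }
    }
    /* area estimate = fraction of non-escaping pixels times area of the square */
    printf("area ~ %f\n", (double)hits / ((double)N * N) * 2.6 * 2.6);
    return 0;
}

Halving the statistical error of such an estimate requires roughly four times as many pixels, which is exactly the inefficiency discussed above.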

1.2 SOLUTION STATEMENT

Monte Carlo implementations are quite trivial. They can easily be parallelized and implemented using OpenCL [5]. Efficient OpenCL/GPU implementations can run up to 100 times faster on recent hardware compared to CPU implementations. A speedup of 100 yields one more digit of precision at a given computing time, as mentioned earlier.

[1] http://en.wikipedia.org/wiki/Mandelbrot_set
[2] http://oeis.org/A098403
[3] http://www.mrob.com/pub/muency/pixelcounting.html
[4] http://en.wikipedia.org/wiki/Monte_Carlo_method
[5] http://en.wikipedia.org/wiki/OpenCL


Table 2.1: Comparison of current Mandelbrot area estimates

Förstemann (2012)   area: 1.506 591 884 9 ± 0.000 000 002 8
                    87 960 930 222 520 calculated pixels
                    3 028 136 sec. (35 days) CPU/GPU time
                    29 047 880 pixel/sec

Munafo (2012)       area: 1.506 591 856 ± 0.000 000 025 6
                    1 677 721 600 000 calculated pixels
                    691 255 sec. (8.00 days) CPU time
                    2 427 066 pixel/sec

Förstemann (2011)   area: 1.506 591 881 ± 0.000 000 006 5
                    35 046 933 135 360 calculated pixels
                    8 640 000 sec. (100 days) CPU/GPU time
                    4 056 358 pixel/sec

Table 2.2: Parameters of grids calculated so far

grid size   area estimate     error             CPU [sec.]   av. pixel/s    iter. depth
16 384      1.506 590 913 0   0.000 002 171 7          955     5 621 685     67 108 864
32 768      1.506 591 383 0   0.000 000 678 8        2 539     8 457 990    134 217 728
65 536      1.506 592 037 0   0.000 000 251 4        8 952     9 595 548    268 435 456
131 072     1.506 591 911 7   0.000 000 135 8       23 979    14 329 096    536 870 912
262 144     1.506 591 918 4   0.000 000 040 0       58 394    23 536 486  1 073 741 824
524 288     1.506 591 883 3   0.000 000 021 7      194 027    28 333 985  2 147 483 648
1 048 576   1.506 591 883 1   0.000 000 007 3      745 509    29 496 938  4 294 967 296
2 097 152   1.506 591 884 9   0.000 000 002 8    3 028 136    29 047 880  8 589 934 592

An efficient OpenCL implementation is not trivial due to the chaotic behavior of the Mandelbrot set. The OpenCL/GPU implementation presented here is about 10 times faster than Munafo's CPU-based approach of 2012. The GPU code runs in a Mathematica 8 environment.

2 CURRENT ESTIMATE

The current estimates can be read off table 2.1. The estimate is based on the results listed in table 2.2 (refer to the last row). Table 2.2 is inspired by a similar table by Munafo [6].

The Monte Carlo method used here is based on quadratic grids covering the whole Mandelbrot set. For each grid, the area is estimated as the fraction of grid points that do not escape, multiplied by the area of the sampled square. The method is described in more detail in the next section.

20 grids are calculated at each grid size. Details of the different grid sizes can be read off table 2.2:

[6] http://www.mrob.com/pub/muency/pixelcounting.html


[Figure 2.1: log-log plot; x-axis: number of calculated pixels (10^5 to 10^13); y-axis: mean error (10^-9 to 10^-3); series: Förstemann (2012) and Munafo (2012)]

Figure 2.1: A comparison of the mean error versus the number of calculated pixels between this and Munafo's approach. The dashed lines correspond to regressions y ∝ x^(−0.68).

Grid size The grid size is the number of grid points in both dimensions (quadratic grids).

Area estimate The area estimate and the 95% confidence interval are calculated using the 20 estimates, each based on one grid. The grids differ slightly in grid width and offset. The grid widths and offsets are chosen randomly.

Error The 95% confidence interval of the mean, 1.96 σ / √(20 − 1), with standard deviation σ (a computation sketch follows this list)

CPU time The total GPU/CPU time in seconds needed to calculate the 20 grids

average pixel rate The number of calculated pixels per second on average

iteration depth The lower bound of the maximum iteration depth
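As a small illustration of the error definition above, the following C sketch combines 20 per-grid area estimates into a mean and the 95% confidence interval 1.96 σ / √(20 − 1). The input values are placeholders, not measured data, and using the sample standard deviation for σ is one plausible reading of the definition.

#include <math.h>
#include <stdio.h>

int main(void) {
    /* placeholder inputs: fill with the 20 per-grid area estimates */
    double a[20] = {0};
    const int m = 20;

    /* mean of the 20 estimates */
    double mean = 0.0;
    for (int k = 0; k < m; k++) mean += a[k];
    mean /= m;

    /* sample standard deviation sigma */
    double var = 0.0;
    for (int k = 0; k < m; k++) var += (a[k] - mean) * (a[k] - mean);
    double sigma = sqrt(var / (m - 1));

    /* 95% confidence interval of the mean, as defined in the text */
    double err = 1.96 * sigma / sqrt(m - 1.0);

    printf("area = %.10f +/- %.10f\n", mean, err);
    return 0;
}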

A comparison of the mean error versus the number of calculated pixels between this and Munafo's approach is given in figure 2.1. The numbers are taken from table 2.2 and the corresponding table by Munafo mentioned above.

The two main results are:

1. One additional digit of precision is gained by this approach in comparison to Munafo's approach (refer to table 2.1)

2. The pixel rate of this approach is about 10 times higher than that of Munafo's approach (refer to table 2.1)


[Figure 3.1: the sampled square; lower-left corner (−2.05, −1.3), side length 2.626]

Figure 3.1: The grid for random1 = random2 = random3 = 0 is marked gray. Typical values of the grid width are too small to identify individual grid points.

3 METHOD’S DETAILS

In this section the Monte Carlo method used here is illustrated. The implementation with OpenCL and Mathematica is shown in the following sections.

3.1 GRID COORDINATES AND DIMENSIONS

Complex numbers z in the complex plane are parametrized by two real coordinates x and y via z = x + i y. All calculations are done in double precision.

The Mandelbrot set is sampled by slightly varying grids. The variations are chosen randomly for each grid. The origins of the grids are given by

    (ox, oy) = (−2.05, −1.3) + w · (random1, random2)

where random1 and random2 are random numbers between 0 and 1. The grid width is defined by

    w = 2.6 · (1.01 + 0.01 · random3) / N

where N is the grid size. The random numbers random1, random2 and random3 are arbitrary but constant for each grid. The coordinates of the (i, j)-th grid point are

    (xij, yij) = (ox, oy) + w · (i, j)

where i and j run from 0 to N − 1. Used grid sizes can be read off table 2.2. A typical grid is shown in figure 3.1.

4

[Figure 3.2: flowchart of the pipeline handling. Start with length = 0 and depth = 0. While length ≤ 500 000, apply addList. Then apply shiftList(depth), incrementing depth after each pass, as long as length ≥ 2 000. Once length drops below 2 000, reset depth to 0 and continue with addList.]

Figure 3.2: Pipeline handling

3.2 GPU-GRIDS AND PARALLELIZATION

Very large grids cannot be handled efficiently by the GPU due to memory and timing issues. In this approach the GPU-grid size is set to n = 1024. Thus, the original grid has to be partitioned into m × m GPU-grids, where N = 1024 m.

The calculations of the individual m × m GPU-grids can be parallelized. In Mathematica, four worker kernels process the list of GPU-grids in parallel. Each worker kernel is bound to an individual GPU. There are two graphics cards with two GPUs each.

3.3 PERFORMANCE CONSIDERATIONS

The GPU-grid size is n = 1024. Thus, about 1 million pixels are simultaneously loaded into each GPU. As the iteration proceeds on these points, more and more points diverge. Each GPU contains 1600 stream processors. After a certain number of iterations the number of remaining pixels will be smaller than 1600, and the GPU load decreases rapidly. Keeping the pipeline full is quite involved, especially when dealing with very high iteration depths of about 10^10 and pixel rates of about 30 million pixels per second.


3.4 MANAGING THE PIXEL PIPELINE

The pipeline is a list of pixel coordinates with accompanying iteration count and iteration variables (x and y). Each GPU has its own pipeline, so there are four pipelines.

In figure 3.2 the applied algorithm is outlined. At the beginning the pipeline is empty (i.e. length = 0) and the parameter depth is set to zero. The addList procedure appends a list of preselected points of a GPU-grid to the pipeline. The details of the addList procedure are outlined in the next section. The procedure addList is applied until the length of the pipeline is greater than 500 000 points.

The shiftList procedure tries to apply 2^depth · 10 000 iterations to each pixel in the pipeline. The details of the shiftList procedure are outlined in the section after next. Pixels belonging to the Mandelbrot set and diverging pixels are removed from the pipeline. After applying shiftList the parameter depth is increased by one. Thus, more and more iterations are applied to fewer and fewer pixels. This balances CPU and GPU load and helps to detect orbits with very long periods. When the length of the pipeline is smaller than 2 000 pixels, depth is set to zero and addList is applied to the pipeline again.

Very high iteration depths can be reached by recycling the fewer than 2 000 non-member and non-diverging pixels in the next cycle of the algorithm (see the control-flow sketch below).
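The control flow of figure 3.2 can be summarized in a short C sketch. The pipeline type and the routines below are assumed stand-ins for the actual Mathematica/OpenCL implementation of sections 4.3.2 to 4.3.4; only the two thresholds and the doubling of the iteration budget via depth are taken from the text.

/* Assumed stand-ins; the real implementation lives in Mathematica/OpenCL. */
typedef struct Pipeline Pipeline;
long pipelineLength(Pipeline *p);       /* current number of queued pixels */
int  gridsRemaining(void);              /* GPU-grids not yet loaded */
void addList(Pipeline *p);              /* append preselected points of one GPU-grid */
void shiftList(Pipeline *p, int depth); /* apply ~2^depth * 10000 iterations per pixel */

void runPipeline(Pipeline *p) {
    while (gridsRemaining() > 0) {
        /* fill up: addList until the pipeline holds more than 500 000 points */
        while (pipelineLength(p) <= 500000 && gridsRemaining() > 0)
            addList(p);
        /* die out: shiftList with doubling budget until fewer than 2 000 remain */
        int depth = 0;
        while (pipelineLength(p) >= 2000) {
            shiftList(p, depth);
            depth++;
        }
        /* the < 2 000 undecided pixels are recycled in the next cycle */
    }
}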

3.5 THE ADDLIST PROCEDURE

The pseudocode of the addList procedure reads as follows:

create_GPU_grid()
iter(990)
test_circle_cardioid()
for (n = 1; n < 4)
    iter(2^n * 990)
    orbit(2^n * 1000)

The specific definitions of the subprocedures are:

create_GPU_grid(): This subroutine creates a GPU-grid with 1024^2 ≈ 1 million pixels following the procedure described in section 3.1.

iter(n): This subroutine applies n Mandelbrot iterations to each point of the GPU-grid.

test_circle_cardioid(): This subroutine tests for each point whether it is part of the Mandelbrot set's cardioid or circle (see the C sketch after this list).

orbit(n): This subroutine applies n Mandelbrot iterations to each point of the GPU-grid and simultaneously checks for iteration orbits.
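For reference, the cardioid/circle membership test, extracted from the addList kernel in section 4.3.2 into a stand-alone C function:

#include <stdbool.h>

/* Points in the main cardioid or in the period-2 circle belong to the
   Mandelbrot set and need no iteration at all. This mirrors the test
   in the addList kernel (section 4.3.2). */
bool inCardioidOrCircle(double x0, double y0) {
    double q = (x0 - 0.25) * (x0 - 0.25) + y0 * y0;
    bool card = q * (q + (x0 - 0.25)) < 0.25 * y0 * y0;     /* main cardioid */
    bool circ = (x0 + 1.0) * (x0 + 1.0) + y0 * y0 < 0.0625; /* period-2 circle */
    return card || circ;
}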

Typically, the roughly 1 million pixels divide into the following parts: 20.9% belong to the Mandelbrot set, 77.6% diverge and 1.5% remain unknown. The on average 15 000 unknown pixels are appended to the pipeline. 990 + 2·(990 + 1000) + 4·(990 + 1000) + 8·(990 + 1000) = 990 + 14·1990 = 28 850 iterations are applied to unknown pixels. This procedure needs about 10 milliseconds GPU and about 15 milliseconds CPU time on average.


Table 3.1: Numbers of remaining pixels for different grids after post-processing the pipeline and corresponding errors of the estimated area

grid size   area estimate     error (mean)        iter. depth   remain   error (remain)
16 384      1.506 590 913 0   0.000 002 171 7      67 108 864      520   0.000 000 675 5
32 768      1.506 591 383 0   0.000 000 678 8     134 217 728    1 000   0.000 000 324 8
65 536      1.506 592 037 0   0.000 000 251 4     268 435 456    1 889   0.000 000 153 1
131 072     1.506 591 911 7   0.000 000 135 8     536 870 912    3 571   0.000 000 072 3
262 144     1.506 591 918 4   0.000 000 040 0   1 073 741 824    5 711   0.000 000 029 0
524 288     1.506 591 883 3   0.000 000 021 7   2 147 483 648    8 091   0.000 000 010 2
1 048 576   1.506 591 883 1   0.000 000 007 3   4 294 967 296    8 261   0.000 000 002 6
2 097 152   1.506 591 884 9   0.000 000 002 8   8 589 934 592    9 922   0.000 000 000 8

3.6 THE SHIFTLIST PROCEDURE

The pseudocode of the shiftList procedure reads as follows:

for (n = 0; n < depth + 1)
    iter(2^n * 9900)
    orbit(2^n * 10000)

The specific definitions of the subprocedures are the same as for the addList procedure. Following figure 3.2, the shiftList procedure is applied repeatedly to the pipeline. Each time, the parameter depth is increased by one; depth reaches up to 16. This procedure needs about 100 milliseconds GPU and about 10 milliseconds CPU time on average.

3.7 POST PROCESSING OF THE PIPELINE

As mentioned in section 3.2 there are m × m GPU-grids. At a certain time all GPU-grids are depleted and integrated into the four pipelines. Then the algorithm described in section 3.4 ends with 4 × 2 000 remaining pixels. A usual CPU-based iteration algorithm with orbit detection is applied to these remaining 8 000 pixels. The chosen maximum iterations are listed in tables 2.2 and 3.1. The maximum iterations are chosen to balance GPU and CPU load and, last but not least, to keep the estimation error of the area due to the remaining pixels smaller than the error due to the sampling of the 20 grids.

The numbers of remaining pixels and the corresponding errors of the estimated area are listed in table 3.1.

4 IMPLEMENTATION

4.1 HARDWARE CONFIGURATION

A typical PC (refer to figure 4.1) is used. Specific details of the hardware configuration are:

• CPU: Intel Core i7 2600K


Figure 4.1: Left side: the computer with the two graphics cards. Right side: Mathematica's system information

• GPU: 2x Radeon HD 5970. Thus, there are 4 GPUs with 1600 stream processors each (refer to figure 4.1).

• Power consumption under load: approx. 350 W

• Dummy VGA dongle (refer to e.g. http://www.vt-tech.info/?p=8379)

4.2 SOFTWARE CONFIGURATION

Specific details of the software configuration are:

• Mathematica 8.0.4.0, Windows 7

• ATI driver Catalyst 11.2 with AMD Stream SDK 2.3 (refer to http://forums.wolfram.com/mathgroup/archive/2011/Jun/msg00615.html)

• Adjustment of TDR timeout parameter in Windows 7’s registry (refer to e.g. http://www.cmsoft.com.br/index.php?option=com_content&view=category&layout=blog&id=58&Itemid=95)


• Installation of a C compiler for Mathematica (here: Visual Studio 2011, following Mathematica's manual)

4.3 SOURCE CODE

4.3.1 INITIALISATION

This Mathematica code initializes the calculation and creates the working files.

path = NotebookDirectory[];
SetDirectory[path];
kernelString = "kernel" <> ToString[$KernelID];

(* Parameters *)

sizeSample = 20; (* number of rasters to be calculated *)

sizeCPU = 2^4; (* raster = sizeCPU x sizeCPU tiles handled by CPU *)

sizeGPU = 2^11; (* tile = sizeGPU x sizeGPU pixels *)

(* Init data *)
tmp = Table[{
    "in_" <>
     ToString[
      AccountingForm[n, {3, 0}, NumberPadding -> {"0", ""},
       NumberPoint -> ""]] <> ".txt",
    {{{}, {}, {}}, {{RandomReal[], RandomReal[],
       RandomReal[]}, {sizeCPU, sizeGPU}}, {0, 0}, 0, 0}
    }, {n, 1, sizeSample}];

(* Create init files *)
Map[Put[#[[2]], #[[1]]] &, tmp];

4.3.2 GPU-CODE FOR THE ADDLIST PROCEDURE

This is the GPU code of the addList procedure. The code is stored as a simple string.

sourceBitmap = "
#ifdef USING_DOUBLE_PRECISIONQ
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif /* USING_DOUBLE_PRECISIONQ */
__kernel void mandelbrot_kernel(
    __global mint *set, /* housekeeping: data pointer */
    double offx,        /* raster's x offset */
    double offy,        /* raster's y offset */
    double step,        /* raster's grid width */
    mint size)          /* housekeeping: data size */
{
    int xIndex = get_global_id(0);
    int yIndex = get_global_id(1);
    int iter;
    int initMaxiter = 990;   /* initial value of warm-up maximum iteration depth (empiric) */
    int orbitMaxiter = 1000; /* initial value of maximum iteration depth with orbit test (empiric) */
    int maxDoubling = 3;     /* number of maximum iteration's doublings (empiric) */
    int doubling;
    double x0 = offx + step * xIndex;
    double y0 = offy + step * yIndex;
    double tmp, q, x = 0.0, y = 0.0;
    double ox, oy;
    bool card, circ;
    bool member = false;
    bool refuge = false;

    /* housekeeping: check index */
    if (xIndex < size && yIndex < size) {
        /* warm-up iterations */
        for (iter = 0; (x*x + y*y <= 4.0) && (iter < initMaxiter); iter++) {
            tmp = x*x - y*y + x0;
            y = 2.0*x*y + y0;
            x = tmp;
        }
        refuge = (x*x + y*y > 4); /* Escape? */
        /* circle and cardioid test */
        if (!refuge) {
            q = pow((x0 - 0.25), 2) + y0 * y0;
            card = q * (q + (x0 - 0.25)) < 0.25 * y0 * y0;
            circ = pow((x0 + 1.0), 2) + y0 * y0 < 0.0625;
            member = (card || circ);
        }
        /* maxiter doubling loop */
        for (doubling = 1; !refuge && !member && doubling < maxDoubling; doubling++) {
            /* normal iteration, maxiter: 2^doubling * initMaxiter */
            for (iter = iter;
                 (x*x + y*y <= 4.0) && (iter < pow(2.0f, doubling) * initMaxiter);
                 iter++) {
                tmp = x*x - y*y + x0;
                y = 2.0*x*y + y0;
                x = tmp;
            }
            refuge = (x*x + y*y > 4); /* Escape? */
            /* iteration with orbit test, maxiter: 2^doubling * orbitMaxiter */
            if (!refuge) {
                ox = x; /* set real part for orbit test */
                oy = y; /* set imaginary part for orbit test */
                tmp = x*x - y*y + x0;
                y = 2.0*x*y + y0;
                x = tmp;
                for (iter = iter;
                     (x*x + y*y <= 4.0) && (iter < pow(2.0f, doubling) * orbitMaxiter) &&
                     ((ox != x) || (oy != y)); /* non-orbit test */
                     iter++) {
                    tmp = x*x - y*y + x0;
                    y = 2.0*x*y + y0;
                    x = tmp;
                }
                refuge = (x*x + y*y > 4);          /* Escape? */
                member = ((ox == x) && (oy == y)); /* Orbit? */
            }
        }
        if (!refuge && !member) {
            set[(yIndex + xIndex * size)] = 255; /* mark point as unknown */
        }
        if (member) {
            set[(yIndex + xIndex * size)] = 128; /* mark point as member */
        }
        if (refuge) {
            set[(yIndex + xIndex * size)] = 0;   /* mark point as refugee */
        }
    }
}";

4.3.3 GPU-CODE FOR THE SHIFTLIST PROCEDURE

This is the GPU code of the shiftList procedure. The code is again stored as a simple string.

sourceList = "
#ifdef USING_DOUBLE_PRECISIONQ
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#endif /* USING_DOUBLE_PRECISIONQ */
__kernel void mandelbrot_kernel(
    __global double *set, /* housekeeping: data pointer */
    mint initMaxiter,     /* maxiter without orbit test */
    mint orbitMaxiter,    /* maxiter with orbit test */
    mint size)            /* housekeeping: data size */
{
    int xIndex = get_global_id(0);
    int iter;
    double tmp;
    double cx, cy, x, y, ox, oy;
    bool refuge = false;
    bool member = false;

    /* housekeeping: check index */
    if (xIndex < size) {
        cx = set[4*(xIndex)];     /* collecting real part of point coordinates */
        cy = set[4*(xIndex) + 1]; /* collecting imaginary part of point coordinates */
        x = set[4*(xIndex) + 2];  /* collecting real part of actual iteration value */
        y = set[4*(xIndex) + 3];  /* collecting imaginary part of actual iteration value */
        /* normal iteration, maxiter: initMaxiter */
        for (iter = 0;
             (x*x + y*y <= 4.0) && (iter < initMaxiter);
             iter++) {
            tmp = x*x - y*y + cx;
            y = 2.0*x*y + cy;
            x = tmp; }
        refuge = (x*x + y*y > 4.0); /* Escape? */
        /* iteration with orbit test, maxiter: orbitMaxiter */
        if (!refuge) {
            ox = x; /* set real part for orbit test */
            oy = y; /* set imaginary part for orbit test */
            tmp = x*x - y*y + cx;
            y = 2.0*x*y + cy;
            x = tmp;
            for (iter = iter;
                 (x*x + y*y <= 4.0) && (iter < orbitMaxiter) &&
                 ((ox != x) || (oy != y)); /* non-orbit test */
                 iter++) {
                tmp = x*x - y*y + cx;
                y = 2.0*x*y + cy;
                x = tmp; }
            refuge = (x*x + y*y > 4);            /* Escape? */
            member = ((ox == x) && (oy == y)); } /* Orbit? */
        if (refuge) {
            /* mark point as refugee */
            set[4*(xIndex)]     = 0.0;
            set[4*(xIndex) + 1] = 0.0;
            set[4*(xIndex) + 2] = 0.0;
            set[4*(xIndex) + 3] = 0.0; }
        if (member) {
            /* mark point as member */
            set[4*(xIndex)]     = 1.0;
            set[4*(xIndex) + 1] = 1.0;
            set[4*(xIndex) + 2] = 1.0;
            set[4*(xIndex) + 3] = 1.0; }
        if (!refuge && !member) {
            /* mark point as unknown */
            set[4*(xIndex)]     = cx;
            set[4*(xIndex) + 1] = cy;
            set[4*(xIndex) + 2] = x;
            set[4*(xIndex) + 3] = y; }
    }
}";

4.3.4 MATHEMATICA FUNCTIONS FOR THE GPU PART

This Mathematica code compiles and loads the GPU functions defined above. The Mathematica functions addList and shiftList are defined here.

(* wake up device *)
mem = OpenCLMemoryAllocate[Integer, {1024, 1024}, "Platform" -> 1, "Device" -> $KernelID];
OpenCLMemoryUnload[mem];

(* Load GPU functions *)
clBitmap = OpenCLFunctionLoad[
   sourceBitmap, "mandelbrot_kernel",
   {{_Integer}, "Double", "Double", "Double", _Integer},
   {16, 16},
   "ShellOutputFunction" -> Print, "ShellCommandFunction" -> Print
   ];
clList = OpenCLFunctionLoad[
   sourceList, "mandelbrot_kernel",
   {{"Double"}, _Integer, _Integer, _Integer},
   64,
   "ShellOutputFunction" -> Print, "ShellCommandFunction" -> Print];

(* Helper function for addList, from point index to point coordinates *)
gpuRaster[{{rndX_, rndY_, rndW_}, {cpuSize_, gpuSize_}, idx_}] := Block[
   {cpuOffs, cpuWidth, gpuOffsX, gpuOffsY, gpuWidth},
   If[idx < 0 || idx > cpuSize^2, Abort[]];
   cpuOffs = {-2.05, -1.3}; (* left bottom *)
   cpuWidth = 2.6*(1.01 + 0.01*rndW)/cpuSize; (* raster size with random variation *)
   gpuWidth = cpuWidth/gpuSize;
   {gpuOffsX, gpuOffsY} = cpuOffs - {rndX, rndY}*gpuWidth + {Quotient[idx, cpuSize], Mod[idx, cpuSize]}*cpuWidth;
   {gpuOffsX, gpuOffsY, gpuWidth, gpuSize}
   ];

(* fill up work list *)
addList[{
    {fullList_, doneList_, workList_},
    {{rndX_, rndY_, rndW_}, {cpuSize_, gpuSize_}},
    time_, kernel_, depth_}] := Block[
   {idx, ox, oy, step, size, mem, res, out, img, erg, lst, timer, timers},
   timers = {0, 0}; timer = AbsoluteTime[];
   idx = Length[doneList] + Length[fullList];
   If[idx > cpuSize^2/4 - 1,
    Return[
     {{doneList, workList, timers},
      {{rndX, rndY, rndW}, {cpuSize, gpuSize}},
      time + timers, kernel}]];
   {ox, oy, step, size} = gpuRaster[{{rndX, rndY, rndW}, {cpuSize, gpuSize}, 4*idx + (kernel - 1)}];
   timers = timers + {AbsoluteTime[] - timer, 0}; timer = AbsoluteTime[];
   mem = OpenCLMemoryAllocate[Integer, {size, size}];
   res = clBitmap[mem, ox, oy, step, size];
   out = OpenCLMemoryGet[First[res]]; OpenCLMemoryUnload[mem];
   timers = timers + {0, AbsoluteTime[] - timer}; timer = AbsoluteTime[];
   img = Image[out, "Byte"];
   erg = ImageLevels[img, 3][[{1, 2}, 2]];
   lst = ImageData[Binarize[img, 0.9]];
   lst = ArrayRules[SparseArray[lst]][[;; -2, 1]];
   lst = Map[{4*idx + (kernel - 1), {ox, oy} + (# - 1)*step, {0.0, 0.0}, 0} &, lst];
   timers = timers + {AbsoluteTime[] - timer, 0}; timer = AbsoluteTime[];
   {{fullList, Append[doneList, {4*idx + (kernel - 1), erg}], Join[workList, lst]}, {{rndX, rndY, rndW}, {cpuSize, gpuSize}}, time + timers, kernel, depth}
   ];

(* Helper function for shiftList, extract positions *)
positionHelper[{doneList_, workList_}, bothList_, pattern_] := Block[
   {extractedList},
   extractedList = Extract[workList, Position[bothList, pattern]];
   extractedList = extractedList[[All, {1, 2, 3}]];
   extractedList = Split[Sort[extractedList], #1[[1]] == #2[[1]] &];
   extractedList = Map[{#[[1, 1]], Length[#]} &, extractedList];
   extractedList = Map[{Position[doneList[[All, 1]], #[[1]]][[1, 1]], #[[2]]} &, extractedList];
   extractedList
   ];

(* die out work list *)
shiftList[{
    {fullList_, doneList_, workList_},
    {{rndX_, rndY_, rndW_}, {cpuSize_, gpuSize_}},
    time_, kernel_, depth_}] := Block[
   {bothList, refugeList, memberList, unknownList, doneOut, workOut, fullOut, timer, timers},
   timers = {0, 0}; timer = AbsoluteTime[];
   doneOut = doneList;
   bothList = Flatten[Map[{#[[2, 1]], #[[2, 2]], #[[3, 1]], #[[3, 2]]} &, workList]];
   timers = timers + {AbsoluteTime[] - timer, 0}; timer = AbsoluteTime[];
   bothList = clList[bothList, 2^depth*9900, 2^depth*10000, Length[bothList]/4];
   timers = timers + {0, AbsoluteTime[] - timer}; timer = AbsoluteTime[];
   bothList = Partition[First[bothList], 4];
   refugeList = positionHelper[{doneList, workList}, bothList, {0., 0., 0., 0.}];
   Do[doneOut[[refugeList[[i, 1]], 2, 1]] = doneList[[refugeList[[i, 1]], 2, 1]] + refugeList[[i, 2]], {i, Length[refugeList]}];
   memberList = positionHelper[{doneList, workList}, bothList, {1., 1., 1., 1.}];
   Do[doneOut[[memberList[[i, 1]], 2, 2]] = doneList[[memberList[[i, 1]], 2, 2]] + memberList[[i, 2]], {i, Length[memberList]}];
   unknownList = Position[bothList, Except[{0., 0., 0., 0.} | {1., 1., 1., 1.}], {1}];
   unknownList = Transpose[{Drop[Extract[workList, unknownList], 1], Drop[Extract[bothList, unknownList], 1]}];
   workOut = Map[{#[[1, 1]], {#[[2, 1]], #[[2, 2]]}, {#[[2, 3]], #[[2, 4]]}, #[[1, 4]] + depth} &, unknownList];
   fullOut = Join[fullList, Select[doneOut, Not[gpuSize^2 - Total[#[[2]]] > 0] &]];
   doneOut = Select[doneOut, gpuSize^2 - Total[#[[2]]] > 0 &];
   timers = timers + {AbsoluteTime[] - timer, 0}; timer = AbsoluteTime[];
   {{fullOut, doneOut, workOut}, {{rndX, rndY, rndW}, {cpuSize, gpuSize}}, time + timers, kernel, depth + 1}
   ];

4.3.5 MATHEMATICA CODE FOR THE CPU PART

This Mathematica code represents the main loop.

path = NotebookDirectory[];
SetDirectory[path];
files = FileNames["in_*"];
(* Main loop *)
While[files != {},
  allTime = AbsoluteTime[];
  file = First[files];
  set = Get[file];
  upperLimit = 500000; lowerLimit = 2000; finalLimit = 20; (* list dimensions *)
  DistributeDefinitions[set, upperLimit, lowerLimit, finalLimit, path, file];
  (* Begin parallel evaluation *)
  ParallelEvaluate[
   SetDirectory[path];
   (* init working list *)
   {{fullList, doneList, workList}, {{rndX, rndY, rndW}, {cpuSize, gpuSize}}, timer, kernel, depth} = set;
   set = {{fullList, doneList, workList}, {{rndX, rndY, rndW}, {cpuSize, gpuSize}}, {0, 0}, $KernelID, 0};
   (* CPU raster loop *)
   While[4*(Length[fullList] + Length[doneList]) < cpuSize^2,
    (* fill up working list *)
    set = NestWhile[addList, set, Length[#[[1, 1]]] + Length[#[[1, 2]]] < #[[2, 2, 1]]^2/4 && Length[#[[1, 3]]] < upperLimit &];
    (* die out working list *)
    set = NestWhile[shiftList, set, Length[#[[1, 3]]] > lowerLimit &];
    (* update working list *)
    {{fullList, doneList, workList}, {{rndX, rndY, rndW}, {cpuSize, gpuSize}}, timer, kernel, depth} = set;
    (* print work list status *)
    Print[Grid[{{
        DateString[AbsoluteTime[], {"Day", ":", "Time"}],
        ToString[Round[(Length[fullList] + Length[doneList])/cpuSize^2*4*100, 0.1]] <> "%",
        ToString[Length[fullList]],
        ToString[Length[doneList]],
        ToString[Length[workList]],
        ToString[Round[timer[[1]], 0.1]] <> "s",
        ToString[Round[timer[[2]], 0.1]] <> "s",
        ToString[depth]
        }}, Frame -> All, Alignment -> Right, ItemSize -> All, Background -> Which[
        kernel == 1, LightRed, kernel == 2, LightGreen, kernel == 3, LightBlue, kernel == 4, LightGray]]];
    (* update working list with reset times *)
    set = {{fullList, doneList, workList}, {{rndX, rndY, rndW}, {cpuSize, gpuSize}}, {0, 0}, kernel, 0};
    ];
   (* write work list to kernel file *)
   Put[set, StringReplace[file, {"in" -> "out", ".txt" -> ""}] <> "_" <> ToString[$KernelID] <> ".txt"];
   ];
  (* End parallel evaluation *)
  (* print work list final status *)
  Print[Grid[{{
      DateString[AbsoluteTime[], {"Day", ":", "Time"}],
      file,
      ToString[IntegerPart[AbsoluteTime[] - allTime]] <> "s"
      }}, Frame -> All, Alignment -> Right, ItemSize -> All]];
  (* Read 4 kernel files *)
  tmp = Map[Get, Table[StringReplace[file, {"in" -> "out", ".txt" -> ""}] <> "_" <> ToString[i] <> ".txt", {i, 4}]];
  tmp = Transpose[tmp[[All, 1]]];
  tmp = Apply[Join, tmp, 1];
  cpu = tmp[[3, All, {2, 3}]];
  cpu = Map[#[[1]] + I*#[[2]] &, cpu, {2}];
  summ = Total[tmp[[1, All, 2]]] + Total[tmp[[2, All, 2]]];
  (* Write result files *)
  Put[{AbsoluteTime[] - allTime, summ, cpu}, StringReplace[file, {"in" -> "out", ".txt" -> ""}] <> "_5" <> ".txt"];
  (* Delete input file, mark as done *)
  DeleteFile[file];
  files = FileNames["in_*"];
  ];

4.3.6 MATHEMATICA CODE FOR THE CPU POST PROCESSING

This is the code for the post-processing of the pipeline. This code runs entirely on the CPU and in parallel to the main loop above.

cPostIter =
  Compile[{{zz, _Complex, 1}},
   Module[{n = 0, nn = 0, c = zz[[1]], z = zz[[2]], z0 = 0. + 0. I,
     maxi0 = 2^15, maxi1 = 2^16, refuge = False, member = False,
     istep = 0},
    While[istep < 6 + 10 && ! refuge && ! member,
     nn = 0;
     While[nn < 2^istep && ! refuge,
      n = 0;
      While[n < maxi0 && ! refuge,
       z = z^2 + c;
       refuge = Re[z]^2 + Im[z]^2 > 4;
       n++];
      nn++];
     If[! refuge,
      z0 = z;
      z = z^2 + c;
      nn = 0;
      While[nn < 2^istep && ! refuge && ! member,
       n = 0;
       While[n < maxi1 && ! refuge && ! member,
        z = z^2 + c;
        refuge = Re[z]^2 + Im[z]^2 > 4;
        member = (Re[z0] == Re[z]) && (Im[z0] == Im[z]);
        n++];
       nn++];
      ];
     istep++;
     ];
    Which[refuge, {0, istep, n, nn}, member, {1, istep, n, nn},
     True, {zz[[1]], istep, n, nn}]
    ], RuntimeAttributes -> {Listable}, Parallelization -> True,
   CompilationTarget -> "C", RuntimeOptions -> "Speed"];

While[True,
 path = NotebookDirectory[];
 SetDirectory[path];
 files5 = FileNames["out_*_5.txt"];
 files6 = FileNames["out_*_6.txt"];
 files5 = Map[StringReplace[#, "_5.txt" -> ""] &, files5];
 files6 = Map[StringReplace[#, "_6.txt" -> ""] &, files6];
 files = Complement[files5, files6];
 If[files == {}, Pause[10],
  allTime = AbsoluteTime[];
  file = First[files];
  data = Get[file <> "_5.txt"];
  cpu = Chop[cPostIter[data[[3]]]];
  Put[cpu, file <> "_6" <> ".txt"];
  summ =
   data[[2]] + {Length[Select[cpu, #[[1]] == 0. &]],
     Length[Select[cpu, #[[1]] == 1. &]]};
  left =
   Length[cpu] - Length[Select[cpu, #[[1]] == 0. &]] -
    Length[Select[cpu, #[[1]] == 1. &]];
  rnd = Get[file <> "_1" <> ".txt"][[2, 1, 3]];
  area = (summ[[2]] + left)/(Total[summ] + left)*(2.6*(1.01 + 0.01*rnd))^2;
  Put[{area, summ[[1]], summ[[2]], left, rnd,
    IntegerPart[AbsoluteTime[] - allTime]}, file <> "_7" <> ".txt"];
  Print[Grid[{{
      DateString[AbsoluteTime[], {"Day", ":", "Time"}],
      file,
      ToString[
       AccountingForm[area, {8, 8}, DigitBlock -> 3,
        NumberPadding -> {"", "0"}]],
      ToString[summ[[1]]],
      ToString[summ[[2]]],
      ToString[left],
      rnd,
      ToString[IntegerPart[AbsoluteTime[] - allTime]] <> "s"
      }}, Frame -> All, Alignment -> Right, ItemSize -> All]];
  ];
 ];
