Introduction to CUDA Programming
Scan Algorithm Explained. Andreas Moshovos, Winter 2009.
![Page 1: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/1.jpg)
Introduction to CUDA Programming
Scan Algorithm Explained
Andreas Moshovos
Winter 2009
![Page 2: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/2.jpg)
Reading
• You are strongly encouraged to read the following, as it contains a more formal treatment of the algorithm plus an overview of various applications of scan.
  – Guy E. Blelloch. “Prefix Sums and Their Applications”. In John H. Reif (Ed.), Synthesis of Parallel Algorithms, Morgan Kaufmann, 1990. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/papers/CMU-CS-90-190.html
![Page 3: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/3.jpg)
Two phases
• Up-Sweep
  – Essentially a reduction
  – Produces many partial results
• Down-Sweep
  – Propagates the partial results to all relevant elements
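As a rough illustration of how the two phases fit together, here is a sequential Python sketch (not CUDA code). The function names `up_sweep`, `down_sweep`, and `exclusive_scan` are invented for this example, and the sketch assumes the input length is a power of two. Each iteration of an inner `for` loop touches a distinct pair of elements, so it stands in for work that one CUDA thread could do in parallel with the others.

```python
from itertools import accumulate


def up_sweep(a):
    """Reduction phase: leaves partial sums in place, total in a[-1]."""
    n = len(a)
    stride = 1
    while stride < n:
        # Every iteration of this inner loop works on its own pair.
        for right in range(2 * stride - 1, n, 2 * stride):
            a[right] += a[right - stride]
        stride *= 2


def down_sweep(a):
    """Propagation phase: turns the partial sums into an exclusive scan."""
    n = len(a)
    a[-1] = 0  # nothing lies to the left of the whole tree
    stride = n // 2
    while stride >= 1:
        for right in range(2 * stride - 1, n, 2 * stride):
            left = right - stride
            # The left child receives the parent's value; the parent slot
            # receives the old left value plus the parent's value.
            a[left], a[right] = a[right], a[left] + a[right]
        stride //= 2


def exclusive_scan(values):
    """Exclusive prefix sum; assumes len(values) is a power of two."""
    a = list(values)
    up_sweep(a)
    down_sweep(a)
    return a


data = [1, 2, 2, 5, 6, 3, 8, 2, 4, 1, 5, 2, 7, 9, 3, 5]
result = exclusive_scan(data)
# Cross-check against a straightforward sequential prefix sum.
reference = [0] + list(accumulate(data))[:-1]
print(result == reference)
```

The comparison against `itertools.accumulate` is only a sanity check; it is not part of the algorithm.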
![Page 4: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/4.jpg)
Up-Sweep
• Just a reduction:
Input:         1  2  2  5  6  3  8  2  4  1  5  2  7  9  3  5
After step 1:  1  3  2  7  6  9  8 10  4  5  5  7  7 16  3  8
After step 2:  1  3  2 10  6  9  8 19  4  5  5 12  7 16  3 24
After step 3:  1  3  2 10  6  9  8 29  4  5  5 12  7 16  3 36
After step 4:  1  3  2 10  6  9  8 29  4  5  5 12  7 16  3 65
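The in-place reduction on this slide can be reproduced with a short sequential Python sketch. The helper name `up_sweep_trace` is hypothetical, and a power-of-two input length is assumed. It records the array after every step so the intermediate rows can be compared with the slide.

```python
def up_sweep_trace(values):
    """Return the array contents after each up-sweep step."""
    a = list(values)
    snapshots = []
    stride = 1
    while stride < len(a):
        # Add each left partner into the element `stride` positions to its right.
        for right in range(2 * stride - 1, len(a), 2 * stride):
            a[right] += a[right - stride]
        snapshots.append(list(a))
        stride *= 2
    return snapshots


data = [1, 2, 2, 5, 6, 3, 8, 2, 4, 1, 5, 2, 7, 9, 3, 5]
for step, snapshot in enumerate(up_sweep_trace(data), start=1):
    print(f"after step {step}:", snapshot)
```

For 16 elements there are four steps, and the last element ends up holding the total, 65.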
![Page 5: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/5.jpg)
Up-Sweep
• Now let’s see this as a tree:
1 2 2 5 6 3 8 2 4 1 5 2 7 9 3 5
3 7 9 10 5 7 16 8
10 19 12 24
29 36
65
• Notice we only have these nodes left in our array:
  – the rest were partial results
1 3 2 10 6 9 8 29 4 5 5 12 7 16 3 65
![Page 6: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/6.jpg)
Up-Sweep
• So, this is what’s left
  – nodes without values don’t exist; they were partial results
1 2 6 8 4 5 7 3
3 9 5 16
10 12
29
65
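One way to see which tree level each surviving array element belongs to is sketched below in Python. The function name `surviving_nodes_by_level` is made up for this illustration, and the index rule it uses (count how many times `i + 1` divides evenly by two) is an observation about this particular in-place layout for power-of-two lengths, not something stated on the slides.

```python
def surviving_nodes_by_level(a):
    """Group the up-swept array into tree levels, leaves first.

    Element i belongs to level k, where k is the number of times
    (i + 1) can be halved evenly (power-of-two length assumed).
    """
    levels = {}
    for i, value in enumerate(a):
        k = 0
        pos = i + 1
        while pos % 2 == 0:
            pos //= 2
            k += 1
        levels.setdefault(k, []).append(value)
    return [levels[k] for k in sorted(levels)]


after_up_sweep = [1, 3, 2, 10, 6, 9, 8, 29, 4, 5, 5, 12, 7, 16, 3, 65]
for level in surviving_nodes_by_level(after_up_sweep):
    print(level)
```

With the array from the example, the printed rows should line up with the five rows of the tree on this slide.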
![Page 7: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/7.jpg)
Down-Sweep
• For the second phase we need to think of:
  – The edges in reverse
  – The empty nodes as placeholders for partial results
1 2 6 8 4 5 7 3
3 9 5 16
10 12
29
65
![Page 8: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/8.jpg)
Down-Sweep
• Now let’s view the tree as a collection of subtrees
  – The root of each subtree, where it’s still present, contains the reduction of all subtree elements
    • i.e., the sum of all subtree elements
1 2 6 8 4 5 7 3
3 9 5 16
10 12
29
65
![Page 9: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/9.jpg)
Down-Sweep
• Let’s focus on the rightmost subtree:
1 2 6 8 4 5 7 3
3 9 5 16
10 12
29
65
![Page 10: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/10.jpg)
Down-Sweep
• Before the last step of the down-sweep phase, the yellow element will contain the sum (57) of all elements to the left of the subtree.
3
57
• The last step will take the following two actions:
  – 3 + 57 = 60; this goes on the rightmost element
    • This is the sum of all elements including 3 but excluding the rightmost one
  – overwrite 3 with 57
    • This is the sum of all elements left of 3
![Page 11: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/11.jpg)
Down-Sweep
• In terms of the array stored in memory, the aforementioned actions look like this:
Before: 3 57
After:  57 60
• Where:
  – the dark arrows represent addition
  – the red dotted arrow represents a move
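The move and the addition described above amount to a small update on two array slots. A minimal Python sketch follows; the name `down_sweep_step` and its `left`/`right` parameters are hypothetical and chosen only for this illustration.

```python
def down_sweep_step(a, left, right):
    """Apply one down-sweep action to the pair (left, right), in place."""
    parent = a[right]            # sum of everything left of this subtree
    a[right] = a[left] + parent  # the addition: goes to the right slot
    a[left] = parent             # the move: parent value goes to the left slot


# The last two array elements just before the final step.
tail = [3, 57]
down_sweep_step(tail, left=0, right=1)
print(tail)
```

Starting from 3 and 57, the pair becomes 57 and 60, matching the 3 + 57 = 60 computed earlier.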
![Page 12: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/12.jpg)
Down-Sweep
• Let’s now focus on the rightmost subtree that contains the last four nodes:
  – This will be processed one step before the subtree we just discussed
7 3
16
![Page 13: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/13.jpg)
Down-Sweep
• Before the second-to-last step of the down-sweep phase, the green element will contain the sum (41) of all elements to the left of the subtree.
7 3
16
41
![Page 14: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/14.jpg)
Down-Sweep
• The actions that will be taken at this step are:
  – 16 + 41 = 57 will be written as the root of the rightmost subtree
    • As we saw before, this is the sum of all elements left of the rightmost subtree
  – 41 will replace 16
    • This is the sum of all elements left of the subtree rooted at 16
7 3
41 57
41
![Page 15: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/15.jpg)
Down-Sweep
• In terms of the array stored in memory, the aforementioned actions look like this:
Before: 7 16 3 41
After:  7 41 3 57
• Where:
  – the dark arrows represent addition
  – the red dotted arrow represents a move
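The slides so far have handled the pair holding 3 and 57 (array indices 14 and 15) and, one step earlier, the pair holding 16 and 41 (indices 13 and 15). The Python sketch below lists which index pairs each down-sweep step touches. `down_sweep_pairs` is a name invented here, and the index formula assumes the in-place layout of this example with a power-of-two length.

```python
def down_sweep_pairs(n):
    """List the (left, right) index pairs handled at each down-sweep step.

    Assumes n is a power of two. The first step handles the whole tree;
    each later step handles twice as many subtrees of half the size.
    """
    steps = []
    stride = n // 2
    while stride >= 1:
        pairs = [(right - stride, right)
                 for right in range(2 * stride - 1, n, 2 * stride)]
        steps.append(pairs)
        stride //= 2
    return steps


for number, pairs in enumerate(down_sweep_pairs(16), start=1):
    print(f"step {number}: {pairs}")
```

All pairs within one step are disjoint, which is why a parallel implementation could assign one thread per pair and only needs to synchronize between steps.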
![Page 16: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/16.jpg)
Down-Sweep
• Now let’s go a step back and look at the complete right subtree (in green)
4 5 7 3
5 16
12
![Page 17: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/17.jpg)
Down-Sweep
• Before this step, the root node will contain the sum (29) of all elements of the left subtree
4 5 7 3
5 16
12
29
![Page 18: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/18.jpg)
Down-Sweep
• As before we’ll do two things:
  – 29 + 12 = 41, and this becomes the root of the rightmost subtree
    • This should be the sum of all elements to the left of that subtree for the next step (which we saw previously)
  – 29 replaces 12
    • Same reason: 29 is the sum of all elements left of the subtree rooted by what was 12.
4 5 7 3
5 16
29 41
29
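The property used repeatedly above is that, just before a subtree is processed, its root holds the sum of all elements to the left of that subtree. It can be checked mechanically. The Python sketch below is an assumption-laden illustration: the function name `check_down_sweep_invariant` is hypothetical, the input length must be a power of two, and the `assert` inside the loop exists only to test the property.

```python
def check_down_sweep_invariant(values):
    """Run the scan and check the invariant at every down-sweep action.

    Invariant: just before a subtree is processed, its root holds the
    sum of all input elements to the left of that subtree.
    """
    n = len(values)
    a = list(values)
    stride = 1
    while stride < n:                       # up-sweep
        for right in range(2 * stride - 1, n, 2 * stride):
            a[right] += a[right - stride]
        stride *= 2
    a[-1] = 0                               # nothing is left of the full tree
    stride = n // 2
    while stride >= 1:                      # down-sweep
        for right in range(2 * stride - 1, n, 2 * stride):
            first = right - 2 * stride + 1  # leftmost leaf of this subtree
            assert a[right] == sum(values[:first])
            left = right - stride
            a[left], a[right] = a[right], a[left] + a[right]
        stride //= 2
    return a


data = [1, 2, 2, 5, 6, 3, 8, 2, 4, 1, 5, 2, 7, 9, 3, 5]
print(check_down_sweep_invariant(data))
```

For the example input, the checks that pass include the values seen on the slides: 29 before the right half is processed, 41 before the last four elements, and 57 before the last two.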
![Page 19: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/19.jpg)
Down-Sweep
• Let’s try to generalize what happens at every step of the down-sweep phase
• Let’s look at step 1:
  – There is only one subtree, shown in purple
1 2 6 8 4 5 7 3
3 9 5 16
10 12
29
65
![Page 20: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/20.jpg)
Down-Sweep
• Before we process this tree as described earlier, the root node must contain the sum of all elements to the left of the tree
  – There are no elements
  – Hence the root must be 0
1 2 6 8 4 5 7 3
3 9 5 16
10 12
29
0
![Page 21: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/21.jpg)
Down-Sweep
• Now repeat the steps we saw before
  – 29 + 0 = 29, and this becomes the root of the right subtree
  – 29 gets replaced by 0
1 2 6 8 4 5 7 3
3 9 5 16
10 12
0 29
0
![Page 22: Introduction to CUDA Programming](https://reader035.vdocuments.net/reader035/viewer/2022062310/56815b88550346895dc98e38/html5/thumbnails/22.jpg)
Down-Sweep
• In terms of the array stored in memory, the aforementioned actions look like this:
• Where:
  – the dark arrows represent addition
  – the red dotted arrow represents a move
Before:  1  3  2 10  6  9  8 29  4  5  5 12  7 16  3  0
After:   1  3  2 10  6  9  8  0  4  5  5 12  7 16  3 29
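To follow the remaining steps on the array itself, the Python sketch below starts from the up-swept array, zeroes the last element, and records the array after each down-sweep step. The helper name `down_sweep_trace` is hypothetical, and a power-of-two length is assumed.

```python
def down_sweep_trace(up_swept):
    """Return the array after zeroing the root and after each down-sweep step."""
    a = list(up_swept)
    n = len(a)
    a[-1] = 0                      # root must be 0 before step 1
    snapshots = [list(a)]
    stride = n // 2
    while stride >= 1:
        for right in range(2 * stride - 1, n, 2 * stride):
            left = right - stride
            # Move the parent value left; put old left + parent on the right.
            a[left], a[right] = a[right], a[left] + a[right]
        snapshots.append(list(a))
        stride //= 2
    return snapshots


after_up_sweep = [1, 3, 2, 10, 6, 9, 8, 29, 4, 5, 5, 12, 7, 16, 3, 65]
for step, snapshot in enumerate(down_sweep_trace(after_up_sweep)):
    print(f"after step {step}:", snapshot)
```

Here "after step 0" is the array right after the root is set to 0. The final snapshot is the exclusive prefix sum of the original input.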