IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

IAP09 CUDA@MIT / 6.963 Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA Lecture 02 CUDA Basics #1 - Nicolas Pinto (MIT)

Upload: npinto

Post on 16-Apr-2017


Page 1: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

IAP09 CUDA@MIT / 6.963

Supercomputing on your desktop: Programming the next generation of cheap and massively parallel hardware using CUDA

Lecture 02

CUDA Basics #1

Nicolas Pinto (MIT)

Page 2

During this course, we’ll try to “ ” and use existing material ;-)

adapted for 6.963

Page 3

Today

yey!!

Page 4

Intro

GPU?

GPU History

// Analysis

CUDA Overview

CUDA Basics

Page 5

Intro

Page 6

Prepared by Matthew Bolitho, Johns Hopkins University (Spring 2008)

Parallel Computing

“Parallel computing is a form of computing in which many instructions are carried out simultaneously” (Wikipedia)

• Traditionally: large, expensive, specialized

– Exotic supercomputers (e.g. Cray)

– Distributed systems (e.g. ASCI White, BlueGene)

• Parallel computing was traditionally inaccessible to the commodity marketplace

“The complexity for minimum component costs has increased at a rate of roughly a factor of two per year ... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.”

Gordon Moore, Electronics Magazine, 19 April 1965

• The most economic number of components in an IC will double every year

• Historically, CPUs get faster

– Hardware reaching frequency limitations

• Now CPUs get wider

• Rather than expecting CPUs to get twice as fast, expect to have twice as many!

• Parallel processing for the masses

• Unfortunately: Parallel programming is hard!

– Algorithms and Data Structures must be fundamentally redesigned

slide by Matthew Bolitho

Motivation


Page 8

GPU?


Page 18

GPUs are REALLY fast

3D Filterbank Convolution: performance (gflops) and development time (hours)

• Matlab: 0.3 gflops, 0.5 hours

• C/SSE: 9.0 gflops, 10.0 hours

• PS3: 110.0 gflops, 30.0 hours

• GT200: 330.0 gflops, 10.0 hours

GPU?

Nicolas Pinto, James DiCarlo, David Cox (MIT, Harvard)

Page 19

• FLOPS Performance:

(Figures courtesy of Dave Luebke, SC… CUDA Course)

• Designed for math-intensive, parallel problems

• More transistors dedicated to ALU than flow control and data cache

• What are the consequences?

• Program must be more predictable:

– Data access coherency

– Program flow

slide by Matthew Bolitho

GPU?



Page 22

Task vs. Data parallelism

• Task parallel

– Independent processes with little communication

– Easy to use

• “Free” on modern operating systems with SMP

• Data parallel

– Lots of data on which the same computation is being executed

– No dependencies between data elements in each step in the computation

– Can saturate many ALUs

– But often requires redesign of traditional algorithms

slide by Mike Houston

GPU?
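To make the data-parallel case concrete, here is a minimal Python sketch (Python and every name in it, such as `saxpy_element` and `launch`, are illustrative assumptions, not course code). The per-element function plays the role of a CUDA kernel body, and `launch` is a serial loop standing in for a grid of threads. Because each index reads and writes only its own elements, visiting the indices in reverse gives the same answer, which is what "no dependencies between data elements" buys.

```python
from typing import Callable, List, Sequence


def saxpy_element(i: int, a: float, x: Sequence[float], y: List[float]) -> None:
    """Per-element "kernel" body: reads x[i] and y[i], writes only y[i]."""
    y[i] = a * x[i] + y[i]


def launch(n: int, kernel: Callable[[int], None], reverse: bool = False) -> None:
    """Serial stand-in for a data-parallel launch over indices 0..n-1.

    A data-parallel kernel must give the same result whatever order the
    indices are visited in, so the order is made switchable here.
    """
    indices = range(n - 1, -1, -1) if reverse else range(n)
    for i in indices:
        kernel(i)


def saxpy(a: float, x: Sequence[float], y: Sequence[float],
          reverse: bool = False) -> List[float]:
    """Return a * x + y element-wise without modifying the inputs."""
    out = list(y)
    launch(len(out), lambda i: saxpy_element(i, a, x, out), reverse)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [10.0, 20.0, 30.0, 40.0]
    print(saxpy(2.0, x, y))                # [12.0, 24.0, 36.0, 48.0]
    print(saxpy(2.0, x, y, reverse=True))  # [12.0, 24.0, 36.0, 48.0]
```

On a GPU the loop inside `launch` would disappear: each index would be handled by its own thread, which is only safe because no iteration depends on another.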

Page 23

CPU vs. GPU

• CPU

– Really fast caches (great for data reuse)

– Fine branching granularity

– Lots of different processes/threads

– High performance on a single thread of execution

• GPU

– Lots of math units

– Fast access to onboard memory

– Run a program on each fragment/vertex

– High throughput on parallel tasks

• CPUs are great for task parallelism

• GPUs are great for data parallelism

slide by Mike Houston

GPU?

Page 24

The Importance of Data Parallelism for GPUs

• GPUs are designed for highly parallel tasks like rendering

• GPUs process independent vertices and fragments

– Temporary registers are zeroed

– No shared or static data

– No read-modify-write buffers

– In short, no communication between vertices or fragments

• Data-parallel processing

– GPU architectures are ALU-heavy

• Multiple vertex & pixel pipelines

• Lots of compute power

– GPU memory systems are designed to stream data

• Linear access patterns can be prefetched

• Hide memory latency

slide by Mike Houston

GPUs

Page 25

GPU History

Page 26

not true!

History

Page 27

Spring 2008, Johns Hopkins University

Copyright © Matthew Bolitho 2008

The traditional model for 3D Rendering

Diagram: Application → Command → Geometry → Rasterization → Texture → Fragment → Display

• Input: Vertex and Primitives

• Transformations

• Output: 2D Image for display

Client Application

• Prepare and load data

• Issues commands via an API (e.g. OpenGL or DirectX)

Processing commands/data from the application

• Triangulate Polygons

• Prepare vertex data streams

Processing Vertices

• Modeling Transformations

• Viewing Transformations

• Vertex-based Lighting Computation

• Perspective Transformation

Converting Geometry to Fragments

• Formation of triangles with processed vertices

• Interpolation of vertex attributes across triangles

• Creating Fragments from the Triangles

• Clipping

Texture mapping of Fragments

• Perform texture filtering

slide by Matthew Bolitho

History

Page 28

Fragment Processing

• Per Fragment Lighting

• Image-based effects

• Anti-Aliasing

• Alpha Blending

Converting Fragments to Pixels

• Depth Buffer Test

• Stencil Buffer Test

• Accumulation Buffer Operation

• Write Fragments to Framebuffer

• Render interactive, realistic computer generated scenes

• Each frame is complex

• Need 30 frames per second

• CPUs were too slow

Dedicated hardware

• To improve performance, move some work to dedicated hardware

• Hardware could process each vertex and each fragment independently → highly parallel

Diagram: CPU / Host, Graphics Hardware

• The Graphics Pipeline was “Fixed Function”

– Hardware was hardwired to perform the operations in the pipeline

• Eventually, the pipeline became more programmable

• The first stages to be “programmable” were the texture and fragment stages

– Able to specify a discrete set of texture blending operations

– Could combine results from 2 texture lookups (e.g. … A DOT B …)

– No circulation of data in pipeline

slide by Matthew Bolitho

History



Page 31

Diagram: Application → Command → Geometry / Vertex Unit → Rasterization → Texture + Fragment / Fragment Unit → Display (CPU / Host; Graphics Hardware; Texture Memory)

• Texture and Fragment stages became more programmable, combined into “Fragment Unit”

– Programmable via assembly language

– Memory reads via texture lookups

– “Dependent” texture lookups

– Limited Program size

– No real branching (thus looping)

• Geometry stage became programmable, called “Vertex Unit”

– Programmable via assembly language

– No memory reads!

– Limited Program size

– No real branching (thus looping)

• Things improved over time:

– Vertex unit can do memory reads

– Maximum Program size increased

– Branching support

– Higher level languages (e.g. HLSL, Cg)

• Neither the Vertex nor the Fragment unit could write to memory; they could only write to the frame buffer

– No integer math

– No bitwise operators

• By 200…, GPUs became mostly programmable

• “Multi-pass” algorithms allowed writes to memory:

– In pass 1: write to framebuffer

– Rebind the framebuffer as a texture

– Read it in pass 2, etc.

– But were inefficient

slide by Matthew Bolitho

History
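The multi-pass recipe of that era (write pass 1 to the framebuffer, rebind the framebuffer as a texture, read it in pass 2) is usually called ping-ponging between two buffers. The Python sketch below assumes a 1D list as the "texture" and a hypothetical 3-tap box blur as the fragment program; `blur_pass` and `multipass` are illustrative names, not an API from the course.

```python
from typing import List


def blur_pass(src: List[float], dst: List[float]) -> None:
    """One "fragment program" pass (hypothetical 3-tap box blur).

    Each output texel dst[i] reads (gathers) its neighbours from the
    previous pass's buffer `src` and writes only its own slot in `dst`.
    """
    n = len(src)
    for i in range(n):
        left = src[max(i - 1, 0)]       # clamp-to-edge addressing
        right = src[min(i + 1, n - 1)]
        dst[i] = (left + src[i] + right) / 3.0


def multipass(texture: List[float], passes: int) -> List[float]:
    """Run `passes` blur passes, ping-ponging between two buffers.

    After each pass the buffer just written (the "framebuffer") is
    rebound as the input (the "texture") for the next pass.
    """
    src = list(texture)          # input texture for the first pass (copied)
    dst = [0.0] * len(src)       # framebuffer written by each pass
    for _ in range(passes):
        blur_pass(src, dst)
        src, dst = dst, src      # rebind framebuffer as next texture
    return src


if __name__ == "__main__":
    print(multipass([0.0, 0.0, 3.0, 0.0, 0.0], 1))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Two buffers are needed because a pass could not read and write the same buffer: updating `src` in place would let a texel see neighbours already overwritten during the current pass, a read-modify-write that the graphics pipeline did not allow.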


Page 36

• Despite limitations, GPGPU community grew (GPGPU = General Purpose Computation on the GPU)

GPGPU Programs

• Don’t use Vertex Unit

• Place data in textures

• Draw a flat quad (off-screen)

• Write multi-pass algorithm using Fragment Unit to perform custom processing

• Under-utilized hardware

– Only utilized Fragment Unit

– Often memory bandwidth limited

• Gather-based algorithms only (no scatter)

• Used the Graphics API

Diagram labels: Application, Command, CPU / Host, Vertex Unit, Geometry Unit, Rasterization, Fragment Unit, Display, Memory, Graphics Hardware

• Geometry Unit operates on a primitive, can write back to memory

• Changes to underlying hardware:

– Ability to write to memory

– “Unified” processing units

• CUDA is the new way to perform computation on the GPU

– Doesn’t use Graphics API

– Arbitrary access to memory (scatter or gather)

– Uses all available processing units

– Has integer math, bitwise operators

slide by Matthew Bolitho

History
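The difference between gather-only GPGPU and CUDA's arbitrary (scatter or gather) memory access is easy to see in a few lines. In this Python sketch (all function names are illustrative assumptions), `gather` lets each output slot choose where to read, which a fragment program could do through texture lookups, while `scatter` lets each input choose where to write, which needs arbitrary memory writes.

```python
from typing import List, Sequence


def gather(src: Sequence[float], idx: Sequence[int]) -> List[float]:
    """out[i] = src[idx[i]]: each output slot chooses where to READ.

    Fits the old fragment-unit model: a fragment writes only its own
    pixel but may fetch from any texel via a texture lookup.
    """
    return [src[j] for j in idx]


def scatter(src: Sequence[float], idx: Sequence[int], size: int) -> List[float]:
    """out[idx[i]] = src[i]: each input element chooses where to WRITE.

    Needs arbitrary writes. Slots nobody writes to stay 0.0; if two
    inputs target the same slot, the later one wins in this serial sketch.
    """
    out = [0.0] * size
    for i, j in enumerate(idx):
        out[j] = src[i]
    return out


def inverse_permutation(p: Sequence[int]) -> List[int]:
    """q with q[p[i]] = i, so that scatter by p equals gather by q."""
    q = [0] * len(p)
    for i, j in enumerate(p):
        q[j] = i
    return q


if __name__ == "__main__":
    data = [10.0, 20.0, 30.0]
    perm = [2, 0, 1]
    print(gather(data, perm))                       # [30.0, 10.0, 20.0]
    print(scatter(data, perm, 3))                   # [20.0, 30.0, 10.0]
    print(gather(data, inverse_permutation(perm)))  # [20.0, 30.0, 10.0]
```

When `idx` is a permutation, scattering by `idx` gives the same result as gathering by its inverse permutation; rewriting a scatter as a gather in this way is one workaround available under a gather-only model.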


Page 40

// Analysis

Page 41: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


[Figure: SISD, SIMD, MISD and MIMD, classified by Single or Multiple Instruction stream and Single or Multiple Data stream; parallel machines grouped as Shared Memory (SMP, NUMA), Distributed Memory (MPP, Clusters), Hybrid and Grid]

slide by Matthew Bolitho

// Analysis

Page 42: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Computes expected speedup from partial improvement

P: proportion of program that is parallel

S: speedup of parallel portion

A serial algorithm can be made parallel by

!"#$"#%&'!"#$%&'()$!*+%,+-..!,+/0

Find fundamental parts of the algorithm that are separable

slide by Matthew Bolitho

// Analysis
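Amdahl's law, with P the proportion of the program that is parallel and S the speedup of that portion, is usually written as speedup = 1 / ((1 - P) + P / S). A minimal C sketch of the formula (the function name is ours, not from the slides):

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction p of the running time
 * (0 <= p <= 1) is made s times faster (s > 0) and the rest stays serial. */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, amdahl_speedup(0.5, 2.0) is about 1.33, and however large S gets, a program with P = 0.9 cannot run more than 1 / (1 - 0.9) = 10 times faster.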


Page 44: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


[Figure: Decomposition (Task Decomposition, Data Decomposition) and Dependency Analysis (Group Tasks, Order Tasks, Data Sharing)]

Algorithms can be decomposed by both task and data:

Task: Find groups of instructions that can be executed in parallel

Data: Find partitions in the data that can be used (relatively) independently

Analyze the algorithm and find groups of instructions that are (relatively) independent

E.g.: Matrix Multiplication

Computing each element of the result matrix is a dot product

slide by Matthew Bolitho

// Analysis
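As a sketch of task decomposition for matrix multiplication, assuming square row-major n x n matrices of float, each element of the product is one independent dot-product task. The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* One task: element (i, j) of C = A * B is the dot product of row i of A
 * with column j of B. All matrices are n x n and row-major. */
float dot_row_col(const float *a, const float *b, size_t n, size_t i, size_t j)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; ++k)
        sum += a[i * n + k] * b[k * n + j];
    return sum;
}

/* Runs the n * n tasks one after another. No two tasks write the same
 * element of C, so they could equally run in any order or in parallel. */
void matmul(const float *a, const float *b, float *c, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            c[i * n + j] = dot_row_col(a, b, n, i, j);
}
```

Because each task only reads A and B and writes a single element of C, the tasks need no coordination, which is what makes this decomposition easy to parallelize.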


Page 47: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Analyze the algorithm and find groups of instructions that are (relatively) independent

E.g.: Molecular Dynamics

Compute Vibrational Forces

Compute Rotational Forces

Compute Dihedral Forces

Compute Neighbours

Compute Non-Bonding Forces

Update Positions and Velocities

Analyze the algorithm to find ways to partition the data

E.g.: Matrix Multiplication: Columns and Rows

E.g.: Matrix Multiplication: Blocks

slide by Matthew Bolitho

// Analysis
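The block variant of data decomposition can be sketched the same way, again assuming square row-major float matrices; the tile size and function names are illustrative, not from the slides. Each call owns one tile of the output and clips at the matrix edge when n is not a multiple of tile.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the tile x tile sub-block (bi, bj) of C = A * B, where all
 * matrices are n x n and row-major. A block writes only its own elements
 * of C, so different blocks never conflict. Edge blocks are clipped to n. */
void matmul_block(const float *a, const float *b, float *c,
                  size_t n, size_t tile, size_t bi, size_t bj)
{
    size_t i_end = (bi + 1) * tile < n ? (bi + 1) * tile : n;
    size_t j_end = (bj + 1) * tile < n ? (bj + 1) * tile : n;
    for (size_t i = bi * tile; i < i_end; ++i)
        for (size_t j = bj * tile; j < j_end; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Runs every block; ceil(n / tile) blocks per dimension cover all of C.
 * The blocks are independent, so this loop order is one choice of many. */
void matmul_blocked(const float *a, const float *b, float *c,
                    size_t n, size_t tile)
{
    assert(tile > 0);
    size_t nb = (n + tile - 1) / tile;
    for (size_t bi = 0; bi < nb; ++bi)
        for (size_t bj = 0; bj < nb; ++bj)
            matmul_block(a, b, c, n, tile, bi, bj);
}
```

With tile = 1 this degenerates to one task per element, while tile = n gives a single task for the whole product; sizes in between trade the number of tasks against the work per task.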


Page 51: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


There are many ways to decompose any given algorithm

Sometimes data decompose easily

Sometimes tasks decompose easily

Sometimes both!

Sometimes neither!

Once the algorithm has been decomposed into data and tasks:

Analyze Interactions

To ease the management of dependencies, find tasks that are similar and group them

Then analyze constraints to determine any necessary order

slide by Matthew Bolitho

// Analysis


Page 55: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


To ease the management of dependencies, find tasks that are similar and group them

E.g.: Molecular Dynamics

Compute Bonded Forces (Vibrational, Rotational, Dihedral)

Compute Neighbours

Compute Non-Bonding Forces

Update Positions and Velocities

Once groups of tasks are identified, data flow constraints enforce a partial order.

[Figure: task groups Neighbor List, Bonded Forces, Non Bonded Forces, and Update Positions and Velocities]

slide by Matthew Bolitho

// Analysis
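Ordering task groups under data-flow constraints amounts to a topological sort. The sketch below assumes, from the molecular dynamics figure, that the neighbor list must be built before non-bonded forces are computed and that both force groups must finish before positions and velocities are updated; those edges, the enum names and the use of Kahn's algorithm are our assumptions, not the lecture's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task groups from the molecular dynamics example. */
enum { NEIGHBOR_LIST, BONDED_FORCES, NON_BONDED_FORCES, UPDATE_POSITIONS, NUM_TASKS };

/* dep[a][b] != 0 means task a must finish before task b starts.
 * These edges are an assumption read off the figure. */
static const int dep[NUM_TASKS][NUM_TASKS] = {
    [NEIGHBOR_LIST]     = { [NON_BONDED_FORCES] = 1 },
    [BONDED_FORCES]     = { [UPDATE_POSITIONS]  = 1 },
    [NON_BONDED_FORCES] = { [UPDATE_POSITIONS]  = 1 },
};

/* Kahn's algorithm: repeatedly emit a task whose predecessors are all done.
 * Writes one valid order into order[] and returns how many tasks it placed
 * (fewer than NUM_TASKS would mean the dependencies contain a cycle). */
size_t topo_order(int order[NUM_TASKS])
{
    int indegree[NUM_TASKS] = {0};
    int done[NUM_TASKS] = {0};
    for (int a = 0; a < NUM_TASKS; ++a)
        for (int b = 0; b < NUM_TASKS; ++b)
            indegree[b] += dep[a][b];

    size_t placed = 0;
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int t = 0; t < NUM_TASKS; ++t) {
            if (!done[t] && indegree[t] == 0) {
                done[t] = 1;
                order[placed++] = t;
                for (int b = 0; b < NUM_TASKS; ++b)
                    indegree[b] -= dep[t][b];   /* release t's successors */
                progress = 1;
            }
        }
    }
    assert(placed <= NUM_TASKS);
    return placed;
}
```

Under these assumed edges the bonded-forces group has no constraint relative to the neighbor list or the non-bonded forces, so a scheduler may run it concurrently with them; only the partial order is fixed.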


Page 59: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Once partially ordered groups of tasks and partitions of data are identified, analyze the data sharing that occurs

Data sharing can be categorized as:

Read-only

Effectively Local

Read-Write

Accumulate

Multiple Read/Single Write

Read-only:

Data is read, but not written

No consistency problems

Replication in distributed system

Effectively Local:

Data is read and written

Data is partitioned into subsets

One task per subset

Can distribute subsets

Read-Write:

Data is read and written

Many tasks access many data

Consistency issues

Most difficult to deal with

Read-Write: Accumulations

As per Read-Write, although writes consist of an accumulation operation

Common in reduction-type algorithms

Can replicate since accumulation can be linear

slide by Matthew Bolitho

// Analysis
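The accumulate case can be handled by replicating the accumulator. A minimal sketch, assuming integer data and a hypothetical fixed number of tasks: each task sums a strided subset into its own private partial total, and the partial totals are combined at the end.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REPLICAS 4  /* hypothetical number of parallel tasks */

/* Sum of x[0..n-1] using replicate-then-combine. Task t sums elements
 * t, t + NUM_REPLICAS, t + 2 * NUM_REPLICAS, ... into its own private
 * accumulator partial[t], so no two tasks ever write the same variable.
 * Integer addition is associative and commutative, so merging the
 * replicas afterwards gives the same total in any order. */
long sum_replicated(const int *x, size_t n)
{
    assert(n == 0 || x != NULL);
    long partial[NUM_REPLICAS] = {0};
    for (size_t t = 0; t < NUM_REPLICAS; ++t)        /* independent tasks */
        for (size_t i = t; i < n; i += NUM_REPLICAS)
            partial[t] += x[i];

    long total = 0;                                   /* combine step */
    for (size_t t = 0; t < NUM_REPLICAS; ++t)
        total += partial[t];
    return total;
}
```

With floating-point data the combine order changes rounding, so a replicated sum may differ slightly from a serial one.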


Page 64: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Read-Write: Multiple Read/Single Write

As per Read-Write, although only one task writes

Relaxed consistency constraints

Example: Matrix Multiplication

[Figure: matrix multiplication data, with two operands labelled Read-Only and one labelled Effectively-Local]

Example: Molecular Dynamics

[Figure: Update Positions and Velocities, Non Bonded Forces, Neighbor List and Bonded Forces, sharing Forces and Atomic Coordinates]

slide by Matthew Bolitho

// Analysis
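One common way to handle multiple-read/single-write data is double buffering: every task reads from the previous copy and writes only its own element of a new copy. The sketch below assumes a 1-D three-point average as the per-element update; the function name and the stencil are illustrative, not from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Task i reads in[i - 1], in[i] and in[i + 1] but writes only out[i].
 * Because reads come from one buffer and writes go to another, no task
 * ever sees a half-updated neighbour, and the tasks need no ordering
 * among themselves. The two boundary elements are copied unchanged. */
void smooth_step(const float *in, float *out, size_t n)
{
    assert(in != out);  /* the two buffers must be distinct */
    for (size_t i = 0; i < n; ++i) {
        if (i == 0 || i + 1 == n)
            out[i] = in[i];
        else
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}
```

Swapping the roles of the two buffers after each step gives the next iteration, much as one might update positions from the previous step's coordinates.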


Page 67: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

CUDA Overview

IAP09 CUDA@MIT / 6.963

Page 68: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

High Performance Computing with CUDA

Problem: GPGPU

OLD: GPGPU - trick the GPU into general-purpose computing by casting problem as graphics

Turn data into images (“texture maps”)

Turn algorithms into image synthesis (“rendering passes”)

Promising results, but:

Tough learning curve, particularly for non-graphics experts

Potentially high overhead of graphics API

Highly constrained memory layout & access model

Need for many passes drives up bandwidth consumption

Solution: GPU Computing

NEW: GPU Computing with CUDA

CUDA = Compute Unified Driver Architecture

Co-designed hardware & software for direct GPU computing

Hardware: fully general data-parallel architecture

Software: program the GPU in C

General thread launch

Global load-store

Parallel data cache

Scalar architecture

Integers, bit operations

Double precision (soon)

Scalable data-parallel execution/memory model

C with minimal yet powerful extensions

Overview

Page 69: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


!"#$%&"'()'*+$&,-%#$.,+'/,'0%-1

! !"#$$%&'()*+$+,#'-./0&'1234')$5'3234

! 21+-13'145"-'16,%$'789:

! 1634'7'81.$-"0'69+-9),:'3;"<.="0'4)<)>

! 2&'3234';$509'!"#$$%&'<)*+$+,#:'?;<@

Most like SIMD, except no instruction lockstep

Graphics Hardware History

CUDA Hardware Architecture Overview

CUDA Memory Model

CUDA Execution Model

Homework One

CUDA: Compute Unified Device Architecture

Created by NVIDIA

A way to perform computation on the GPU

Specification for:

A computer architecture

A language

An application interface (API)

CUDA hardware architecture is based on

")<0&<'A9)=B.C&'69+C0&&.$-'D$.<&'EA6D%&F

slide by Matthew Bolitho

Overview

Page 70: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006

CUDA Advantages over Legacy GPGPU

Random access to memory

Thread can access any memory location

Unlimited access to memory

Thread can read/write as many locations as needed

User-managed cache (per block)

Threads can cooperatively load data into SMEM

Any thread can then access any SMEM location

Low learning curve

Just a few extensions to C

No knowledge of graphics is required

No graphics API overhead

Overview
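Legacy GPGPU supported only gather, while a CUDA thread can read and write any memory location, which also permits scatter. A plain-C sketch of the two patterns, with illustrative function names and a caller-supplied index array:

```c
#include <assert.h>
#include <stddef.h>

/* Gather: element i reads from a computed location and writes to a fixed
 * one, out[i] = in[idx[i]]. This is the pattern legacy GPGPU allowed. */
void gather(const float *in, const size_t *idx, float *out, size_t n)
{
    assert(n == 0 || (in != NULL && idx != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i)
        out[i] = in[idx[i]];
}

/* Scatter: element i reads a fixed location and writes to a computed one,
 * out[idx[i]] = in[i]. If two elements share an index, the later one wins
 * in this loop; when several CUDA threads write one location, only one
 * write succeeds and which one is not specified. */
void scatter(const float *in, const size_t *idx, float *out, size_t n)
{
    assert(n == 0 || (in != NULL && idx != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i)
        out[idx[i]] = in[i];
}
```

When idx is a permutation, scattering the result of a gather through the same indices restores the original array.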

Page 71: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Some Design Goals

Scale to 100s of cores, 1000s of parallel threads

Let programmers focus on parallel algorithms

Not on the mechanics of a parallel programming language

Enable heterogeneous systems (i.e. CPU + GPU)

CPU and GPU are separate devices with separate DRAMs

Overview

Page 72: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


ATI/AMD are trying to make a competing product (“Close To the Metal”)

! <-):+')*9*(:*'=>?$>@A'B01B*5

! C*30.5')*9*(:*'-.'D?$>@')*EF)-++*.

Open Source

Not yet a viable solution

[Figure: a Device containing several Multi-Processors, each with a set of SPs (stream processors) and Local Memory, plus Device Memory]

The fundamental unit is the stream processor

A scalar, single-precision floating point ALU

One Multiply/Add per clock cycle

NOT IEEE-754 compliant

Stream processors are grouped into multi-processors

Multi-processors run in SIMD lockstep

A number of multi-processors form a Device

[Figure: a Multi-Processor whose Stream Processors each have Registers and Local Memory, with Shared Memory, Global Memory, Constant Memory and Texture Memory]

Stream Processors have access to:

Memory Type      Access       Sharing
Registers        Read/Write   Private
Local Memory     Read/Write   Private
Shared Memory    Read/Write   Multi-Processor
Global Memory    Read/Write   Device
Constant Memory  Read         Device
Texture Memory   Read         Device

slide by Matthew Bolitho

Overview

Page 73: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

Overview

Page 74: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


CUDA Installation

CUDA installation consists of:

Driver

CUDA Toolkit (compiler, libraries)

CUDA SDK (example codes)

Overview

Page 75: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


CUDA Software Development

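[Figure: CUDA software development flow. Integrated CPU + GPU C Source Code and CUDA Optimized Libraries (math.h, FFT, BLAS, …); the NVIDIA C Compiler produces NVIDIA Assembly for Computing (PTX), which runs on the GPU through the CUDA Driver and Profiler, and CPU Host Code, built by a Standard C Compiler for the CPU]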

Overview

Page 76: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Compiling CUDA Code

[Figure: NVCC splits a C/C++ CUDA Application into CPU Code and PTX Code (virtual); a PTX to Target Compiler turns the PTX into Target code for the physical GPU (G80 … GPU)]

Overview

Page 77: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

CUDA Basics

IAP09 CUDA@MIT / 6.963

Page 78: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


CUDA Kernels and Threads

Parallel portions of an application are executed on the device as kernels

One kernel is executed at a time

Many threads execute each kernel

Differences between CUDA and CPU threads:

CUDA threads are extremely lightweight

Very little creation overhead

Instant switching

CUDA uses 1000s of threads to achieve efficiency

Multi-core CPUs can use only a few

Definitions:

Device = GPU

Host = CPU

Kernel = function that runs on the device

Basics

Page 79: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Arrays of Parallel Threads

A CUDA kernel is executed by an array of threads

All threads run the same code

Each thread has an ID that it uses to compute memory addresses and make control decisions

[Figure: an array of threads with threadID 0 1 2 3 4 5 6 7, each running the three lines below]

float x = input[threadID];

float y = func(x);

output[threadID] = y;


Basics
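The thread-ID pattern can be mimicked on the CPU to make the indexing concrete. In this sketch func is a hypothetical stand-in (squaring), and the loop in launch only plays the role of the hardware starting one thread per ID; it is not how CUDA actually schedules threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element function standing in for func() on the slide. */
static float func(float x) { return x * x; }

/* What one thread does: its ID selects the element it reads and writes,
 * exactly as in the three lines on the slide. */
static void kernel_body(size_t threadID, const float *input, float *output)
{
    float x = input[threadID];
    float y = func(x);
    output[threadID] = y;
}

/* CPU stand-in for running num_threads copies of the same code. On a GPU
 * they would run concurrently and in no fixed order, which is safe here
 * because no two threads touch the same output element. */
void launch(size_t num_threads, const float *input, float *output)
{
    assert(num_threads == 0 || (input != NULL && output != NULL));
    for (size_t threadID = 0; threadID < num_threads; ++threadID)
        kernel_body(threadID, input, output);
}
```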

Page 80: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Thread Cooperation

The Missing Piece: threads may need to cooperate

Thread cooperation is valuable

Share results to avoid redundant computation

Share memory accesses

Drastic bandwidth reduction

Thread cooperation is a powerful feature of CUDA

Cooperation between a monolithic array of threads is not scalable

Cooperation within smaller batches of threads is scalable

Basics
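To see why sharing memory accesses reduces bandwidth, consider a hypothetical kernel where each thread sums three neighbouring inputs. The plain-C emulation below counts reads of the input array; BLOCK, the function names and the read counter are assumptions for illustration. In the tiled version the threads of a block first load a shared tile and then compute from it, with the end of the load loop standing in for the __syncthreads() barrier a real kernel would need.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 4  /* hypothetical number of threads per block */

/* Counts reads from the input array, standing in for off-chip traffic. */
static size_t global_reads;

static float load(const float *g, size_t i)
{
    ++global_reads;
    return g[i];
}

/* No cooperation: "thread" i fetches all three of its inputs itself, so
 * neighbouring threads fetch the same elements again.
 * out[i] = in[i] + in[i + 1] + in[i + 2]; in must hold n_out + 2 values. */
void sum3_naive(const float *in, float *out, size_t n_out)
{
    assert(n_out == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n_out; ++i)
        out[i] = load(in, i) + load(in, i + 1) + load(in, i + 2);
}

/* Cooperation within a block: the block first copies the BLOCK + 2 inputs
 * it needs into a tile (the "shared memory"), reading each one once; only
 * after that phase ends do its threads compute from the tile. */
void sum3_tiled(const float *in, float *out, size_t n_out)
{
    assert(n_out == 0 || (in != NULL && out != NULL));
    for (size_t base = 0; base < n_out; base += BLOCK) {   /* one block */
        float tile[BLOCK + 2];
        size_t count = n_out - base < BLOCK ? n_out - base : BLOCK;
        for (size_t t = 0; t < count + 2; ++t)              /* load phase */
            tile[t] = load(in, base + t);
        for (size_t t = 0; t < count; ++t)                  /* compute phase */
            out[base + t] = tile[t] + tile[t + 1] + tile[t + 2];
    }
}
```

For 8 outputs the naive version reads the input 24 times and the tiled version 12 times; as blocks grow, the ratio 3 * BLOCK / (BLOCK + 2) approaches 3 to 1.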

Page 81: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)


Thread Batching

Kernel launches a grid of thread blocks

Threads within a block cooperate via shared memory

Threads within a block can synchronize

Threads in different blocks cannot cooperate

Allows programs to transparently scale to different GPUs

Grid

Thread Block 0

Shared Memory

Thread Block 1

Shared Memory

Thread Block N-1

Shared Memory
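As a sketch of block-level cooperation, the kernel below has each block reverse its own chunk of an array through shared memory, with `__syncthreads()` separating the writes from the reads. No data crosses block boundaries, so the blocks can run in any order on any number of multiprocessors. The block size `BLOCK` and the helper `reverse_blocks_ok` are illustrative choices, not part of the CUDA API; the code assumes nvcc and a CUDA-capable device.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

#define BLOCK 64  // threads per block (illustrative choice)

// Each block reverses its own BLOCK-element chunk of d_in into d_out.
// Threads cooperate through shared memory; blocks never exchange data.
__global__ void reverse_within_block(const int *d_in, int *d_out)
{
    __shared__ int s[BLOCK];
    int base = blockIdx.x * blockDim.x;
    s[threadIdx.x] = d_in[base + threadIdx.x];
    __syncthreads();  // every thread's write must land before anyone reads
    d_out[base + threadIdx.x] = s[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper: runs the kernel on nblocks*BLOCK ints and checks
// the result. Returns 1 on success, 0 on any mismatch or CUDA error.
int reverse_blocks_ok(int nblocks)
{
    int n = nblocks * BLOCK;
    size_t bytes = n * sizeof(int);
    int *h = (int *)malloc(bytes);
    int *d_in = 0, *d_out = 0;
    for (int i = 0; i < n; i++) h[i] = i;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h, bytes, cudaMemcpyHostToDevice);
    reverse_within_block<<<nblocks, BLOCK>>>(d_in, d_out);
    cudaMemcpy(h, d_out, bytes, cudaMemcpyDeviceToHost);
    int ok = (cudaGetLastError() == cudaSuccess);
    for (int b = 0; b < nblocks && ok; b++)
        for (int t = 0; t < BLOCK; t++)
            if (h[b * BLOCK + t] != b * BLOCK + (BLOCK - 1 - t)) { ok = 0; break; }
    cudaFree(d_in); cudaFree(d_out); free(h);
    return ok;
}
```

Each block's chunk comes back reversed while the chunks themselves stay in place, which is exactly the "cooperate within a block, independent across blocks" rule.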

Basics

Page 82: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Transparent Scalability

[Figure: a kernel grid of Blocks 0-7. On a device with two multiprocessors the blocks run as four rounds of two (0-1, 2-3, 4-5, 6-7); on a device with four multiprocessors they run as two rounds of four (0-3, 4-7).]

Hardware is free to schedule thread blocks on any processor

A kernel scales across parallel multiprocessors

Basics

Page 83: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

8-Series Architecture (G80)

128 thread processors execute kernel threads

16 multiprocessors, each contains

8 thread processors

Shared memory enables thread cooperation

[Figure: 16 multiprocessors, each with 8 thread processors and its own shared memory]

Basics

Page 84: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

10-Series Architecture

240 thread processors execute kernel threads

30 multiprocessors, each contains

8 thread processors

One double-precision unit

Shared memory enables thread cooperation

[Figure: a multiprocessor with thread processors, a double-precision unit and shared memory]

Basics

Page 85: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Kernel Memory Access

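Per-thread: registers (on-chip) and local memory (off-chip, uncached)

Per-block: shared memory (on-chip, small, fast)

Per-device: global memory (off-chip, large, uncached; persistent across kernel launches; used for kernel I/O)

[Figure: each thread has its registers and local memory, each block its shared memory, and successive kernels (Kernel 0, Kernel 1, …) access the same global memory over time]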

Basics

Page 86: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

AMD/ATI are trying to make a competing product: "Close To the Metal"

First release ([date garbled]) bombed

Second release in [date garbled] re-written

Open Source

Not yet a viable solution

[Figure: a Device with Device Memory and several Multi-Processors, each containing Local Memory and Stream Processors (SP); the legend labels Registers, Local Memory, Stream Processor, Shared, Global, Constant and Texture Memory]

The fundamental unit is the Stream Processor

A scalar, single-precision floating point ALU

One Multiply-Add per clock cycle

[garbled] IEEE-754 compliant

Stream processors are grouped into Multi-Processors

Multi-processors run in [garbled]

A number of multi-processors form a Device

Stream Processors have access to:

    Memory Type      Access      Sharing
    Registers        Read/Write  Private
    Local Memory     Read/Write  Private
    Shared Memory    Read/Write  Multi-Processor
    Global Memory    Read/Write  Device
    Constant Memory  Read        Device
    Texture Memory   Read        Device

slide by Matthew Bolitho

Basics

Page 87: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Execution Model

Software → Hardware

Thread → Thread Processor: threads are executed by thread processors

Thread Block → Multiprocessor: thread blocks are executed on multiprocessors

Thread blocks do not migrate

Several concurrent thread blocks can reside on one multiprocessor, limited by multiprocessor resources (shared memory and register file)

Grid → Device: a kernel is launched as a grid of thread blocks

Only one kernel can execute on a device at one time

Basics

Page 88: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Key Parallel Abstractions in CUDA

Trillions of lightweight threads → simple decomposition model

Hierarchy of concurrent threads → simple execution model

Lightweight synchronization primitives → simple synchronization model

Shared memory model for thread cooperation → simple communication model

Basics

Page 89: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Managing Memory

CPU and GPU have separate memory spaces

Host (CPU) code manages device (GPU) memory:

Allocate / free

Copy data to and from device

Applies to global device memory (DRAM)

[Figure: the host (CPU, chipset, host DRAM) and the device (a GPU whose multiprocessors each have registers and shared memory, plus device DRAM holding local and global memory)]

Basics

Page 90: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

GPU Memory Allocation / Release

cudaMalloc(void **pointer, size_t nbytes)

cudaMemset(void *pointer, int value, size_t count)

cudaFree(void *pointer)

    int n = 1024;
    int nbytes = n * sizeof(int);
    int *a_d = 0;
    cudaMalloc((void **)&a_d, nbytes);
    cudaMemset(a_d, 0, nbytes);
    cudaFree(a_d);
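A minimal sketch of the same allocate / set / copy-back / free sequence with error checks, assuming nvcc and a CUDA device. It also shows that `cudaMemset` applies `value` to every *byte*, so setting ints with the value 1 yields 0x01010101 rather than 1. `memset_roundtrip` is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Allocates n ints on the device, sets every byte to byte_value with
// cudaMemset, copies the buffer back and returns host element [0].
// Returns -1 if any CUDA call fails (ambiguous only for byte_value 0xFF).
int memset_roundtrip(int n, int byte_value)
{
    size_t nbytes = n * sizeof(int);
    int *a_d = 0;
    int *a_h = (int *)malloc(nbytes);
    int result = -1;
    if (cudaMalloc((void **)&a_d, nbytes) == cudaSuccess &&
        cudaMemset(a_d, byte_value, nbytes) == cudaSuccess &&
        cudaMemcpy(a_h, a_d, nbytes, cudaMemcpyDeviceToHost) == cudaSuccess)
        result = a_h[0];
    cudaFree(a_d);  // cudaFree(0) is a no-op if the allocation failed
    free(a_h);
    return result;
}
```

Zeroing works as expected because every byte of 0 is 0; any other pattern is repeated per byte.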

Basics

Page 91: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Data Copies

cudaMemcpy(void *dst, const void *src, size_t nbytes, enum cudaMemcpyKind direction);

direction specifies the locations (host or device) of src and dst

Blocks the CPU thread: returns after the copy is complete

Doesn't start copying until previous CUDA calls complete

enum cudaMemcpyKind:

cudaMemcpyHostToDevice

cudaMemcpyDeviceToHost

cudaMemcpyDeviceToDevice

Basics

Page 92: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Data Movement Example

    #include <assert.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        float *a_h, *b_h;  // host data
        float *a_d, *b_d;  // device data
        int N = 14, nBytes, i;

        nBytes = N * sizeof(float);
        a_h = (float *)malloc(nBytes);
        b_h = (float *)malloc(nBytes);
        cudaMalloc((void **)&a_d, nBytes);
        cudaMalloc((void **)&b_d, nBytes);

        for (i = 0; i < N; i++) a_h[i] = 100.f + i;

        cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);    // host -> device
        cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);  // device -> device
        cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);    // device -> host

        for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);

        free(a_h); free(b_h); cudaFree(a_d); cudaFree(b_d);
        return 0;
    }

Basics


Page 101: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Executing Code on the GPU

Kernels are C functions with some restrictions:

Cannot access host memory

Must have void return type

No variable number of arguments ("varargs")

Not recursive

No static variables

Function arguments are automatically copied from host to device

Basics

Page 102: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Function Qualifiers

Kernels are designated by the function qualifier __global__

Function called from host and executed on device

Must return void

Other CUDA function qualifiers:

__device__

Function called from device and run on device

Cannot be called from host code

__host__

Function called from host and executed on host (default)

__host__ and __device__ qualifiers can be combined to generate both CPU and GPU code
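The three qualifiers can live side by side in one .cu file. In this sketch (the names `saxpy_op`, `twice`, `apply` and `qualifiers_mismatches` are hypothetical) a `__host__ __device__` function is evaluated both inside a kernel and on the CPU, so the two results can be compared. It assumes nvcc and a CUDA-capable device.

```cuda
#include <cuda_runtime.h>

// __host__ __device__: compiled for both the CPU and the GPU.
__host__ __device__ float saxpy_op(float a, float x, float y)
{
    return a * x + y;
}

// __device__: callable only from device code.
__device__ float twice(float v) { return 2.0f * v; }

// __global__: launched from the host, runs on the device, returns void.
__global__ void apply(float a, const float *x, float *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = twice(saxpy_op(a, x[i], y[i]));
}

// Hypothetical host helper: compares GPU results with the same
// __host__ __device__ function run on the CPU. Returns the mismatch count.
int qualifiers_mismatches(void)
{
    const int n = 256;
    float hx[n], hy[n], out[n];
    for (int i = 0; i < n; i++) { hx[i] = (float)i; hy[i] = 1.0f; }
    float *dx = 0, *dy = 0;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);
    apply<<<(n + 127) / 128, 128>>>(3.0f, dx, dy, n);
    cudaMemcpy(out, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx); cudaFree(dy);
    int bad = 0;
    for (int i = 0; i < n; i++)
        if (out[i] != 2.0f * saxpy_op(3.0f, hx[i], hy[i])) bad++;
    return bad;
}
```

The inputs are small integers, so the float results are exact on both sides and an equality comparison is safe here; for general data a tolerance would be needed.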

Basics

Page 103: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Launching Kernels

Modified C function call syntax:

kernel<<<dim3 dG, dim3 dB>>>(…)

Execution Configuration (“<<< >>>”)

dG - dimension and size of grid in blocks

Two-dimensional: x and y

Blocks launched in the grid: dG.x * dG.y

dB - dimension and size of blocks in threads:

Three-dimensional: x, y, and z

Threads per block: dB.x * dB.y * dB.z

Unspecified dim3 fields initialize to 1

Basics

Page 104: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Execution Configuration Examples

    kernel<<<32, 512>>>(...);

    dim3 grid, block;
    grid.x = 2; grid.y = 4;
    block.x = 8; block.y = 16;
    kernel<<<grid, block>>>(...);

Equivalent assignment using constructor functions:

    dim3 grid(2, 4), block(8, 16);
    kernel<<<grid, block>>>(...);
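A common way to size a launch is ceiling division, so the grid covers the data even when its extent is not a multiple of the block size; threads that fall past the edge are masked by a bounds check. This sketch assumes a row-major 2D image of ints; `fill_index_2d` and `fill_2d_ok` are illustrative names, and the 8×16 block mirrors the example above.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Each thread writes its flattened 2D index into a width x height image.
__global__ void fill_index_2d(int *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)       // the grid may overhang the image
        img[y * width + x] = y * width + x;
}

// Hypothetical helper: launches with a dim3 grid sized by ceiling division
// and returns 1 if every pixel received its own index.
int fill_2d_ok(int width, int height)
{
    size_t bytes = (size_t)width * height * sizeof(int);
    int *h = (int *)malloc(bytes), *d = 0;
    cudaMalloc((void **)&d, bytes);
    dim3 block(8, 16);                               // 128 threads per block
    dim3 grid((width + block.x - 1) / block.x,       // blocks in x
              (height + block.y - 1) / block.y);     // blocks in y; z defaults to 1
    fill_index_2d<<<grid, block>>>(d, width, height);
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    int ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < width * height && ok; i++) ok = (h[i] == i);
    cudaFree(d); free(h);
    return ok;
}
```

For a 100 × 37 image this launches a 13 × 3 grid (ceil(100/8) by ceil(37/16)), and the `if` discards the extra threads in the last column and row of blocks.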

Basics

Page 105: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

CUDA Built-in Device Variables

All __global__ and __device__ functions have access to these automatically defined variables

dim3 gridDim;

Dimensions of the grid in blocks (at most 2D)

dim3 blockDim;

Dimensions of the block in threads

dim3 blockIdx;

Block index within the grid

dim3 threadIdx;

Thread index within the block

Basics

Page 106: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Unique Thread IDs

Built-in variables are used to determine unique thread IDs

Map from local thread ID (threadIdx) to a global ID which can be used as array indices

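Example: a grid of 3 blocks (blockIdx.x = 0, 1, 2), each with blockDim.x = 5 threads (threadIdx.x = 0 … 4)

Global ID = blockIdx.x*blockDim.x + threadIdx.x:

Block 0: threadIdx.x 0 1 2 3 4 → global IDs 0 1 2 3 4

Block 1: threadIdx.x 0 1 2 3 4 → global IDs 5 6 7 8 9

Block 2: threadIdx.x 0 1 2 3 4 → global IDs 10 11 12 13 14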
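The layout above (3 blocks of 5 threads) can be reproduced directly. A small sketch, assuming nvcc and a CUDA device, in which the hypothetical helper `global_ids_3x5` copies the 15 computed IDs back to the host:

```cuda
#include <cuda_runtime.h>

// Each thread stores its global ID at the position given by that ID.
__global__ void write_global_id(int *out)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    out[gid] = gid;
}

// Hypothetical helper: launches 3 blocks of 5 threads, copies the IDs into
// ids[0..14], and returns 1 if the launch reported no error.
int global_ids_3x5(int ids[15])
{
    int *d = 0;
    cudaMalloc((void **)&d, 15 * sizeof(int));
    write_global_id<<<3, 5>>>(d);
    int ok = (cudaGetLastError() == cudaSuccess);
    cudaMemcpy(ids, d, 15 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return ok;
}
```

After a successful call, ids[i] == i for i = 0 … 14; for example, thread 0 of block 1 computes 1*5 + 0 = 5.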

Basics

Page 107: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

Note that local, global, constant and texture memory all come from the same physical memory pool

Just differ in access patterns, caching, etc.

A CUDA device is a highly parallel processor

We assume it can execute many hundreds of threads in parallel

Threads [garbled] Stream Processors [garbled] 1

When writing CUDA software, think in terms of threads, not processors

A kernel is executed as a grid

A grid is a collection of thread blocks

A thread block is a collection of threads

Thread blocks and threads are given unique identifiers

Identifiers can be 1D, 2D or 3D

Used to help identify which part of a problem a thread/block should operate on

[Figure: a Device runs a Grid of thread blocks, Block (0,0) through Block (2,1); Thread Block (1,1) is expanded into threads (0,0) through (2,1)]

slide by Matthew Bolitho

Basics

Page 108: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

Thread Blocks

A thread block may have up to 512 threads

All threads in a thread block are run on the same multi-processor

Thus they can communicate via shared memory

And synchronize

Threads of a block are multiplexed onto a multi-processor as warps

[Figure: PC architecture: CPU, Northbridge, DRAM, Southbridge, SATA, Ethernet and Graphics Card / CUDA, connected by the Front Side Bus, Memory Bus, PCI Bus and PCI-Express Bus]

[garbled] replaced AGP

P2P, full duplex, serial, symmetric bus; 250 MB/s bandwidth in each direction

[garbled] configurations, e.g.: PCI-E 16x = 16 lanes

16 times the bandwidth (4 GB/s)

The CUDA specification has been updated:

Version 1.0: initial release ([date garbled])

Version 1.1: update with newer hardware ([date garbled])

Backwards compatible

Expected updates in near future: Version 1.2 / 2.0

64-bit floating point support (i.e. double)

Version 1.1 added some important useful features:

Software: asynchronous memory copies; asynchronous GPU program launch

Hardware: atomic memory instructions

slide by Matthew Bolitho

Basics

Page 109: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Minimal Kernels

    __global__ void minimal(int *a_d, int value)
    {
        *a_d = value;
    }

    __global__ void assign(int *a_d, int value)
    {
        int idx = blockDim.x * blockIdx.x + threadIdx.x;
        a_d[idx] = value;
    }

Basics

Page 110: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Increment Array Example

CPU program:

    void inc_cpu(int *a, int N)
    {
        int idx;
        for (idx = 0; idx < N; idx++)
            a[idx] = a[idx] + 1;
    }

    int main()
    {
        ...
        inc_cpu(a, N);
    }

CUDA program:

    __global__ void inc_gpu(int *a, int N)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < N)
            a[idx] = a[idx] + 1;
    }

    int main()
    {
        dim3 dimBlock(blocksize);
        dim3 dimGrid(ceil(N / (float)blocksize));
        inc_gpu<<<dimGrid, dimBlock>>>(a, N);
    }
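The CUDA main above omits the host/device memory handling. A self-contained sketch of the full round trip, assuming nvcc and a CUDA device. It uses integer ceiling division `(N + blocksize - 1) / blocksize` in place of `ceil(N / (float)blocksize)`; the two agree for ordinary sizes, and the integer form avoids float rounding for very large N. `inc_gpu_ok` is a hypothetical test helper.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void inc_gpu(int *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)              // the last block may have threads past the end
        a[idx] = a[idx] + 1;
}

// Hypothetical driver: copies a_h to the device, increments it, copies it
// back. Returns 1 if every element came back incremented by one.
int inc_gpu_ok(int N, int blocksize)
{
    size_t nbytes = N * sizeof(int);
    int *a_h = (int *)malloc(nbytes), *a_d = 0;
    for (int i = 0; i < N; i++) a_h[i] = i;
    cudaMalloc((void **)&a_d, nbytes);
    cudaMemcpy(a_d, a_h, nbytes, cudaMemcpyHostToDevice);
    dim3 dimBlock(blocksize);
    dim3 dimGrid((N + blocksize - 1) / blocksize);
    inc_gpu<<<dimGrid, dimBlock>>>(a_d, N);
    cudaMemcpy(a_h, a_d, nbytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    int ok = (cudaGetLastError() == cudaSuccess);
    for (int i = 0; i < N && ok; i++) ok = (a_h[i] == i + 1);
    cudaFree(a_d); free(a_h);
    return ok;
}
```

With N = 1000 and blocksize = 256 this launches 4 blocks (1024 threads); the `if (idx < N)` guard keeps the last 24 threads from writing out of bounds.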

Basics

Page 111: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Host Synchronization

All kernel launches are asynchronous: control returns to the CPU immediately

The kernel executes after all previous CUDA calls have completed

cudaMemcpy() is synchronous: control returns to the CPU after the copy completes

The copy starts after all previous CUDA calls have completed

cudaThreadSynchronize() blocks until all previous CUDA calls complete (later CUDA releases replace it with cudaDeviceSynchronize())
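Because launches return immediately, host code placed between a launch and the next synchronizing call overlaps with the kernel. A sketch of that pattern, assuming a current toolkit where `cudaDeviceSynchronize()` is the name for the call the slide lists as `cudaThreadSynchronize()`; `fill`, `cpu_work` and `overlap_ok` are hypothetical names.

```cuda
#include <cuda_runtime.h>

__global__ void fill(int *a, int n, int v)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = v;
}

// Placeholder host work that can run while the GPU is busy.
static long cpu_work(int n)
{
    long s = 0;
    for (int i = 0; i < n; i++) s += i;
    return s;
}

// Returns 1 if both the overlapped CPU work and the kernel produced
// the expected results.
int overlap_ok(void)
{
    const int n = 1 << 20;
    int *a_d = 0;
    cudaMalloc((void **)&a_d, n * sizeof(int));
    fill<<<(n + 255) / 256, 256>>>(a_d, n, 7);  // returns immediately
    long s = cpu_work(1000);                     // runs concurrently with the kernel
    cudaError_t err = cudaDeviceSynchronize();   // block until the kernel is done
    int last = 0;
    cudaMemcpy(&last, a_d + n - 1, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(a_d);
    return err == cudaSuccess && s == 499500L && last == 7;
}
```

The explicit synchronize is not strictly needed before `cudaMemcpy`, which waits on its own; it is there to show where kernel execution errors would surface.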

Basics

Page 112: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Host Synchronization Example

// copy data from host to device

cudaMemcpy(a_d, a_h, numBytes, cudaMemcpyHostToDevice);

// execute the kernel

inc_gpu<<<ceil(N/(float)blocksize), blocksize>>>(a_d, N);

// run independent CPU code

run_cpu_stuff();

// copy data from device back to host

cudaMemcpy(a_h, a_d, numBytes, cudaMemcpyDeviceToHost);

Basics

Page 113: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 29

Device Runtime Component: Synchronization Function

void __syncthreads();

Synchronizes all threads in a block

Once all threads have reached this point, execution resumes normally

Used to avoid RAW / WAR / WAW hazards when accessing shared memory

Allowed in conditional code only if the conditional is uniform across the entire thread block
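A sketch of the read-after-write case: each thread stores one value in shared memory and then reads its neighbour's, so a barrier is needed between the two steps. The barrier is executed unconditionally by every thread of the block, as the last rule requires. `rotate_sum`, `THREADS` and `rotate_sum_ok` are illustrative names; the code assumes nvcc and a CUDA device.

```cuda
#include <cuda_runtime.h>

#define THREADS 128

// out[i] = in[i] + in[next element in the same block, wrapping around].
// Without the barrier a thread could read s[t+1] before it is written.
__global__ void rotate_sum(const int *in, int *out)
{
    __shared__ int s[THREADS];
    int t = threadIdx.x;
    int base = blockIdx.x * blockDim.x;
    s[t] = in[base + t];
    __syncthreads();                      // all writes to s[] are now visible
    out[base + t] = s[t] + s[(t + 1) % blockDim.x];
}

// Hypothetical helper: two blocks, input in[i] = i. Returns 1 if correct.
int rotate_sum_ok(void)
{
    const int n = 2 * THREADS;
    int h[n], r[n];
    for (int i = 0; i < n; i++) h[i] = i;
    int *d_in = 0, *d_out = 0;
    cudaMalloc((void **)&d_in, n * sizeof(int));
    cudaMalloc((void **)&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h, n * sizeof(int), cudaMemcpyHostToDevice);
    rotate_sum<<<2, THREADS>>>(d_in, d_out);
    cudaMemcpy(r, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_out);
    for (int i = 0; i < n; i++) {
        int b = (i / THREADS) * THREADS, t = i % THREADS;
        if (r[i] != h[i] + h[b + (t + 1) % THREADS]) return 0;
    }
    return 1;
}
```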

Basics

Page 114: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

Variable Qualifiers (GPU code)

__device__

Stored in global memory (large, high latency, no cache)

Allocated with cudaMalloc (__device__ qualifier implied)

Accessible by all threads

Lifetime: application

__shared__

Stored in on-chip shared memory (very low latency)

Size specified by the execution configuration or at compile time

Accessible by all threads in the same thread block

Lifetime: thread block

Unqualified variables:

Scalars and built-in vector types are stored in registers

What doesn't fit in registers spills to "local" memory
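The three storage classes side by side in one kernel, as a sketch: a `__device__` array in global memory (read back on the host with `cudaMemcpyFromSymbol`), one `__shared__` value per block, and an unqualified local that the compiler would normally keep in a register. The names `d_result`, `qualifiers_demo` and `qualifiers_ok` are hypothetical; the code assumes nvcc and a CUDA device.

```cuda
#include <cuda_runtime.h>

__device__ int d_result[4];   // global memory: visible to all threads, application lifetime

__global__ void qualifiers_demo(void)
{
    __shared__ int s_val;          // one copy per block, on-chip, block lifetime
    int r = blockIdx.x * 10;       // unqualified scalar: normally held in a register
    if (threadIdx.x == 0) s_val = r;
    __syncthreads();               // make thread 0's write visible to the block
    if (threadIdx.x == blockDim.x - 1)
        d_result[blockIdx.x] = s_val + threadIdx.x;
}

// Hypothetical helper: 4 blocks of 32 threads; block b stores 10*b + 31.
int qualifiers_ok(void)
{
    int h[4] = {0};
    qualifiers_demo<<<4, 32>>>();
    cudaMemcpyFromSymbol(h, d_result, sizeof(h));  // waits for the kernel
    return cudaGetLastError() == cudaSuccess &&
           h[0] == 31 && h[1] == 41 && h[2] == 51 && h[3] == 61;
}
```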

Basics

Page 115: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© 2008 NVIDIA Corporation.

CUDA Error Reporting to CPU

All CUDA calls return an error code, except for kernel launches

cudaError_t type

cudaError_t cudaGetLastError(void)

Returns the code for the last error (cudaSuccess if no error occurred)

Can be used to get errors from kernel execution

const char* cudaGetErrorString(cudaError_t code)

Returns a null-terminated character string describing the error

printf("%s\n", cudaGetErrorString(cudaGetLastError()));
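Since a kernel launch returns no error code, a common pattern is to call `cudaGetLastError()` right after the launch (to catch launch errors) and again after a synchronizing call (to catch errors during execution). Note that `cudaGetLastError()` also resets the error state to `cudaSuccess`. A sketch under those assumptions; the helpers are hypothetical, and the 4096-thread block is chosen only because it exceeds the per-block limit (512 threads on G80-class devices, 1024 on later ones).

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void noop(void) {}

// Prints and returns the most recent CUDA error (and clears it).
cudaError_t report_last_error(const char *where)
{
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("%s: %s\n", where, cudaGetErrorString(err));
    return err;
}

// Invalid configuration: too many threads per block, so the launch fails.
cudaError_t bad_launch(void)
{
    noop<<<1, 4096>>>();
    return report_last_error("bad launch");
}

// Valid launch; synchronize so that execution errors would also surface.
cudaError_t good_launch(void)
{
    noop<<<1, 32>>>();
    cudaDeviceSynchronize();
    return report_last_error("good launch");
}
```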

Basics

Page 116: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 26

Host Runtime Component: Device Management

Device enumeration: cudaGetDeviceCount(), cudaGetDeviceProperties()

Device selection: cudaChooseDevice(), cudaSetDevice()

    > ~/NVIDIA_CUDA_SDK/bin/linux/release/deviceQuery
    There is 1 device supporting CUDA

    Device 0: "Quadro FX 5600"
      Major revision number:                         1
      Minor revision number:                         0
      Total amount of global memory:                 1609891840 bytes
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       16384 bytes
      Total number of registers available per block: 8192
      Warp size:                                     32
      Maximum number of threads per block:           512
      Maximum sizes of each dimension of a block:    512 x 512 x 64
      Maximum sizes of each dimension of a grid:     65535 x 65535 x 1
      Maximum memory pitch:                          262144 bytes
      Texture alignment:                             256 bytes
      Clock rate:                                    1350000 kilohertz
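deviceQuery is built on the enumeration calls listed above. A sketch that prints a few of the same fields through `cudaGetDeviceProperties` and then selects device 0 with `cudaSetDevice`; `list_devices` is a hypothetical name, and only a small subset of the `cudaDeviceProp` fields is shown.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Prints basic properties of every CUDA device and selects device 0.
// Returns the number of devices found (0 if the query fails).
int list_devices(void)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    for (int dev = 0; dev < count; dev++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: \"%s\" (compute capability %d.%d)\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  Multiprocessors:         %d\n", prop.multiProcessorCount);
        printf("  Global memory:           %zu bytes\n", prop.totalGlobalMem);
        printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  Warp size:               %d\n", prop.warpSize);
    }
    if (count > 0) cudaSetDevice(0);
    return count;
}
```

On the Quadro FX 5600 shown above, the major/minor revision numbers 1.0 are what `prop.major` and `prop.minor` report as the compute capability.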

Basics

Page 117: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 27

Host Runtime Component: Memory Management

Two kinds of memory:

Linear memory: accessed through 32-bit pointers

CUDA arrays: opaque layouts with dimensionality, readable only through texture objects

Memory allocation:

cudaMalloc(), cudaFree(), cudaMallocPitch(), cudaMallocArray(), cudaFreeArray()

Memory copy:

cudaMemcpy(), cudaMemcpy2D(), cudaMemcpyToArray(), cudaMemcpyFromArray(), etc.

cudaMemcpyToSymbol(), cudaMemcpyFromSymbol()

Memory addressing:

cudaGetSymbolAddress()
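`cudaMallocPitch` may pad each row so that every row starts at an aligned address, and `cudaMemcpy2D` copies between that padded device layout and a dense host array. A sketch assuming a row-major `float` image; `inc_2d` and `pitched_ok` are illustrative names. The pitch is in bytes, so the row address is computed on a `char *` before casting back to `float *`.

```cuda
#include <cuda_runtime.h>

// Adds 1 to every element of a pitched 2D array.
__global__ void inc_2d(float *d, size_t pitch, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float *row = (float *)((char *)d + y * pitch);  // pitch is in bytes
        row[x] += 1.0f;
    }
}

// Hypothetical helper: round-trips a 100 x 50 image through pitched memory.
int pitched_ok(void)
{
    const int width = 100, height = 50;
    static float h[height][width];
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) h[y][x] = (float)(y * width + x);
    float *d = 0;
    size_t pitch = 0;
    cudaMallocPitch((void **)&d, &pitch, width * sizeof(float), height);
    cudaMemcpy2D(d, pitch, h, width * sizeof(float),
                 width * sizeof(float), height, cudaMemcpyHostToDevice);
    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    inc_2d<<<grid, block>>>(d, pitch, width, height);
    cudaMemcpy2D(h, width * sizeof(float), d, pitch,
                 width * sizeof(float), height, cudaMemcpyDeviceToHost);
    cudaFree(d);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            if (h[y][x] != (float)(y * width + x) + 1.0f) return 0;
    return pitch >= width * sizeof(float);  // rows are at least as wide as requested
}
```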

Basics


Page 120: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

COME

Page 121: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

Back Pocket Slides

slide by David Cox

Page 122: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

Code Walkthrough 2: Parallel Reduction

Page 123: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 37

Execution Decomposition

Two stages of computation:

Sum within each block

Sum partial results from the blocks

For reductions, code for all levels is the same

[Figure: Stage 1 (many blocks): each block reduces its values pairwise, e.g. 3 1 7 0 4 1 6 3 → 4 7 5 9 → 11 14 → 25. Stage 2 (1 block): the per-block partial sums are reduced the same way.]

Page 124: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 38

Kernel execution

Values in shared memory (16 elements); at each step the active threads add the element at the given distance:

    Initial:                          10  1  8 -1  0 -2  3  5 -2 -3  2  7  0 11  0  2
    Step 1, distance 8 (threads 0-7):  8 -2 10  6  0  9  3  7 -2 -3  2  7  0 11  0  2
    Step 2, distance 4 (threads 0-3):  8  7 13 13  0  9  3  7 -2 -3  2  7  0 11  0  2
    Step 3, distance 2 (threads 0-1): 21 20 13 13  0  9  3  7 -2 -3  2  7  0 11  0  2
    Step 4, distance 1 (thread 0):    41 20 13 13  0  9  3  7 -2 -3  2  7  0 11  0  2

Page 125: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 39

Kernel Source Code

    __global__ void sum_kernel(int *g_input, int *g_output)
    {
        extern __shared__ int s_data[];  // allocated during kernel launch

        // read input into shared memory
        unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
        s_data[threadIdx.x] = g_input[idx];
        __syncthreads();

        // compute sum for the thread block
        for (int dist = blockDim.x / 2; dist > 0; dist /= 2) {
            if (threadIdx.x < dist)
                s_data[threadIdx.x] += s_data[threadIdx.x + dist];
            __syncthreads();
        }

        // write the block's sum to global memory
        if (threadIdx.x == 0)
            g_output[blockIdx.x] = s_data[0];
    }

Page 126: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 40

Host Source Code (1)

    #include <stdio.h>   // printf
    #include <stdlib.h>  // malloc, free

    int main()
    {
        // data set size in elements and bytes
        unsigned int n = 4096;
        unsigned int num_bytes = n * sizeof(int);

        // launch configuration parameters
        unsigned int block_dim = 256;
        unsigned int num_blocks = n / block_dim;
        unsigned int num_smem_bytes = block_dim * sizeof(int);

        // allocate and initialize the data on the CPU
        int *h_a = (int *)malloc(num_bytes);
        for (unsigned int i = 0; i < n; i++) h_a[i] = 1;

        // allocate memory on the GPU device
        int *d_a = 0, *d_output = 0;
        cudaMalloc((void **)&d_a, num_bytes);
        cudaMalloc((void **)&d_output, num_blocks * sizeof(int));

    ...

Page 127: IAP09 CUDA@MIT 6.963 - Lecture 02: CUDA Basics #1 (Nicolas Pinto, MIT)

© NVIDIA Corporation 2006 41

Host Source Code (2)

...

        // copy the input data from CPU to the GPU device
        cudaMemcpy(d_a, h_a, num_bytes, cudaMemcpyHostToDevice);

        // two stages of kernel execution
        sum_kernel<<<num_blocks, block_dim, num_smem_bytes>>>(d_a, d_output);
        sum_kernel<<<1, num_blocks, num_blocks * sizeof(int)>>>(d_output, d_output);

        // copy the output from GPU device to CPU and print
        cudaMemcpy(h_a, d_output, sizeof(int), cudaMemcpyDeviceToHost);
        printf("%d\n", h_a[0]);

        // release resources
        cudaFree(d_a); cudaFree(d_output); free(h_a);
        return 0;
    }
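The walkthrough's kernel and host code can be wrapped into a reusable function and checked against a plain CPU sum. A sketch under the same assumptions as the slides (`n` equals `num_blocks * block_dim`, both powers of two, and `num_blocks` is small enough to run as a single block in the second stage); `gpu_sum` and `cpu_sum` are hypothetical helpers.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

// Kernel from the slides: tree reduction in shared memory.
// blockDim.x must be a power of two.
__global__ void sum_kernel(int *g_input, int *g_output)
{
    extern __shared__ int s_data[];
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    s_data[threadIdx.x] = g_input[idx];
    __syncthreads();
    for (int dist = blockDim.x / 2; dist > 0; dist /= 2) {
        if (threadIdx.x < dist)
            s_data[threadIdx.x] += s_data[threadIdx.x + dist];
        __syncthreads();
    }
    if (threadIdx.x == 0) g_output[blockIdx.x] = s_data[0];
}

// Two-stage GPU sum of h_a[0..n): per-block sums, then one block over them.
int gpu_sum(const int *h_a, unsigned int n, unsigned int block_dim)
{
    unsigned int num_blocks = n / block_dim;
    int *d_a = 0, *d_output = 0, result = 0;
    cudaMalloc((void **)&d_a, n * sizeof(int));
    cudaMalloc((void **)&d_output, num_blocks * sizeof(int));
    cudaMemcpy(d_a, h_a, n * sizeof(int), cudaMemcpyHostToDevice);
    sum_kernel<<<num_blocks, block_dim, block_dim * sizeof(int)>>>(d_a, d_output);
    sum_kernel<<<1, num_blocks, num_blocks * sizeof(int)>>>(d_output, d_output);
    cudaMemcpy(&result, d_output, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_output);
    return result;
}

// CPU reference used to check the GPU result.
int cpu_sum(const int *h_a, unsigned int n)
{
    int s = 0;
    for (unsigned int i = 0; i < n; i++) s += h_a[i];
    return s;
}
```

With the slides' input (4096 ones, block_dim = 256) the first stage produces 16 partial sums of 256 and the second stage returns 4096.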