6963 midterm review

Upload: vysrilekha

Post on 26-Feb-2018


  • 7/25/2019 6963 Midterm Review

    1/20

    CS6235

    Review for Midterm

    Administrative

    Pascal will meet the class on Wednesday
    - I will join at the beginning for questions on test

    Midterm
    - In class March 28, can bring single page of notes
    - Review notes, readings and review lecture
    - Prior exams covered, will be discussed today

    Design Review
    - Intermediate assessment of progress on project, oral and short
    - In class April 4

    Final projects
    - Poster session, April 23 (dry run April 18)
    - Final report, May 3

    Parts of Exam

    I. Definitions
    - A list of terms you will be asked to define

    II. Short Answer (4 questions, 20 points)
    - Understand basic GPU architecture: processors and memory hierarchy
    - High-level questions on more recent "pattern and application" lectures

    III. Problem Solving: types of questions
    - Analyze data dependences and data reuse in code and use this to guide CUDA parallelization and memory hierarchy mapping
    - Given some CUDA code, indicate whether global memory accesses will be coalesced and whether there will be bank conflicts in shared memory
    - Given some CUDA code, add synchronization to derive a correct implementation
    - Given some CUDA code, provide an optimized version that will have fewer divergent branches

    IV. (Brief) Essay

    Syllabus

    L1: Introduction and CUDA Overview
    - Not much there...

    L2: Hardware

    L3 & L4: Memory Hierarchy: Locality and Data Placement
    - Memory latency and memory bandwidth optimizations
    - Reuse and locality
    - What are the different memory spaces on the device, and who can read/write them?
    - How do you tell the compiler that something belongs in a particular memory space?
    - Tiling transformation (to fit data into constrained storage): safety and profitability
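A minimal sketch of how CUDA source tells the compiler which memory space a variable lives in, using the `__constant__`, `__device__` and `__shared__` qualifiers. The kernel `scale`, the array names and the host wrapper `run_scale` are made up for illustration and are not from the lectures.

```cuda
#include <cuda_runtime.h>

#define N 256

// Constant memory: written by the host, read-only (and cached) inside kernels.
__constant__ float coeff[N];

// File-scope global memory: readable and writable by every thread in the grid.
__device__ float scratch[N];

__global__ void scale(const float *in, float *out) {
    // Shared memory: one copy per thread block, visible only to that block.
    __shared__ float tile[N];

    int tx = threadIdx.x;      // automatic scalar: normally held in a register
    tile[tx] = in[tx];         // global -> shared
    __syncthreads();           // the whole tile is now visible to the block

    float v = tile[N - 1 - tx] * coeff[tx];  // reads a value another thread loaded
    scratch[tx] = v;           // file-scope global memory
    out[tx] = v;               // global memory allocated with cudaMalloc
}

// Hypothetical host wrapper: returns 0 on success, 1 if a CUDA call fails.
int run_scale(const float *h_in, const float *h_coeff, float *h_out) {
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = N * sizeof(float);
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpyToSymbol(coeff, h_coeff, bytes) == cudaSuccess;
    if (ok) {
        scale<<<1, N>>>(d_in, d_out);
        ok = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok ? 0 : 1;
}
```

The host can write global and constant memory but cannot touch shared memory or registers; those exist only while a block is running.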

    Syllabus

    L5 & L6: Memory Hierarchy III: Memory Bandwidth Optimization
    - Tiling (for registers)
    - Bandwidth: maximize utility of each memory cycle
    - Memory accesses in scheduling (half-warp)
    - Understanding global memory coalescing (for compute capability < 1.2 and > 1.2)
    - Understanding shared memory bank conflicts

    L7: Writing Correct Programs
    - Race condition, dependence
    - What is a reduction computation and why is it a good match for a GPU?
    - What does __syncthreads() do? (barrier synchronization)
    - Atomic operations
    - Memory Fence Instructions
    - Device emulation mode
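For the coalescing and bank-conflict items under L5 & L6, a tiled matrix transpose is a common illustration. The sketch below is not taken from the lectures. It assumes 32-thread warps and 32 shared-memory banks (compute capability 2.0 and later; the half-warp rules listed above apply to older hardware), and `run_transpose` is a hypothetical host wrapper.

```cuda
#include <cuda_runtime.h>

#define TILE 32

// Transposes an n x n row-major matrix; n is assumed to be a multiple of TILE.
__global__ void transpose(const float *in, float *out, int n) {
    // The extra column shifts each row by one bank, so the column read below
    // touches 32 different banks instead of hitting the same bank 32 times.
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // coalesced: x varies fastest
    __syncthreads();                                  // tile complete before reuse

    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write, column read
}

// Hypothetical host wrapper: returns 0 on success, 1 if a CUDA call fails.
int run_transpose(const float *h_in, float *h_out, int n) {
    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = (size_t)n * n * sizeof(float);
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    dim3 grid(n / TILE, n / TILE);
    transpose<<<grid, block>>>(d_in, d_out, n);
    cudaError_t err = cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Without the shared tile, either the read or the write to global memory would be strided by `n` and could not be coalesced.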

    Syllabus

    L8: Control Flow
    - Divergent branches

    L11: Sparse Linear Algebra on GPUs
    - Different sparse matrix representations
    - Stencil computations using sparse matrices
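As one example of a sparse matrix representation, the sketch below multiplies a matrix in compressed sparse row (CSR) form by a dense vector, with one thread per row. CSR is chosen here only for illustration; the lecture may have emphasized other formats. The names `spmv_csr`, `to_device` and `run_spmv` are hypothetical.

```cuda
#include <cuda_runtime.h>

// CSR stores, for each row, a slice [row_ptr[r], row_ptr[r+1]) of the arrays
// col_idx (column of each nonzero) and vals (its value).
__global__ void spmv_csr(int rows, const int *row_ptr, const int *col_idx,
                         const float *vals, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
            sum += vals[k] * x[col_idx[k]];
        y[row] = sum;
    }
}

// Hypothetical helper: allocates device memory and copies count elements to it.
template <typename T>
static T *to_device(const T *h, int count) {
    T *d = nullptr;
    cudaMalloc(&d, count * sizeof(T));
    cudaMemcpy(d, h, count * sizeof(T), cudaMemcpyHostToDevice);
    return d;
}

// Hypothetical host wrapper: y = A * x, returns 0 on success.
int run_spmv(int rows, int cols, int nnz, const int *row_ptr, const int *col_idx,
             const float *vals, const float *x, float *y) {
    int *d_rp = to_device(row_ptr, rows + 1);
    int *d_ci = to_device(col_idx, nnz);
    float *d_v = to_device(vals, nnz);
    float *d_x = to_device(x, cols);
    float *d_y = nullptr;
    cudaMalloc(&d_y, rows * sizeof(float));
    spmv_csr<<<(rows + 127) / 128, 128>>>(rows, d_rp, d_ci, d_v, d_x, d_y);
    cudaError_t err = cudaMemcpy(y, d_y, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_rp); cudaFree(d_ci); cudaFree(d_v); cudaFree(d_x); cudaFree(d_y);
    return err == cudaSuccess ? 0 : 1;
}
```

With one thread per row, neighboring threads read `vals` and `col_idx` at unrelated offsets, so these loads are generally not coalesced, and rows of different lengths leave some threads idle.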

    Syllabus

    L12, L13 and L14: Application case studies
    - Host tiling for constant cache (plus data structure reorganization)
    - Replacing trig function calls with hardware intrinsics
    - Global synchronization for MPM/GIMP

    L15: Dynamic Scheduling
    - Task queues
    - Static queues, dynamic queues
    - Wait-free synchronization

    L16: Sorting
    - Using a hybrid algorithm for different sized lists
    - Avoiding synchronization
    - Tradeoff between additional computation and eliminating costly synchronization
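As one possible illustration of the L15 items, the sketch below keeps a fixed (static) list of tasks and lets threads claim task indices dynamically with `atomicAdd` on a shared counter, so no thread ever waits on a lock. The per-task work is an arbitrary placeholder with uneven cost, and `drain_tasks` and `run_tasks` are hypothetical names, not the scheme from the lecture.

```cuda
#include <cuda_runtime.h>

// Each thread repeatedly claims the next unprocessed task until none remain.
__global__ void drain_tasks(int *next, int n_tasks, int *out) {
    while (true) {
        int t = atomicAdd(next, 1);      // returns the old value: our task index
        if (t >= n_tasks) break;         // task list exhausted
        int iters = t % 8 + 1;           // placeholder: tasks have uneven cost
        int acc = 0;
        for (int k = 0; k < iters; ++k) acc += t;
        out[t] = acc;                    // equals t * (t % 8 + 1)
    }
}

// Hypothetical host wrapper: returns 0 on success, 1 if a CUDA call fails.
int run_tasks(int n_tasks, int *h_out) {
    int *d_next = nullptr, *d_out = nullptr;
    cudaMalloc(&d_next, sizeof(int));
    cudaMalloc(&d_out, n_tasks * sizeof(int));
    cudaMemset(d_next, 0, sizeof(int));
    drain_tasks<<<2, 64>>>(d_next, n_tasks, d_out);   // far fewer threads than tasks
    cudaError_t err = cudaMemcpy(h_out, d_out, n_tasks * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_next);
    cudaFree(d_out);
    return err == cudaSuccess ? 0 : 1;
}
```

Threads that draw cheap tasks simply come back for more, which balances load without any barrier; the cost is one atomic operation per task.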

    2010 Exam: Problem III.a

    a. Managing memory bandwidth

    Given the following CUDA code, how would you rewrite it to improve bandwidth to global memory and, if applicable, shared memory?

    N = 512; NUMBLOCKS = 512/64;
    float a[512], b[512], c[512][512];
    __global__ compute(float *a, float *b, float *c) {
      int tx = threadIdx.x;
      int bx = blockIdx.x;
      for (j = bx*64; j < (bx*64)+64; j++)
        a[tx] = a[tx] - c[tx][j] * b[j];
    }

    How to solve?
    - Copy c to shared memory in coalesced order
    - Tile in shared memory
    - Copy a to register
    - Copy b to shared memory, constant memory or texture memory

    Exam: Problem III.a

    N = 512; NUMBLOCKS = 512/64;
    float a[512], b[512], c[512][512];
    float tmpa;
    __global__ compute(float *a, float *b, float *c) {
      __shared__ ctmp[1024+32];   // let's use 32x32
                                  // pad for bank conflicts
      int tx = threadIdx.x;
      int bx = blockIdx.x;
      tmpa = a[tx];
      pad1 = tx/32; pad2 = j/32;
      for (jj = bx*64; jj < (bx*64)+64; jj += 32)
        for (j = jj; j < jj+2; j++)
          ctmp[j*512+tx+pad1] = c[j][tx];
      __syncthreads();
      tmpa = tmpa - ctmp[tx*512 + j + pad2] * b[j];
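The strategy for Problem III.a (c staged through padded shared memory in coalesced order, b in shared memory, the running value of a in a register) can be sketched as a complete kernel. This is one possible realization, not the exam's reference answer. It assumes a launch geometry that differs from the exam's 8 blocks of 512 threads: each block is 32 x 8 threads, and each warp owns 32 rows. Because several blocks contribute to the same `a[i]`, the sketch uses `atomicAdd` on `float` (compute capability 2.0 or later), which is also an assumption. `run_compute` is a hypothetical host wrapper.

```cuda
#include <cuda_runtime.h>

#define N 512
#define CHUNK 64     // columns of c handled per block in x, as in the exam code
#define TILE 32
#define WARPS 8      // 32-row groups (one per warp) in each block

__global__ void compute(float *a, const float *b, const float *c) {
    __shared__ float ctile[WARPS][TILE][TILE + 1];  // +1 column avoids bank conflicts
    __shared__ float bs[CHUNK];

    int lane = threadIdx.x;                         // 0..31
    int w = threadIdx.y;                            // 0..WARPS-1
    int row0 = (blockIdx.y * WARPS + w) * TILE;     // first row owned by this warp
    int col0 = blockIdx.x * CHUNK;

    int t = w * TILE + lane;
    if (t < CHUNK) bs[t] = b[col0 + t];             // coalesced copy of b

    float sum = 0.0f;                               // accumulates in a register
    for (int jj = 0; jj < CHUNK; jj += TILE) {
        __syncthreads();                            // bs loaded; old tile consumed
        for (int r = 0; r < TILE; ++r)              // lanes read adjacent columns
            ctile[w][r][lane] = c[(row0 + r) * N + col0 + jj + lane];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)              // each lane walks its own row
            sum += ctile[w][lane][k] * bs[jj + k];
    }
    atomicAdd(&a[row0 + lane], -sum);               // blocks in x share a[row]
}

// Hypothetical host wrapper: updates h_a in place, returns 0 on success.
int run_compute(float *h_a, const float *h_b, const float *h_c) {
    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));
    cudaMalloc(&d_c, N * N * sizeof(float));
    cudaMemcpy(d_a, h_a, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_c, h_c, N * N * sizeof(float), cudaMemcpyHostToDevice);
    dim3 block(TILE, WARPS);
    dim3 grid(N / CHUNK, N / (TILE * WARPS));
    compute<<<grid, block>>>(d_a, d_b, d_c);
    cudaError_t err = cudaMemcpy(h_a, d_a, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return err == cudaSuccess ? 0 : 1;
}
```

The shared arrays take about 34 KB per block, which fits the 48 KB default on recent GPUs but not the 16 KB available on the hardware the exam targeted; a smaller `WARPS` value would be needed there.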

    2010 Exam: Problem III.b

    b. Divergent Branch

    Given the following CUDA code, describe how you would modify this to derive an optimized version that will have fewer divergent branches.

    Main() {
      float h_a[1024], h_b[1024];
      ... /* assume appropriate cudaMalloc called to create d_a and d_b, and d_a is */
          /* initialized from h_a using appropriate call to cudaMemcpy */
      dim3 dimblock(256); dim3 dimgrid(4);
      compute<<<dimgrid, dimblock,0>>>(d_a,d_b);
      /* assume d_b is copied back from the device using call to cudaMemcpy */
    }

    __global__ compute (float *a, float *b) {
      float a[4][256], b[4][256];
      int tx = threadIdx.x; bx = blockIdx.x;
      if (tx % 16 == 0)
        (void) starting_kernel (a[bx][tx], b[bx][tx]);
      else /* (tx % 16 > 0) */
        (void) default_kernel (a[bx][tx], b[bx][tx]);
    }

    Key idea: Separate multiples of 16 from others

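    Problem III.b

    Approach:

    Renumber threads to concentrate the cases where the index is not divisible by 16

    if (tx < 240) t = tx + (tx/15) + 1;

    (The divisor is 15, not 16: each run of 15 consecutive tx values has to skip one multiple of 16. With tx/16, tx = 15 would map to t = 16.)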
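The renumbering can be sketched end to end as follows. The mapping for threads 240..255 (`(tx - 240) * 16`) is an assumption filled in to complete the permutation, and the bodies of `starting_kernel` and `default_kernel` are placeholders, since the exam does not define them. `run_renumbered` is a hypothetical host wrapper.

```cuda
#include <cuda_runtime.h>

#define THREADS 256
#define BLOCKS 4

// Placeholder bodies for the exam's unspecified routines.
__host__ __device__ float starting_kernel(float x) { return 2.0f * x; }
__host__ __device__ float default_kernel(float x)  { return x + 1.0f; }

// Maps thread tx to the element it processes: threads 0..239 take the 240
// indices that are not multiples of 16; threads 240..255 take 0, 16, ..., 240.
__host__ __device__ int renumber(int tx) {
    return (tx < 240) ? tx + tx / 15 + 1 : (tx - 240) * 16;
}

__global__ void compute(const float *a, float *b) {
    int t = renumber(threadIdx.x);
    int idx = blockIdx.x * THREADS + t;
    if (threadIdx.x >= 240)              // only the last 16 threads of the block
        b[idx] = starting_kernel(a[idx]);
    else
        b[idx] = default_kernel(a[idx]);
}

// Hypothetical host wrapper over BLOCKS * THREADS elements: returns 0 on success.
int run_renumbered(const float *h_a, float *h_b) {
    float *d_a = nullptr, *d_b = nullptr;
    size_t bytes = BLOCKS * THREADS * sizeof(float);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    compute<<<BLOCKS, THREADS>>>(d_a, d_b);
    cudaError_t err = cudaMemcpy(h_b, d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return err == cudaSuccess ? 0 : 1;
}
```

With 32-thread warps, threads 0..223 (seven warps) all take the `default_kernel` path and only the last warp splits, whereas the original code diverges in every warp. The price is that threads 240..255 now touch elements 16 apart, so their global memory accesses are no longer contiguous.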

    2010 Exam: Problem III.c

    c. Tiling

    The following sequential image correlation computation compares a region of an image to a template. Show how you would tile the image and threshold data to fit in 128MB global memory and the template data to fit in a 16KB shared memory.

    View of Computation

    - Perform correlation of template with portion of image
    - Move "window" horizontally and downward and repeat

    [Figure: template window positioned over the image]

    2010 Exam: Problem III.c

    i. How big is image and template data?
    - Image = 5192^2 * 4 bytes/int, about 100 Mbytes
    - Th = 100 Mbytes
    - Template = 64^2 * 4 bytes/int = exactly 16 KBytes
    - Total data set size > 200 Mbytes
    - Cannot have both image and Th in global memory: must generate 2 tiles
    - Template data does not fit in shared memory due to other things placed there...

    ii. Partitioning to support tiling for shared memory
    - Hint to exploit reuse on template by copying to shared memory
    - Could also exploit reuse on portion of image
    - Dependences only on th (a reduction)

    2010 Exam: Problem III.c

    iii. Need to show tiling for template
    - Can copy into shared memory in coalesced order
    - Copy half or less at a time

    2010 Exam: Problem III.d

    d. Parallel partitioning and synchronization (LU Decomposition)

    Without writing out the CUDA code, consider a CUDA mapping of the LU Decomposition sequential code below. Answer should be in three parts, providing opportunities for partial credit: (i) where are the data dependences in this computation? (ii) how would you partition the computation across threads and blocks? (iii) how would you add synchronization to avoid race conditions?

    float a[1024][1024];
    for (k=0; k<1023; k++) {
      for (i=k+1; i<1024; i++)
        a[i][k] = a[i][k] / a[k][k];
      for (i=k+1; i<1024; i++)
        for (j=k+1; j<1024; j++)
          a[i][j] = a[i][j] - a[i][k]*a[k][j];
    }

    Key Features of Solution:

    i) Dependences:

    True dependences on elements of a, carried by the k loop

    ii) Partition:

    Merge i loops, interchange with j, partition j

    Across blocks/threads (sufficient parallelism)
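Although the exam asks for no code, one way to see the dependence and synchronization structure is the sketch below. It is a simpler mapping than the merged-loop partition described above, and it is an assumption rather than the exam's answer: the k loop stays sequential on the host, and each of the two inner steps becomes its own kernel. Kernel launches on the same stream run in order, so each launch boundary acts as a global barrier between steps. The kernel and wrapper names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Step 1 for a given k: scale the part of column k below the diagonal.
__global__ void scale_column(float *a, int n, int k) {
    int i = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i * n + k] /= a[k * n + k];
}

// Step 2 for a given k: update the trailing submatrix, one thread per (i, j).
// Row k and column k are only read here, so threads do not race.
__global__ void update_trailing(float *a, int n, int k) {
    int j = k + 1 + blockIdx.x * blockDim.x + threadIdx.x;
    int i = k + 1 + blockIdx.y * blockDim.y + threadIdx.y;
    if (i < n && j < n) a[i * n + j] -= a[i * n + k] * a[k * n + j];
}

// In-place LU (no pivoting) of an n x n row-major matrix; returns 0 on success.
int lu_gpu(float *h_a, int n) {
    float *d_a = nullptr;
    size_t bytes = (size_t)n * n * sizeof(float);
    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    for (int k = 0; k < n - 1; ++k) {      // dependences are carried by k
        int rem = n - k - 1;               // rows/columns still to update
        dim3 block(16, 16);
        dim3 grid((rem + 15) / 16, (rem + 15) / 16);
        scale_column<<<(rem + 255) / 256, 256>>>(d_a, n, k);
        update_trailing<<<grid, block>>>(d_a, n, k);
    }
    cudaError_t err = cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return err == cudaSuccess ? 0 : 1;
}
```

Launching two kernels per k is simple but pays launch overhead about 2n times; fusing the steps within a block and using `__syncthreads()` is the kind of refinement the partition above points toward.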

    2011 Exam: Examples of short answers

    a. Describe how you can exploit spatial reuse in optimizing for memory bandwidth on a GPU. (Partial credit: what are the memory bandwidth optimizations we studied?)

    b. Given examples we have seen of control flow in GPU kernels, describe ONE way to reduce divergent branches for ONE of the following: consider tree-structured reductions, even-odd computations, or boundary conditions.

    c. Regarding floating point support in GPUs, how does the architecture permit trading off precision and performance?

    d. What happens if two threads assigned to different blocks write to the same memory location in global memory?
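Questions (b) and (d) can both be illustrated with a sum reduction; the sketch below is one way to do so and is not the expected exam answer. Within a block, the tree uses a contiguous range of active threads (`tx < stride`), so whole warps drop out together rather than diverging at every level. Across blocks, unsynchronized writes to the same global location would race, so each block folds its partial sum into the total with `atomicAdd`. `block_sum` and `run_sum` are hypothetical names.

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

__global__ void block_sum(const int *in, int *total, int n) {
    __shared__ int partial[BLOCK];
    int tx = threadIdx.x;
    int i = blockIdx.x * BLOCK + tx;
    partial[tx] = (i < n) ? in[i] : 0;
    __syncthreads();

    // Tree reduction: active threads are always 0..stride-1, so divergence
    // appears only once stride drops below the warp size.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (tx < stride) partial[tx] += partial[tx + stride];
        __syncthreads();                 // barrier between tree levels
    }

    // Different blocks update the same location: make the update atomic.
    if (tx == 0) atomicAdd(total, partial[0]);
}

// Hypothetical host wrapper: stores the sum in *h_total, returns 0 on success.
int run_sum(const int *h_in, int n, int *h_total) {
    int *d_in = nullptr, *d_total = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_total, sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_total, 0, sizeof(int));
    block_sum<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_total, n);
    cudaError_t err = cudaMemcpy(h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_total);
    return err == cudaSuccess ? 0 : 1;
}
```

`__syncthreads()` only synchronizes threads of one block, which is why the cross-block step needs an atomic operation (or a second kernel launch) instead.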

    2011 Exam: Examples of Essay

    Pick one of the following three topics and write a very brief essay about it, no more than 3 sentences.

    a. We talked about sparse matrix computations with respect to linear algebra, graph coloring and program analysis. Describe a sparse matrix representation that is appropriate for a GPU implementation of one of these applications and explain why it is well suited.

    b. We talked about how to map tree-structured computations to GPUs. Briefly describe features of this mapping that would yield an efficient GPU implementation.

    c. We talked about dynamic scheduling on a GPU. Describe a specific strategy for dynamic scheduling (static task list, dynamic task list, wait-free synchronization) and when it would be appropriate to use it.