![Page 1: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/1.jpg)
Programming the FlexRAM Parallel g gIntelligent Memory System
B. B. Fraguela*, J. Renau†, P. Feautrier‡
D. Padua† and J. Torrellas†
*Univ. da Coruña†Univ. of Illinois
‡ENS de Lyon
![Page 2: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/2.jpg)
Intelligent Memory ArchitecturesIntelligent Memory Architectures
Main memory enhanced with many simple processors
•HeterogeneousHeterogeneous
•Highly parallel
P bl Littl h h t th hit t
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 2
Problem: Little research on how to program these architectures
![Page 3: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/3.jpg)
ContributionsContributions
Language support for intelligent memories
OpenMP-like directives (CFlex)p ( )
Library of Intelligent Memory Operations (IMOs)
Runtime system and OS extensionsRuntime system and OS extensions
Speedups of over one order of magnitude
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 3
![Page 4: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/4.jpg)
OutlineOutline
FlexRAM ArchitectureSoftware SupportCFlexIMOsEvaluationRelated WorkConclusions
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 4
![Page 5: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/5.jpg)
FlexRAM ArchitectureFlexRAM Architecture64 PArrays/chip PArrays arePHost Off-the-shelf system6 ays/c p
64 MB/chipy
muchsimpler thanthe PHost(s)
2D torus insideeach chip
Controller: comm& synchr tasks
PHost
S d d S d d S d d
the PHost(s)
StandardMemory
StandardMemory
StandardMemory
ProgrammedindependentlyThe FlexRAM bus
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 5
depe de t y(MIMD, SPMD)interconnects
all the chips
![Page 6: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/6.jpg)
FlexRAM Architectural IssuesFlexRAM Architectural Issues
PArrays cannot interrupt/invoke the PHost(s)PArray requests are sent to chip controller
PHost polls the controllers and services the requests
Communication PHost - PArray : pass input andCommunication PHost PArray : pass input and output arguments through memory
No HW cache coherenceNo HW cache coherenceCompiler inserted cache flushes and invalidations
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 6
![Page 7: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/7.jpg)
OutlineOutline
FlexRAM ArchitectureSoftware SupportCFlexIMOsEvaluationRelated WorkConclusions
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 7
![Page 8: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/8.jpg)
Operating System ExtensionsOperating System Extensions
Common address space for PHost(s) and PArraysPArrays kernel
Manages the TLBManages the TLB Manages spawn and termination of local tasks
PHost OSUpdates the shared page tableFirst-touch placement of pagesCooperates with PArray kernels to keep TLBs coherentCooperates with PArray kernels to keep TLBs coherent
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 8
![Page 9: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/9.jpg)
Programming FlexRAMProgramming FlexRAM
OpenMP-like directives (CFlex)
Library of Intelligent Memory Operations (IMOs)Library of Intelligent Memory Operations (IMOs)
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 9
![Page 10: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/10.jpg)
CFlexCFlex
CFlex: family of directives inspired by OpenMPExecution modifiers: build/sync tasksData modifiers: properties of data structuresExecutable directives: barriers, prefetches,...
#pragma FlexRAM directive-type [clauses]
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 10
![Page 11: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/11.jpg)
Execution Modifier: SpawnExecution Modifier: Spawn
Directive-type
[phost|parray] : specifies kind of processor to useClClauses
on_home(x): run the task on the PArray on whose bank x is located.sync/async: parent task must stop until child finishes (sync) or not (async)pfor: parallelize for loop
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 11
![Page 12: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/12.jpg)
Execution Modifier: Spawn (II)Execution Modifier: Spawn (II)
Clauses (cont.):if(cond)/ else: conditional execution of the compiler directivecompiler directiveshared, private, firstprivate, lastprivate, reduction: scope clauses with p , pthe same meaning as in OpenMP flush: specify which pieces of data to flush from PHost cachePHost cache
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 12
![Page 13: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/13.jpg)
Example: Parallelizing a LoopExample: Parallelizing a Loop
for(p = head; p != NULL; p = p->next)process(p->data);
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 13
![Page 14: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/14.jpg)
Example: Parallelizing a LoopExample: Parallelizing a Loop
for(p = head; p != NULL; p = p->next)for(p head; p ! NULL; p p >next)
process(p->data);
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 14
![Page 15: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/15.jpg)
Example: Parallelizing a LoopExample: Parallelizing a Loop
#pragma FlexRAM phost syncfor(p = head; p != NULL; p = p->next)
#pragma FlexRAM parray async on_home(*(p->data)) \firstprivate(p)
process(p->data);
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 15
![Page 16: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/16.jpg)
Example: Parallelizing a LoopExample: Parallelizing a Loop
#pragma FlexRAM parray pfor on_home(*(p->data)) \firstprivate(p)
for(p = head; p != NULL; p = p->next)process(p->data);
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 16
![Page 17: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/17.jpg)
Parallelizing Complex CodesParallelizing Complex Codes
int TreeAdd (register tree_t *t) {if (t == NULL) return 0;if (t NULL) return 0;else {
int leftval, rightval;
leftval = TreeAdd(t->left);leftval TreeAdd(t >left);
rightval = TreeAdd(t->right);
return leftval + rightval + t->val;return leftval + rightval + t >val;}
}
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 17
![Page 18: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/18.jpg)
Parallelizing Complex Codes (II)Parallelizing Complex Codes (II)PHost async task
PHost async task PHost async taskPHost-allocated node
PArray subtreePArray async task
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 18
![Page 19: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/19.jpg)
Parallelizing Complex Codes (III)Parallelizing Complex Codes (III)
int TreeAdd (register tree_t *t) {if (t == NULL) return 0;else {
int leftval, rightval;
#pragma FlexRAM phost sync{
#pragma FlexRAM parray async on_home(*(t->tleft)) if (lcl(t->left))##pragma FlexRAM phost async else
leftval = TreeAdd(t->left);
#pragma FlexRAM parray async on_home(*(t->tright)) if (lcl(t->right))( )rightval = TreeAdd(t->right);
}
return leftval + rightval + t->val;}
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 19
}}
![Page 20: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/20.jpg)
OutlineOutline
FlexRAM ArchitectureSoftware SupportCFlexIMOsEvaluationRelated WorkConclusions
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 20
![Page 21: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/21.jpg)
Intelligent Memory Operations (IMOs)Intelligent Memory Operations (IMOs)
Libraries that hide FlexRAM while providing near-optimal performanceImplement common operations on data structures often used in programsHi hl ti i d b th th ti l d thHighly optimized, both the sequential and the parallel versions
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 21
![Page 22: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/22.jpg)
Example IMOs: Vector ContainerExample IMOs: Vector Container
IMO description Syntax
Apply func f with arg a Vector apply(v f a)Apply func f with arg a Vector_apply(v,f,a)
Search element that fulfills
cond f with arg aVector_search(v,f,a)
Generate vector with the result
of appl func f with arg av2=Vector_map(v,f,a)
Reduce vector applying func f,Vector reduce(v,f,a)
whose neutrum is aVector_reduce(v,f,a)
Process two vectors and an arg
a, generating a new vectorv3=Vector_map2(v,v2,f,a)
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 22
![Page 23: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/23.jpg)
OutlineOutline
FlexRAM ArchitectureSoftware SupportCFlexIMOsEvaluationRelated WorkConclusions
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 23
![Page 24: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/24.jpg)
Architecture ParametersArchitecture Parameters
PHost1.6 GHz, 5 issueL1 cache: 32 KB
PArray1.2 GHz, 2 issue in orderL1 cache: 8 KB
L2 cache: 1 MB
Mem latency 180 cycles
No FP support
Mem latency 14 cyclesMem latency 180 cycles Mem latency 14 cycles
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 24
![Page 25: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/25.jpg)
Applications (I)Applications (I)
Application Suite Access Data Task Size
TSP Olden Ptr FP Large
TreeAdd Olden Ptr Int Large
Swim SPEC OMP 2001 Reg FP Med
Mgrid SPEC OMP 2001 Reg FP Var
Dmxdm Kernel Reg FP Large
S K l I d FP M dSpmxv Kernel Ind FP Med
Distance CAM Ptr Int Large
Path CAM Ptr Int Small
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 25
Path CAM Ptr Int Small
![Page 26: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/26.jpg)
Applications (II)Original code not changed
Applications (II)
Application Code size (lines) Directives Additional lines
TSP 485 12 5
TreeAdd 71 8 4
Swim 272 8 0
Mgrid 470 13 0
Dmxdm 81 1 2
Spmxv 47 1 0
Distance 108 17 7
h 16 1 9
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 26
Path 165 17 9
![Page 27: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/27.jpg)
SpeedupsSpeedups
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 27
Speedups: Over one order of magnitude!
![Page 28: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/28.jpg)
Further OptimizationsFurther Optimizations
Single task for consecutive pages affected by an on_home residing in the same bankConsecutive rather than cyclic spawn among theConsecutive rather than cyclic spawn among the FlexRAM chipsLimiting the usage of on_home to large loops
Increase the average speedup by about 30%
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 28
![Page 29: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/29.jpg)
OutlineOutline
FlexRAM ArchitectureSoftware SupportCFlexIMOsEvaluationRelated WorkConclusions
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 29
![Page 30: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/30.jpg)
Other Intelligent Mem ArchOther Intelligent Mem Arch.
[ k l] [ ll l]Active Pages [Oskin et al], DIVA [Hall et al]Program the machine by handC l b d ith CFl /IMOCan also be programmed with CFlex/IMOsSeem more sensitive to data placement than FlexRAMFurther extensions of CFlex to specify alignment andFurther extensions of CFlex to specify alignment and placement of data structuresMessage passing is more natural for DIVA
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 30
![Page 31: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/31.jpg)
Related WorkRelated Work
FlexRAM [Solihin et al]: automatic partition and mapping by a compiler
Only feasible for simple codesOnly feasible for simple codesPerformance is limited
Widespread compiler directivesOpenMP: UMA model, no locality clausesHPF: extensive alignment + replicationBoth: Unadequate for irregular applicationsq g pp
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 31
![Page 32: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/32.jpg)
ConclusionsConclusions
Effective programming support for Intelligent MemCFlex: family of pragmas inspired by OpenMPIMOs: library of intelligent memory operations
CFlex parallelizes more problems than OpenMPfComplexity can be further hidden using IMOs
Speedups over one order of magnitude
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 32
![Page 33: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/33.jpg)
Programming the FlexRAM Parallel g gIntelligent Memory System
B. B. Fraguela*, J. Renau†, P. Feautrier‡
D. Padua† and J. Torrellas†
*Univ. da Coruña†Univ. of Illinois
‡ENS de Lyon
![Page 34: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/34.jpg)
Runtime SystemRuntime System
Task managementCreation of buffer with input args + messageCreation of buffer with input args + messagePHost spins on a termination flag set by the PArray
Interface to chip controller locks and constructions built upon them, like barriersHeap memory management
ll f h l hPolling of the FlexRAM chips
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 34
![Page 35: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/35.jpg)
PArray StructurePArray Structure
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 35
![Page 36: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/36.jpg)
Further OptimizationsFurther Optimizations
Initial optimizations:Alignments using the OS first-touch policyon home clause to exploit localityon_home clause to exploit locality
New optimizations:H: single task for consecutive pages affected by an on_home
idi i th b kresiding in the same bankC: consecutive rather than cyclic spawn among the FlexRAM chipsL: limiting the usage of on_home to large loops
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 36
![Page 37: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/37.jpg)
Software Optimization ResultsSoftware Optimization Results
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 37
![Page 38: Progggramming the FlexRAM Parallel Intelligent Memory Systemiacoma.cs.uiuc.edu/iacoma-papers/PRES/present_ppopp03.pdf · aIMOs aEvaluation aRelated Work aConclusions B. B. Fraguela](https://reader030.vdocuments.net/reader030/viewer/2022040921/5e9a32b2fae1b54a564a4c20/html5/thumbnails/38.jpg)
Final ResultsFinal Results
B. B. Fraguela et al Programming FlexRAM, PPoPP 2003 38