Post on 04-Jan-2016
R. Arce-Nazario, M. Jimenez, and D. Rodriguez
Electrical and Computer Engineering
University of Puerto Rico – Mayagüez
WALSAIP

Motivation and Objective
Discrete Signal Transforms (DSTs)
  E.g. DFT and DCT, with many applications
  Hardware accelerated, but at a high area cost
Distributed (dedicated) hardware architectures (DHAs)
  Cost-effective
  Partitioning plays a key role
Objective: use inherent properties of DSTs to improve their hardware partitioning onto distributed hardware architectures.
[Figure: DST, partitioning, DHA]

Previous Work
Automated partitioning of DSTs to DHAs
  DSTs treated as any other algorithm or benchmark [Srinivasan01][Bringmann00]
  Converted to a high-level or structural DFG and treated as such
Manual partitioning and automated code generation
  DST-specific properties exploited [Kumhom01]
  New formulations developed to exploit architectural features [VanLoan92]
  SPIRAL and FFTW: code generation platforms exploring the space of equivalent algorithms [Pueschel05][Frigo05]
[Arce05]: automated partitioning methodology that incorporates DST features and formulation exploration

Partitioning Methodology
[Figure: methodology flow diagram. Inputs: KPA DST formulation and architectural description. Blocks: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators. Intermediate data: KPA formulation, DFG, hypergraph representation, cost and indicators, rule selection. Output: high-level partition solution.]

DSTs – General Concepts
General formula for a d-dimensional DST:
$X[k_1, \ldots, k_d] = \sum_{n_1, \ldots, n_d} x[n_1, \ldots, n_d]\, \alpha_1(n_1, k_1) \cdots \alpha_d(n_d, k_d)$
The α's determine the type of transform, e.g. for the DFT: $\alpha_i(n_i, k_i) = e^{-j 2\pi n_i k_i / N_i}$
Essentially a vector-matrix multiplication
Fast versions exist, using divide-and-conquer techniques
  Highly regular
  Highly connected
Rules can be applied at the formulation level: permutation, index-set, ...
Example: $F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$
[Figure: data-flow stages in order of application: $R_8$, $(I_4 \otimes F_2)$, $(I_2 \otimes F_2 \otimes I_2)\, T_0$, $(F_2 \otimes I_4)\, T_1$]
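As a small illustration of the "vector-matrix multiplication" view, the following Python sketch evaluates a one-dimensional transform directly from a kernel α(n, k). The function names (`dft_kernel`, `transform`) are hypothetical and do not come from the slides, and the negative exponent is the usual forward-DFT sign convention, assumed here. This is the direct O(N^2) form, not a fast algorithm.

```python
import cmath


def dft_kernel(n, k, size):
    """DFT kernel alpha(n, k) = exp(-j*2*pi*n*k/N); the sign is an assumed convention."""
    return cmath.exp(-2j * cmath.pi * n * k / size)


def transform(x, kernel):
    """One-dimensional DST as a vector-matrix product: X[k] = sum_n x[n] * alpha(n, k)."""
    size = len(x)
    return [sum(x[n] * kernel(n, k, size) for n in range(size))
            for k in range(size)]


# Two easy sanity inputs: a unit impulse and a constant signal.
spectrum_of_impulse = transform([1, 0, 0, 0], dft_kernel)
spectrum_of_constant = transform([1, 1, 1, 1], dft_kernel)
print(spectrum_of_impulse)
print(spectrum_of_constant)
```

Swapping in a different kernel function changes the type of transform without touching `transform`, which is the sense in which the α's select the DST.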
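
Kronecker Algebra
$F_{4 \times 4} = F_4 \otimes F_4$
$F_8 = (F_4 \otimes I_2)\, T_{4,2}\, (I_4 \otimes F_2)$
$F_8 = (F_4 \otimes I_2)\, T_{4,2}\, (I_4 \otimes F_2)\, P_{8,4}$
[Figure: data-flow graph built from F4 blocks, F2 blocks, and twiddle multipliers W]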

Target topology
Similar to existing platforms in the market and in academia:
  Annapolis Micro Systems (Wildforce)
  Gidel (PROC20KE)
  Berkeley Emulation Engine (BEE), being proposed as a cost-effective alternative to traditional high-performance computing systems
[Figure: devices D0, D1, ..., Dk-1 with memories M0, M1, ..., Mk-1 and a crossbar]

DST properties in our methodology
Graph considerations incorporated into the partitioning/placement process
Exploration of equivalent formulations
[Figure: methodology flow diagram with blocks Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, and Estimators; data labels KPA formulation, DFG, cost and indicators, rule selection]

Graph partitioning considerations
Focus on horizontal partitioning schemes (SIMD-like implementation)
Initial solution: balanced horizontal linear partitioning
Scheduling consideration: swap only nodes from the same computational stage
Kernighan-Lin bipartitioning
Heterogeneous-channel k-way partitioning
[Figure: devices D0, D1, ..., Dk-1 with memories M0, M1, ..., Mk-1 and a crossbar]
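A minimal Python sketch of the initial solution and the restricted swaps, under stated assumptions: the DFG is modelled as a radix-2 butterfly graph (the slides do not fix the graph), rows are assigned to devices in equal horizontal bands, and candidate swaps are limited to nodes in the same stage on different devices. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def butterfly_edges(n):
    """Edges of an assumed radix-2 butterfly DFG: node (s, r) feeds
    (s + 1, r) and (s + 1, r XOR 2**s), for s = 0 .. log2(n) - 1."""
    edges = []
    for s in range(n.bit_length() - 1):
        for r in range(n):
            edges.append(((s, r), (s + 1, r)))
            edges.append(((s, r), (s + 1, r ^ (1 << s))))
    return edges


def horizontal_partition(n, k):
    """Balanced horizontal bands: row r of every stage goes to device r // (n // k)."""
    rows_per_device = n // k

    def device_of(node):
        return node[1] // rows_per_device

    return device_of


def channel_traffic(edges, device_of):
    """Count DFG edges that cross between each pair of devices."""
    traffic = Counter()
    for src, dst in edges:
        a, b = device_of(src), device_of(dst)
        if a != b:
            traffic[(min(a, b), max(a, b))] += 1
    return traffic


def same_stage_swaps(n, device_of):
    """Candidate swaps restricted to node pairs in the same stage on different devices."""
    for s in range(n.bit_length()):  # node columns 0 .. log2(n)
        for r1, r2 in combinations(range(n), 2):
            if device_of((s, r1)) != device_of((s, r2)):
                yield (s, r1), (s, r2)


traffic_2 = channel_traffic(butterfly_edges(8), horizontal_partition(8, 2))
traffic_4 = channel_traffic(butterfly_edges(8), horizontal_partition(8, 4))
swaps_2 = sum(1 for _ in same_stage_swaps(8, horizontal_partition(8, 2)))
print(dict(traffic_2), dict(traffic_4), swaps_2)
```

In this model only the butterfly stages whose stride is at least the band height generate inter-device traffic, which is one way to see why a horizontal starting point is a reasonable seed for Kernighan-Lin style refinement.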

Formulation exploration
Formulation Manipulator: applies permutation and factorization rules to the Kronecker formulation of a DST to obtain equivalent formulations
Rule: $F_n = (F_p \otimes I_m)\, T_{n,p}\, (I_p \otimes F_m)\, P_{n,p}$, with $n = pm$
[Figure: Formulation Manipulator within the methodology flow (Formulation to DFG, Heuristic Control, Partition/Placement; KPA formulation, DFG, cost and indicators, rule selection)]
The number of possible reformulations grows exponentially with DST size
Heuristic control method; first answer these questions:
  Do reformulations have an effect on solution quality?
  How can we effectively explore the equivalent formulation space to find more apt formulations?
Experiments: gain an understanding of algorithmic-level effects on solution quality and convergence.
Example:
$F_{16} = (F_8 \otimes I_2)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$
$F_{16} = (((F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}) \otimes I_2)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$
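The factorization rule F_n = (F_p ⊗ I_m) T_{n,p} (I_p ⊗ F_m) P_{n,p}, with n = pm, can be checked numerically. The sketch below uses plain Python lists as dense matrices. It assumes the standard Cooley-Tukey definitions for the twiddle matrix (diagonal entries w_n^(i*j)) and for the stride permutation, which the slides do not spell out, and every function name is illustrative.

```python
import cmath


def dft(n):
    """n-point DFT matrix with entries exp(-2*pi*j*r*c/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (r * c) for c in range(n)] for r in range(n)]


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Kronecker product of two dense matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def twiddle(p, m):
    """Assumed twiddle matrix: diagonal entry w_n^(i*j) at position i*m + j."""
    n = p * m
    w = cmath.exp(-2j * cmath.pi / n)
    diag = [w ** (i * j) for i in range(p) for j in range(m)]
    return [[diag[r] if r == c else 0.0 for c in range(n)] for r in range(n)]


def stride_perm(p, m):
    """Assumed stride permutation: output i*m + j takes input j*p + i."""
    n = p * m
    perm = [j * p + i for i in range(p) for j in range(m)]
    return [[1.0 if c == perm[r] else 0.0 for c in range(n)] for r in range(n)]


def factored_dft(p, m):
    """(F_p x I_m) T (I_p x F_m) P, multiplied out as a dense matrix."""
    left = kron(dft(p), identity(m))
    right = kron(identity(p), dft(m))
    return matmul(matmul(matmul(left, twiddle(p, m)), right), stride_perm(p, m))


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Largest entrywise deviation from the direct DFT matrix for a few (p, m) splits.
errors = {(p, m): max_abs_diff(dft(p * m), factored_dft(p, m))
          for (p, m) in [(2, 4), (4, 2), (8, 2)]}
print(errors)
```

Different (p, m) choices for the same n are the equivalent formulations the Formulation Manipulator explores: they compute the same transform but induce different data-flow graphs.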

Measuring quality of solution
Cost = $(W_0 R_0,\ W_1 R_1,\ \ldots,\ W_{m-1} R_{m-1})$, where
  $W_i$ = ‘weight’ of channel $i$
  $R_i$ = required communications through channel $i$
[Figure: two diagrams of devices D0, D1, D2, D3]
Example: $W_{01} = W_{12} = W_{23} = 1$, $W_{XBAR} = 2$, Cost = (4, 4, 4, 8)

Experiment #1 – Inter-stage permutations
Since Cooley-Tukey's FFT, several common formulations have become available, e.g.
$F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$
[Pease formulation here]
Experiment: several sizes of 5 common formulations were partitioned.
Inter-stage permutations (ISPs) have an effect on solution quality, yet no formulation is a clear winner.
Formulations compared: Stockham, Transposed Stockham, Cooley-Tukey, Gentleman-Sande, Pease

Experiment #2 – Granularity
Granularity: the weight of the nodes in the various computational stages of the transform.
[Figure: two data-flow graphs of a 16-point transform. The coarser one is built from F4 nodes; the finer one mixes F4 and F2 nodes.]
Coarser: $F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, (I_4 \otimes F_4)\, P_{16,4}$
Finer: $F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, (I_4 \otimes ((F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2}))\, P_{16,4}$

Experiment #2 – Granularity
Decomposition rules: a large DST is a combination of smaller DSTs, analogous to node clustering

        Array 4             Ring 4              Array 8                  Ring 8
Size    Cost  Formulation   Cost  Formulation   Cost  Formulation        Cost  Formulation
32      11    2/2/2/4*      7     2/2/2/4       32    8/2/2*             16    2/4/2/2
64      22    8/2/4*        14    2/2/8*        48    2/2/2/2/4          20    4/2/2/4
128     43    8/2/8*        26    16/2/2/2*     92    2/2/2/2/2/4        32    2/2/2/2/2/4
256     86    4/2/32*       55    16/8/2*       132   4/2/2/2/2/4        58    2/2/2/2/2/2/4
512     171   4/2/64*       106   64/4/2*       276   2/2/2/2/2/2/4/2    116   2/2/2/2/2/2/8

* Multiple formulations achieved the best cost. The coarsest granularity is shown.
Effect of topology, ring vs. linear: 57% cost reduction
Finest granularity is not necessarily best.
$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$
$F_8 = (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}$
$F_8 = (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes ((F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2}))\, P_{8,2}$
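
Experiment #3 – Breakdown strategy
Breakdown strategy: the order and the divisors with which a transform is decomposed.
Split trees: a common graphical representation of a breakdown strategy.
Example: two split trees for a DFT of size 64.
(a) $F_{64} = (((F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}) \otimes I_8)\, T_{64,8}\, (I_8 \otimes F_8)\, P_{64,8}$
(b) $F_{64} = (F_2 \otimes I_{32})\, T_{64,2}\, (I_2 \otimes ((F_2 \otimes I_{16})\, T_{32,2}\, (I_2 \otimes F_{16})\, P_{32,2}))\, P_{64,2}$
[Figure: split trees with nodes labelled by log2 of the transform size. (a) Root 6 with children 3 and 3, one of which has children 2 and 1. (b) Root 6 with children 1 and 5, where 5 has children 4 and 1.]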
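To make split trees concrete, here is a hedged Python sketch that enumerates binary split trees over the exponent k of a size-2^k DFT and writes out the matching formulation with the rule F_n = (F_p ⊗ I_m) T_{n,p} (I_p ⊗ F_m) P_{n,p}. The tree encoding (nested tuples of exponents, with leaves left unexpanded) and the function names are assumptions made for illustration; the slides do not say how the trees were generated.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def split_trees(k):
    """All split trees for size 2**k: the leaf k, or a pair (left, right)
    of split trees whose exponents add up to k."""
    trees = [k]
    for a in range(1, k):
        for left in split_trees(a):
            for right in split_trees(k - a):
                trees.append((left, right))
    return tuple(trees)


def size(tree):
    """Transform size represented by a tree."""
    if isinstance(tree, int):
        return 2 ** tree
    return size(tree[0]) * size(tree[1])


def formula(tree):
    """Render a split tree as a Kronecker formulation ('x' stands for the Kronecker product)."""
    if isinstance(tree, int):
        return f"F{2 ** tree}"

    def wrap(sub):
        # Parenthesize expanded factors so nesting stays unambiguous.
        return formula(sub) if isinstance(sub, int) else f"({formula(sub)})"

    left, right = tree
    p, m = size(left), size(right)
    n = p * m
    return f"({wrap(left)} x I{m}) T{n},{p} (I{p} x {wrap(right)}) P{n},{p}"


# Number of split trees per transform size, for sizes 2 .. 64.
counts = {2 ** k: len(split_trees(k)) for k in range(1, 7)}
print(counts)
print(formula(((2, 1), 3)))
```

Under this encoding the tree `((2, 1), 3)` splits 64 into 8 x 8 and then the left 8 into 4 x 2. The count of trees grows quickly with k, which is why exhaustive generation is only practical for small sizes.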

Experiment #3 – Results
Procedure
  Exhaustive generation of split trees for DFT sizes n = 16 to 256
  Formulations partitioned for various topologies
  Observation of split-tree decisions that lead to ‘partition friendly’ formulations
  Generation of n > 256 formulations using rules

Conclusions and Future Work
Methodology for partitioning of DSTs to DHAs:
  DST graph considerations
  Formulation exploration
Graph considerations
  Generation of a linear initial partition: provides better results than a random one
  Limitation of node moves: faster convergence time
Exploration at the algorithmic level: experiments
  Isolated features such as permutations and granularity
    An effect was evidenced, but it is hard to establish a relation to solution quality
    Coarse granularity = better convergence, good solution quality
  Breakdown strategy: ‘partition friendly’ formulations generated
Current work:
  Experimentation with DCTs
  Experimentation with other properties to define an overall exploration strategy

Acknowledgements
Puerto Rico Experimental Program to Stimulate Competitive Research (PR-EPSCoR)
WALSAIP - Wide-Area Large Scale Automated Information Project
Puerto Rico NASA Space Grant
QUESTIONS?