Partition for Pipelined and Parallel Computing
Part 2
Hongtao Du
AICIP Research
Dec 1, 2005
Partition Scheme
Driving Force
• Data-driven
– How to divide data sets into different sizes for multiple computing resources.
– How to coordinate data flows along different directions so that the appropriate data reaches the suitable resources at the right time.
• Function-driven
– How to perform different functions of one task on different computing resources at the same time.
Data - Flynn's Taxonomy
• Single Instruction Flow Single Data Stream (SISD)
• Multiple Instruction Flow Single Data Stream (MISD)
• Single Instruction Flow Multiple Data Stream (SIMD)
– MPI, PVM
• Multiple Instruction Flow Multiple Data Stream (MIMD)
– Shared memory
– Distributed memory
Data Partitioning Schemes
• Block
• Scatter
• Contiguous point
• Contiguous row
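A minimal sketch of how the four schemes could map pixel (i, j) of an n x n image to a processor. The slide only names the schemes, so the exact mapping rules below (square processor mesh for block, row-major dealing for scatter and contiguous point) and all function names are assumptions made for illustration.

```python
def block_owner(i, j, n, grid):
    """Block: square tiles on a grid x grid processor mesh (p = grid * grid)."""
    return (i * grid // n) * grid + (j * grid // n)


def scatter_owner(i, j, n, p):
    """Scatter: pixels dealt out cyclically in row-major order."""
    return (i * n + j) % p


def contiguous_point_owner(i, j, n, p):
    """Contiguous point: equal runs of consecutive pixels in row-major order."""
    return (i * n + j) * p // (n * n)


def contiguous_row_owner(i, j, n, p):
    """Contiguous row: bands of consecutive whole rows."""
    return i * p // n


def owner_map(n, owner):
    """Return an n x n grid holding the owning processor of every pixel."""
    return [[owner(i, j) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n, p = 4, 4
    schemes = {
        "block": lambda i, j: block_owner(i, j, n, 2),
        "scatter": lambda i, j: scatter_owner(i, j, n, p),
        "contiguous point": lambda i, j: contiguous_point_owner(i, j, n, p),
        "contiguous row": lambda i, j: contiguous_row_owner(i, j, n, p),
    }
    for name, fn in schemes.items():
        print(name)
        for row in owner_map(n, fn):
            print("  ", row)
```

Contiguous point and contiguous row coincide whenever each processor's share is a whole number of rows; they differ otherwise, for example with n = 3 and p = 2.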
Communication Patterns and Costs
• Communication cost is the primary concern in data-driven partitioning.
• Successor/Predecessor (S-P) pattern
• North/South/East/West (NSEW) pattern
[Cost formulas for the two patterns not recoverable]
Parameters: the message preparation latency; the transmission speed (bytes/s); $p$, the number of processors; $n^2$, the number of data items; $d$, the length of each data item to be transmitted.
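As a rough illustration of how these parameters interact, the sketch below uses the generic linear message-cost model (preparation latency plus bytes divided by transmission speed). This is not necessarily the formula from the slide. It additionally assumes an n x n data set, row strips for the S-P pattern (two neighbours), and a square processor mesh for the NSEW pattern (four neighbours); all function and parameter names are hypothetical.

```python
import math


def message_time(items, d, latency, speed):
    """Time for one message of `items` data items, each `d` bytes long."""
    return latency + items * d / speed


def sp_cost(n, d, latency, speed):
    """S-P: exchange one boundary row of n items with successor and predecessor."""
    return 2 * message_time(n, d, latency, speed)


def nsew_cost(n, p, d, latency, speed):
    """NSEW: exchange one block edge with each of four neighbours.

    Assumes p is a perfect square so the processors form a square mesh.
    """
    side = math.isqrt(p)
    if side * side != p:
        raise ValueError("p must be a perfect square for a square mesh")
    return 4 * message_time(n / side, d, latency, speed)


if __name__ == "__main__":
    n, p, d = 1024, 16, 4
    latency, speed = 1e-4, 1e8  # seconds, bytes per second (made-up values)
    print("S-P :", sp_cost(n, d, latency, speed))
    print("NSEW:", nsew_cost(n, p, d, latency, speed))
```

Under this model NSEW sends shorter messages but twice as many, so it is more expensive when latency dominates and cheaper when the transfer term dominates.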
Understanding Data-driven
• The arrival of data initiates and synchronizes operations in the system.
• The whole system in execution is modeled as a network linked by data streams.
• Granularity of the algorithm: the size of the data block transmitted between processors. The flows of data blocks form data streams.
• Granularity selection: a trade-off between computation and communication
– Large: reduces the degree of parallelism; increases computation time; little overlap between processors.
– Small: increases the degree of overlap; increases communication and overhead time.
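The trade-off can be illustrated with a simple pipeline timing model. This is a sketch under stated assumptions, not a model taken from the slides: every stage spends a fixed per-block overhead plus a per-item compute time, and a pipeline with `stages` stages needs `blocks + stages - 1` steps to drain. All names and numbers are hypothetical.

```python
import math


def pipeline_time(n_items, block, stages, t_item, t_overhead):
    """Total time to push n_items through a linear pipeline in blocks.

    Large blocks mean few messages but a long fill/drain phase (little overlap);
    small blocks mean good overlap but many per-block overheads.
    """
    blocks = math.ceil(n_items / block)
    stage_time = t_overhead + block * t_item
    return (blocks + stages - 1) * stage_time


def best_block(n_items, stages, t_item, t_overhead, candidates):
    """Pick the candidate block size with the smallest modeled total time."""
    return min(
        candidates,
        key=lambda b: pipeline_time(n_items, b, stages, t_item, t_overhead),
    )


if __name__ == "__main__":
    sizes = [1, 16, 32, 64, 128, 256, 1024]
    for b in sizes:
        print(b, pipeline_time(1024, b, 4, 1.0, 16.0))
    print("best:", best_block(1024, 4, 1.0, 16.0, sizes))
```

With these made-up parameters both extremes are slow: a block size of 1 pays the overhead 1024 times, while a single block of 1024 items leaves the stages with no overlap at all.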
Data Dependency
• Reduces or even eliminates the speedup
• Caused by edge pixels that fall on different blocks
[Figure: block and reverse-diagonal partitions]
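To make the edge-pixel effect concrete, the sketch below counts the pixels of an n x n image whose 3 x 3 neighbourhood reaches into another block under a block partition. The 3 x 3 neighbourhood and the function name are assumptions chosen for illustration; the slide does not specify the operation being parallelized.

```python
def cross_block_pixels(n, b):
    """Count pixels with at least one 8-neighbour owned by a different b x b block."""
    count = 0
    for i in range(n):
        for j in range(n):
            own = (i // b, j // b)
            depends_on_other_block = any(
                0 <= i + di < n
                and 0 <= j + dj < n
                and ((i + di) // b, (j + dj) // b) != own
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
            )
            if depends_on_other_block:
                count += 1
    return count


if __name__ == "__main__":
    for b in (8, 4, 2):
        print(f"block size {b}: {cross_block_pixels(8, b)} of 64 pixels need remote data")
```

The smaller the blocks, the larger the share of pixels that need data from a neighbouring processor, which is how the dependency erodes the speedup.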
Function
• Partitioning procedure
– Evaluate the complexity of each individual process in the function and the communication between processes
– Cluster processes according to the objectives
– Optimize the partitioning
Space-time-domain Expansion
• Definition: sacrificing the processing time to meet the performance requirements.
Time complexity: $O(\max(m, n))$
One Dimension Partitioning
• Keeps the processing size to one column at a time.
• Repeatedly feeds in data until the process finishes.
• Increases the time complexity by a factor of $n$ (the number of columns).
Two Dimension Partitioning
• Fixes the processing size to a two-dimensional subset of the original processing size.
• Increases the time complexity by a factor of $\frac{mn}{kl}$ (an $m \times n$ problem processed in $k \times l$ subsets).
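A minimal sketch of the idea, assuming an m x n data set fed through a fixed-size processing unit that accepts at most k x l elements per pass, so the number of passes is (m/k)(n/l) when k divides m and l divides n. The one-dimensional scheme is the special case k = m, l = 1. The names `tiles`, `process`, and `unit` are hypothetical.

```python
def tiles(m, n, k, l):
    """Yield (row_start, row_end, col_start, col_end) for each k x l subset."""
    for r in range(0, m, k):
        for c in range(0, n, l):
            yield r, min(r + k, m), c, min(c + l, n)


def process(data, k, l, unit):
    """Feed `data` through `unit` one k x l subset at a time.

    Returns the per-subset results and the number of passes that were needed.
    """
    m, n = len(data), len(data[0])
    results = []
    passes = 0
    for r0, r1, c0, c1 in tiles(m, n, k, l):
        block = [row[c0:c1] for row in data[r0:r1]]
        results.append(unit(block))
        passes += 1
    return results, passes


def block_sum(block):
    """Stand-in processing unit: sum of all elements in the subset."""
    return sum(sum(row) for row in block)


if __name__ == "__main__":
    data = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 x 6 data set
    print(process(data, 2, 3, block_sum))  # two-dimensional: 2 x 3 subsets
    print(process(data, 4, 1, block_sum))  # one-dimensional: one column per pass
```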
Resource Constraints
• Multi-processor
– Software implementation
– Homogeneous system
– Heterogeneous system
• Hardware/software (HW/SW) co-processing
– Software and hardware components are co-designed
– Process scheduling
• VLSI
– Hardware implementation
– Communication time is negligible
Multi-processor
• Heterogeneous system
– Contains computers with different types of parallelism.
– Communication overhead adds extra delays.
– Communication tasks such as allocating buffers and setting up DMA channels have to be performed by the CPU and cannot be overlapped with the computation.
• Host/Master - a powerful processor
• Bottleneck processor - the processor that takes the longest time to perform its assigned task.
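A small sketch of how the bottleneck processor can be identified and how it depends on the partition. It assumes each processor's time is simply its assigned work divided by its relative speed and ignores communication; the function names and numbers are hypothetical.

```python
def finish_times(work, speeds):
    """Time each processor needs for its share of the work."""
    return [w / s for w, s in zip(work, speeds)]


def bottleneck(work, speeds):
    """Return (index, time) of the processor that finishes last."""
    times = finish_times(work, speeds)
    idx = max(range(len(times)), key=times.__getitem__)
    return idx, times[idx]


def proportional_split(total, speeds):
    """Divide the work in proportion to processor speed."""
    total_speed = sum(speeds)
    return [total * s / total_speed for s in speeds]


if __name__ == "__main__":
    speeds = [1.0, 2.0, 4.0]   # heterogeneous system: relative speeds
    total = 70.0               # total amount of work (arbitrary units)
    equal = [total / len(speeds)] * len(speeds)
    print("equal split       :", bottleneck(equal, speeds))
    print("proportional split:", bottleneck(proportional_split(total, speeds), speeds))
```

With an equal split the slowest processor is the bottleneck and determines the overall time; sizing the data blocks by processor speed removes it in this idealized model.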
HW/SW Co-processing
• System structure
– SW: a single general-purpose processor (Pentium or PowerPC)
– HW: a single hardware coprocessor (FPGA or ASIC)
– A block of shared memory
• Design view
– Hardware components: RTL components (adders, multipliers, ALUs, registers)
– Software component: general-purpose processor
– Communication: between the software component and the local memory
• 90-10 Partitioning
– The most frequently executed loops generally account for 90 percent of the execution time but consist of only simple designs.
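One way to read the 90-10 rule is as a selection step: profile the software, then move the hottest loops to the hardware coprocessor until about 90 percent of the execution time is covered. The sketch below assumes a hypothetical profile (loop name to measured time) and estimates the overall gain with Amdahl's law, which is an assumption added here and is not taken from the slides.

```python
def select_hot_loops(profile, target=0.9):
    """Pick the most time-consuming loops until `target` of the runtime is covered.

    `profile` maps a loop name to its measured execution time.
    Returns the selected names and the fraction of runtime they cover.
    """
    total = sum(profile.values())
    selected, covered = [], 0.0
    for name, t in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= target:
            break
        selected.append(name)
        covered += t
    return selected, covered / total


def overall_speedup(hw_fraction, hw_speedup):
    """Amdahl's law: speedup when `hw_fraction` of the time runs `hw_speedup` times faster."""
    return 1.0 / ((1.0 - hw_fraction) + hw_fraction / hw_speedup)


if __name__ == "__main__":
    # Hypothetical profile in percent of total execution time.
    profile = {"filter_loop": 60.0, "transform_loop": 30.0, "setup": 4.0, "io": 6.0}
    loops, fraction = select_hot_loops(profile)
    print("move to hardware:", loops, "covering", fraction)
    print("estimated overall speedup:", overall_speedup(fraction, 10.0))
```

The part left in software bounds the achievable gain, which is why the selection targets the few loops that dominate the runtime.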
VLSI
• Constraints
– Execution time (DSP ASIC)
– Power consumption
– Design area
– Throughput
• Examples
– Globally asynchronous locally synchronous on-chip bus (Time)
– 4-way pipelined memory partitioning (Throughput)
Questions?
Thank you!