Scenario-Oriented Design for Single Chip Heterogeneous Multiprocessors
JoAnn M. Paul, Electrical and Computer Engineering Department, Carnegie Mellon University, Pittsburgh, PA
Presented by: Mohammad Farsakh
What is this paper about
Challenges of selection, programming, and coordination in future single-chip computer designs
– Consisting of heterogeneous processing elements (PEs)
Outlines the differences between next-generation single-chip system designs and traditional designs.
Focus on the Scenario-Oriented Design (SOD) strategy
– Applications, schedulers, and hardware viewed as a system
– Leveraging one against the other
– Reducing the modeling detail of each design domain within a system in high-level simulation
Introduction
The design process of digital computation has three categories: models, tools, and strategies.
Existing models, tools, and strategies fail to let designers of single-chip computers efficiently capture the design space at hand.
– Tools do not capture software as part of the system model.
– Instruction Set Simulators (ISS) are too detailed and slow to capture systems with many processors.
– Designers are left to their own devices, which limits the effective realization of many potentially significant designs.
Models, tools, and strategies for both the software and hardware of single-chip heterogeneous multiprocessors are required.
Introduction (cont)
In the future, individual programmable processors will be like registers in a larger framework.
– Processor blocks will be differentiated by the capabilities of their hardware, the way they are programmed, and their manner of interconnection.
Designers should maximize the ratio DQ/DE for a given design in order to initiate the next design level.
– DQ = Design Quality
– DE = Development Effort
A common basis for design at the modeling level is required to manipulate the design decisions that have the most impact.
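The DQ/DE criterion can be sketched as a simple ranking. The snippet below is only an illustration: it assumes DQ and DE can each be expressed as a single positive number, which the slides do not specify, and the option names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignOption:
    name: str
    dq: float  # design quality (hypothetical, unitless score)
    de: float  # development effort (hypothetical, e.g. person-months)

    @property
    def ratio(self) -> float:
        """DQ/DE: quality obtained per unit of development effort."""
        return self.dq / self.de


def best_option(options):
    """Return the option with the highest DQ/DE ratio."""
    return max(options, key=lambda o: o.ratio)


# Hypothetical candidates, for illustration only.
candidates = [
    DesignOption("option A", dq=9.0, de=12.0),
    DesignOption("option B", dq=8.0, de=4.0),
    DesignOption("option C", dq=3.0, de=2.0),
]

chosen = best_option(candidates)
print(chosen.name, chosen.ratio)
```

Under these made-up numbers, the option with the highest absolute quality (option A) is not the one selected, because it costs three times the effort of option B.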
Programmable Heterogeneous Multiprocessors (PHMs)
Collections of PEs must be considered programmable.
The chip is a programmable collection of processors grouped dynamically.
Heterogeneous multiprocessors on a single chip pose different design challenges for three primary reasons:
– A single chip is a finite resource, unlike wide-area networks.
– The design will be semi-custom.
  The underlying hardware is more customized to the application space than in a traditional programmable system.
  Traditional heterogeneous multiprocessors provide transaction-like services on a diverse collection of resources.
  Single-chip devices such as SoCs are customized to meet fixed latency requirements as reactive systems.
  PHMs will be semi-custom and have aspects of both design styles.
– Coordination of system resources is required.
  The large differential between on-chip and off-chip communications will force efficient utilization and management of on-chip system resources, including processing elements, memory, communications bandwidth, and chip I/O.
Design Environment of Single Chip PHM
H: the single-chip heterogeneous multiprocessor.
Data inputs
– DP: time-stamped system inputs that are conceptually presented to the system hardware on I/O pins.
– DM: data values that reside in some external memory, analogous to jobs, packets, or other requests in a queue waiting to be processed by H.
Programs
– BC: clocked benchmark programs whose required latency is specified against a fixed time reference. They are designed to meet the worst-case demands that DP presents to the system, so they have fixed performance requirements.
– BI: programmatic-input benchmarks whose performance is calculated from the internal timing of the design's processing capabilities. These run over many PEs.
– BX: scheduler programs that act as a means of resolving the other benchmarks to the architecture.
Design Output
The single output, Q, holds the quality metric of the design, including the performance for the two classes of behavior (BC and BI).
General form of such an environment: E = {D, B, H, Q}
Case E = {BC, DP, Q}
– Pass/fail quality metric
– Fully specified by DP and BC, with no separately performance-evaluated architecture
– Example: Hardware Description Language (HDL)
Case E = {BC, BX, DP, H, Q}, where H is a single processor
– Pass/fail quality metric
– The kind of analytical modeling typical of research in real-time operating systems (RTOSs)
– Example: RTOS
Case E = {BI, DM, H, Q}
– H is a single processor executing at the instruction set simulator level or below
– Typical of simulators such as SimpleScalar, used to model a microarchitecture or ISA
– Example: simple ISS
Complexity of the application space
– The current-day approach, ISS, cannot permit effective exploration of the design space.
– The model requires a complete level of detail.
– It takes a long time to generate any single value of Q.
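The environment E and its special cases can be sketched as sets of symbols. This is a minimal illustration rather than anything from the paper: the function names `classify` and `missing_for_phm` are hypothetical, and treating the full set of seven symbols as the single-chip PHM environment is an assumption based on the list of inputs and programs above.

```python
# All symbols that can appear in a design environment E.
SYMBOLS = frozenset({"BC", "BI", "BX", "DP", "DM", "H", "Q"})

# Special cases named on the slides, keyed by the subset of E they use.
SPECIAL_CASES = {
    frozenset({"BC", "DP", "Q"}): "HDL",
    frozenset({"BC", "BX", "DP", "H", "Q"}): "RTOS",
    frozenset({"BI", "DM", "H", "Q"}): "simple ISS",
}


def classify(env):
    """Name the modeling style that a given subset of E corresponds to."""
    env = frozenset(env)
    unknown = env - SYMBOLS
    if unknown:
        raise ValueError(f"unknown symbols: {sorted(unknown)}")
    if env == SYMBOLS:
        return "full PHM environment"  # assumption: all seven symbols
    return SPECIAL_CASES.get(env, "unnamed subset")


def missing_for_phm(env):
    """List the symbols an environment lacks relative to the full set."""
    return sorted(SYMBOLS - frozenset(env))


print(classify({"BC", "DP", "Q"}))
print(missing_for_phm({"BI", "DM", "H", "Q"}))
```

Viewed this way, each existing tool covers only a subset of E, which is one way to read the claim that current tools cannot capture the whole design space.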
Scenario-Oriented Design
A novel design strategy
– Orients heterogeneous multiprocessor single-chip design according to a blend of performance requirements
– Implemented in new chip-wide programmer's views
– Leverages increased heterogeneity in the future application space
– Results in greater efficiency in the design process and in resource utilization
Fixed performance (FP)
To meet their requirements, current systems with DP and BC must be overdesigned to guarantee worst-case (WC) behavior, with two consequences:
– System capacity is wasted, along with the time spent matching functionality to available processing power to make sure that WC behavior is met.
– Irregular loading and data-dependent processing times leave processing resources underutilized except in peak WC loading situations.
Throughput performance (TP)
BI is designed to be a broadly representative set of program types, used to evaluate and optimize a programmable device's throughput performance (TP).
– Optimizes the common case (CC) instead of ensuring that WC behaviors are met
– Example: network switches drop packets, which are presumed to be resent
– Also applies to caches, branch predictors, and OS scheduling strategies
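The contrast between the two sizing philosophies can be sketched numerically. The snippet below is a toy model under stated assumptions: the demand trace is hypothetical, capacity is a single scalar, and demand above capacity is simply dropped (as in the packet-switch example). None of the function names come from the paper.

```python
def wc_capacity(demands):
    """Capacity that guarantees every demand is met (worst case)."""
    return max(demands)


def cc_capacity(demands, coverage_pct=90):
    """Smallest capacity that fully serves coverage_pct percent of demands."""
    ordered = sorted(demands)
    # Integer ceiling of coverage_pct * n / 100, converted to a 0-based index.
    index = (coverage_pct * len(ordered) + 99) // 100 - 1
    return ordered[index]


def utilization(demands, capacity):
    """Mean fraction of capacity actually used; excess demand is dropped."""
    served = [min(d, capacity) for d in demands]
    return sum(served) / (capacity * len(demands))


def drop_fraction(demands, capacity):
    """Fraction of intervals whose demand exceeds the capacity."""
    return sum(1 for d in demands if d > capacity) / len(demands)


# Hypothetical per-interval processing demands with a single peak.
demands = [2, 3, 2, 4, 3, 2, 3, 2, 3, 10]

wc = wc_capacity(demands)
cc = cc_capacity(demands)
print("WC:", wc, utilization(demands, wc), drop_fraction(demands, wc))
print("CC:", cc, utilization(demands, cc), drop_fraction(demands, cc))
```

In this trace, sizing for the worst case never drops work but leaves most of the capacity idle, while sizing for the common case roughly doubles utilization at the cost of not fully serving the peak interval.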
Future vs. Current Designs
The two design strategies are worlds apart.
– Worst case (WC) with fixed performance (FP)
– Common case (CC) with throughput performance (TP)
Future single-chip designs
– Execute a mix of BC and BI to handle a mix of DP and DM
– FP behaviors are met
– CC behaviors are optimized
Current systems with FP- and TP-oriented designs
– Separated into different devices
– General-purpose programming resides on the general-purpose processor
– Other processors use individual RTOSs to ensure WC behaviors are met, or WC behaviors are ensured by implementation in custom hardware
Layered, SOD approach to SoC Design
SOD can satisfy the performance requirements of FP functionality and provide a basis for a TP-optimized remainder architecture.
– The hardware architecture and a remainder architecture are co-designed.
– FP functionality is mapped across the entire chip, consuming part of the proposed architecture.
– Leverages the presence of both classes to optimize design time and design quality.
– Exact execution times for FP need not be measured at the start of design.
SoC Hardware View
Different Processing Elements (PE)
Different functionality
Common communication channel
SoC With Remainder Architecture
Software partitioning
– Each PE is divided into two parts: PE = {F-i, R-i}
– COMM = {R-COMM, F-COMM}
Functional overlay, {F-i}: BC mapped to processing resources
Remainder architecture carries BI: R = {R-i, R-COMM}
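The split of each resource into a functional part (F) and a remainder part (R) can be sketched as simple bookkeeping. This is an illustration only: it assumes each PE and the communication fabric can be described by one scalar capacity, and the `Resource` class, the resource names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A PE or the communication fabric (COMM), split into F and R parts."""
    name: str
    capacity: int     # total capacity, hypothetical units
    fp_load: int = 0  # F part: consumed by the functional overlay (BC)

    @property
    def remainder(self) -> int:
        """R part: capacity left over for BI."""
        return self.capacity - self.fp_load


def apply_overlay(resources, overlay):
    """Map FP loads onto resources; reject loads that exceed capacity."""
    for res in resources:
        load = overlay.get(res.name, 0)
        if load > res.capacity:
            raise ValueError(f"{res.name}: FP load exceeds capacity")
        res.fp_load = load


# Hypothetical chip: three PEs plus the shared communication channel.
chip = [
    Resource("PE-1", 100),
    Resource("PE-2", 50),
    Resource("PE-3", 200),
    Resource("COMM", 80),
]
overlay = {"PE-1": 30, "PE-3": 150, "COMM": 20}

apply_overlay(chip, overlay)
remainder = {res.name: res.remainder for res in chip}
print(remainder)
```

The resulting dictionary plays the role of R = {R-i, R-COMM}: whatever the FP overlay does not consume on each PE and on the communication channel is what remains for throughput-oriented work.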
Layered, SOD approach to SoC Design
New layer between R-i and F-i
– Enlarges the boundary between performance group partitions
– Reduces design time: FP mapped to the chip need not be known beforehand
– Optimizes TP
SOD partitioning produces a chip-wide, horizontal view
– Hardware resources in the bottom layer
– Schedulers in the middle layer ( permit the co-operation of …. )
– General software at the top layer
– The last two layers could have multiple internal layers
The layering concept leverages schedulers as a basis for a soft partitioning of a hardware design.
Simulation Foundation — MESH
The Modeling Environment for Software and Hardware (MESH) is a simulator that:
– Provides a layered modeling basis above ISS models
– Uses schedulers to model concurrent, high-level software running on high-level models of processor resources
– Resolves timing through design layers, where unrestricted software executes on hardware models without relying upon an ISS
Modeling Environment for Software and Hardware (MESH)
ThLij — One of j logical threads (software) that will execute on processor i.
ThPi — A model of the ith physical resource in the system, such as a processor.
UPi — A scheduler that selects logical threads intended to execute on resource ThPi.
ULi — A logical scheduler that can schedule M threads to N resources. e.g., a pthread scheduler
How MESH Works
A dynamic number of logical threads is scheduled for execution onto a single resource.
– Scheduling decisions are based on the state of the threads and other system state.
– The scheduler resolves the logical events of the software threads to physical timing.
Schedulers serve two roles:
– Modeling scheduling decisions
– Resolving logical computation to physical time
Complex systems have many resources (ThPi).
Two dimensions of scheduling:
– Based on physical time
– Based on logical state
M threads may be dynamically mapped to N resources.
2.5 times faster than an internal ISS-level simulator
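The two scheduler roles, making a scheduling decision and resolving logical computation to physical time, can be sketched in a few lines. This is not MESH or its API. It is a toy model that assumes each logical thread carries a single work annotation, each resource has a fixed processing rate, and the scheduler uses a greedy earliest-free policy; all names and numbers are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class LogicalThread:  # plays the role of a ThL
    name: str
    work: int  # abstract computation units (assumed annotation)


@dataclass
class PhysicalResource:  # plays the role of a ThP
    name: str
    rate: int  # work units completed per unit of physical time (assumed)


def schedule(threads, resources):
    """Map M logical threads onto N resources and resolve physical time.

    Greedy policy (an assumption, not MESH's): each thread, in list order,
    goes to the resource that becomes free earliest.
    Returns {thread name: (resource name, start time, finish time)}.
    """
    free_at = [(0.0, i) for i in range(len(resources))]
    heapq.heapify(free_at)
    timeline = {}
    for th in threads:
        start, i = heapq.heappop(free_at)    # scheduling decision
        res = resources[i]
        finish = start + th.work / res.rate  # logical work -> physical time
        timeline[th.name] = (res.name, start, finish)
        heapq.heappush(free_at, (finish, i))
    return timeline


resources = [PhysicalResource("fast", rate=4), PhysicalResource("slow", rate=1)]
threads = [LogicalThread("t1", 8), LogicalThread("t2", 2), LogicalThread("t3", 4)]

timeline = schedule(threads, resources)
makespan = max(finish for _, _, finish in timeline.values())
print(timeline, makespan)
```

No instruction-level detail is simulated here: the same work annotation takes a different amount of physical time depending on which heterogeneous resource it lands on, which is why this style of model can run faster than an ISS.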
Conclusion
Challenges for future designs: performance, power, and chip size
Future computer designs should be evaluated as a system.
SOD: a strategy that results from considering applications, schedulers, and hardware as they interact to form a system
– Leveraging each against the other
– Reducing modeling detail
Thanks