
Achieving Structural and Composable Modeling of Complex Systems

David I. August, Sharad Malik, Li-Shiuan Peh
Princeton University
Princeton, NJ
{august,sharad,peh}@princeton.edu

Vijay Pai
Rice University
Houston, TX
[email protected]

Abstract

This paper describes a recently released, structural and composable modeling system called the Liberty Simulation Environment (LSE). LSE automatically constructs simulators from system descriptions that closely resemble the structure of hardware at the chosen level of abstraction. Component-based reuse features allow an extremely diverse range of complex models to be built easily from a core set of component libraries. This paper also describes a set of such libraries currently under development. With LSE and these component libraries, students will be able to learn about systems in a more intuitive fashion, researchers will be able to collaborate with each other more easily, and developers will be able to rapidly and meaningfully explore novel design candidates.

1 Motivation and Direction

There is an increasing need to rapidly and accurately model a diverse set of hardware systems. Ideally, in the creation of hardware systems, researchers and developers would build prototypes of each design candidate for evaluation. Prototype building can yield extremely accurate models, and the process of building the prototype itself can be informative. Of course, prototype building is impractical in almost all situations, but practical modeling methodologies that engineers employ should mimic the positive aspects of prototype building as much as possible.

The most prevalent modeling methodology employed today is hand-writing monolithic simulators in sequential programming languages such as C or C++. While writing a simulator in this way, the simulator writer must map systems, which are inherently structural and concurrent, to a sequential programming language with functional composition. Though much more cost effective than prototype construction, this mapping process is still quite laborious, often consuming many person-years of effort. The manual mapping process is also prone to error, and because the resulting simulator code does not resemble the design or operation of actual systems, errors introduced tend to go unnoticed [9, 6, 11]. Further, unlike prototype construction, little understanding of the system is gained during the mapping process.

The manual mapping problem has broader negative effects. These effects are most pronounced in the following three areas:

• Collaboration. There exist many correct ways to map a concurrent, structural system to a sequential language. Unfortunately, unless a common mapping scheme can be adopted, the resulting simulators cannot interoperate. Collaboration between and among members of academia and industry often stalls because of this tool incompatibility. Collaboration between domains is hardest hit for lack of common multi-domain solutions.

• Novel Research. Radical and disruptive research is often difficult to achieve with the current modeling methodology. Publicly available simulators provide a model only for systems similar to those preconceived by the tool’s authors. High risk ideas requiring a new simulator are often discarded because of the potentially enormous cost of failure.

• Rapid Reuse. Monolithic simulator tools tend to be “one-off” items often rewritten from scratch for each project. Models of the same single component may be written many times to fit each simulation system used. Researchers and developers should not have to continue paying the price of these unnecessary recurring costs.

These negative effects have been identified and much work has been performed to address them. Some have proposed the creation of standard simulator tool sets upon which extensions could be built [20, 14, 5], but the diversity of needs requires that no monolithic simulator standard be adopted. Tools have been created to rapidly produce a simulator, but these tools typically speed the process by making domain-specific assumptions about the system to be modeled [18, 21, 23, 13]. In other cases the tools are too generic and do not provide the necessary mapping constraints to ensure component interoperability [4, 19, 10, 16]. In all these cases, the root cause of these negative effects, the mapping problem, was not directly addressed and one or more of these problems remain [25].

0-7695-2132-0/04/$17.00 (C) 2004 IEEE

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS’04)

The ideal modeling solution is a system that enables a methodology approximating prototype building. The specification of the model should resemble the hardware itself; it should be structural and concurrent, eliminating the need to map the candidate design to sequential code. Like prototyping, the specification process should force designers and researchers to think about the hardware, not to worry about simulator implementation issues. The specification should involve reusable components at various levels of abstraction. To facilitate collaboration, a domain independent component contract should be used to ensure that components and specifications from any domain can interact without prior planning. The system should also be free of any assumptions that would limit the exploration of radical ideas. We intend to deliver such a system to the modeling community.

Our goal is to research, develop, and disseminate a complete simulation environment that supports a structural and composable modeling methodology. This simulation system consists of the Liberty Simulation Environment (LSE) and an initial set of robust component libraries.

LSE automatically constructs simulators from component libraries and from system descriptions that closely resemble the structure of hardware. This structural resemblance to the hardware provides confidence in the model and frees systems researchers to think about systems, not simulator coding concerns. LSE’s strict but general component communication contract enables the creation of highly reusable component libraries and eases the task of rapidly exploring ever more exotic designs. LSE components and descriptions can be hierarchically composed of other components and can exist at any level of abstraction (statistical to gate-level). This choice of abstraction level, combined with partial specification support, allows models to be iteratively refined; descriptions generate fully functional simulators from the very start, allowing users to specify and validate precise models incrementally.

Several component libraries must be provided to serve as a foundation for creating simulators that span multiple architectural levels. LSE components have already been used to model a wide range of microprocessors and interconnection networks. By building on these component libraries, we intend to support a wide range of computational systems including systems-on-a-chip (SoCs), distributed clusters of workstations, tightly-coupled multiprocessors, and high-end supercomputers. While traditional simulators have focused on general-purpose programmable processors, LSE will allow the composition of complex heterogeneous system models that include both programmable components and dedicated application-specific hardware models for tasks such as wireless communication or high-speed network I/O. Such composability becomes essential as systems of interest evolve from commodity PCs to sensor network arrays and distributed low-power embedded systems.

Figure 1. An overview of LSE. (Diagram labels: User; Liberty Simulator Specification (LSS); components for use in LSS: PCL, UPL, CCL, NIL, MPL, user defined; Liberty Simulator Constructor; customized and interconnected component instances; instruction set emulation; simulator executable; full system simulator.)

By providing a design-neutral, unrestricted open-source simulation framework to the community, we intend to improve the quality of best-known techniques. Through structural and composable model specification, LSE will allow researchers to easily collaborate, exchange ideas, understand novel techniques, and evaluate the work of others in a variety of contexts, facilitating independent verification of research. The resemblance of LSE descriptions to real systems will allow it to be an effective educational tool when integrated with an interactive system visualizer. The development of several additional reusable core component libraries will provide a starting point for exploration by other researchers, and the “Liberation” of existing popular simulation systems, through encapsulation into LSE modules or through equivalent configuration, will allow a smooth transition for interested researchers.

2 The Liberty Simulation Environment

A user of the Liberty Simulation Environment (LSE) writes a Liberty Simulator Specification (LSS) to specify the desired system by defining interconnections between customized instances of reusable module templates. LSE reads the LSS, instantiates module templates into module instances, and weaves the specification and module instances together to form an executable simulator. An overview of the Liberty simulator construction process is shown in Figure 1.

LSE makes no assumptions about the target system while ensuring that components interoperate. This guarantees that components developed for one domain can be combined with components developed independently for another. In this way, LSE can serve as a single platform for developing a limitless range of full system simulators. LSE also contains features that allow for iterative refinement of designs, a variety of abstraction levels, and reuse of the components for reduced simulator construction time.

2.1 Reusable Components

LSE was developed with a specification language and component system based on observations about the shortfalls of existing systems. Many simulation systems, in particular those for processor design [20, 5], are built upon sequential programming languages, such as C or C++, and attempt to leverage traditional software componentization and composition techniques to allow for hardware component reuse. Unfortunately, this approach does not work well, since traditional software composition techniques are designed for components that execute sequentially in LIFO (program stack) order, typically waiting for all inputs before beginning computation. As a result, the timing, control, and functionality of components cannot be abstracted in a way that would allow them to be reused across a large variety of systems [25]. This lack of reuse makes designing a system model extremely cumbersome. Furthermore, since many of these systems do not have clean component interfaces with respect to hardware blocks, it becomes difficult to refine a coarse model to a more accurate one by replacing high-level models with more detailed ones.

In contrast, like real hardware, each LSE module instance executes concurrently with other LSE module instances. Modules specify their interface to other modules via ports. Each port represents an input or output channel for the module, and may have multiple connections so that users can easily scale the bandwidth a module instance has to the other blocks. Each module instance is abstracted solely by its communication interface, with no assumptions about sequentiality of the internal computation. Because of this decomposition, the LSE specification resembles the hardware that is being designed. As a result, it is far easier and less error prone to translate ideas about hardware to an LSS.
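The port-based interface described above can be illustrated with a small Python sketch. This is not LSS syntax: the names `Port`, `Module`, and `connect` are hypothetical and chosen only to show how a module instance might be known to others solely through its ports, and how connecting the same pair of ports several times scales the bandwidth between two instances.

```python
# Illustrative sketch only: Port, Module, and connect() are hypothetical
# names, not part of LSE or the LSS language.

class Port:
    """A named input or output channel that may carry several connections."""
    def __init__(self, owner, name):
        self.owner = owner
        self.name = name
        self.connections = []   # one entry per connection instance

    @property
    def width(self):
        return len(self.connections)


class Module:
    """A module instance known to others only through its ports."""
    def __init__(self, name, inputs=(), outputs=()):
        self.name = name
        self.inputs = {p: Port(self, p) for p in inputs}
        self.outputs = {p: Port(self, p) for p in outputs}


def connect(src, out_port, dst, in_port):
    """Add one connection from src.out_port to dst.in_port."""
    link = (src.name, out_port, dst.name, in_port)
    src.outputs[out_port].connections.append(link)
    dst.inputs[in_port].connections.append(link)
    return link


fetch = Module("fetch", outputs=["insts"])
decode = Module("decode", inputs=["insts"])

# Scaling bandwidth: connect the same pair of ports four times.
for _ in range(4):
    connect(fetch, "insts", decode, "insts")

print(decode.inputs["insts"].width)
```

In this sketch nothing about the internals of `fetch` or `decode` is visible to the other side; only the ports and their connection counts are.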

LSE module templates encapsulate functionality with a flexible control interface. Each connection in LSE actually corresponds to a connection of 3 signals. These 3 signals are used to negotiate whether or not data can be transmitted across a connection in a particular time-step. The signals are similar to those used in bus handshaking protocols, and serve a similar purpose, to guarantee that two components can interoperate even if they weren’t explicitly designed to interoperate with each other. Within a set of simple rules, the user can manipulate how the “handshake” signals are used to specify any control behavior they desire, independent of module functionality. To prevent the user from having to specify full control semantics, module templates provide default control semantics. Using the default control semantics, working system models can be constructed by connecting the datapath and specifying minimal control. LSE allows the user to override the default control semantics so that any system behavior can be specified.

In addition to control semantics, details regarding the functionality of hardware blocks often vary from system to system. However, in monolithic simulators, these small variations often require extensive code modifications since functionality, timing, and control are intertwined in the specification. Since these changes become overwhelming, simulators are often written from scratch for each new project. To allow components to be reused despite small changes or extensions to functionality, LSE also has powerful component customization capabilities. Components have algorithmic parameters, that is, parameters whose values describe functionality. Via these parameters, users can inherit the overall functionality of a module template, but adapt the specific behavior to the system being modeled.
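As a rough illustration of an algorithmic parameter, the Python sketch below passes a selection function into a generic buffer. `BufferTemplate`, `select`, `fifo`, and `oldest_ready` are hypothetical names invented for this example; the same template is customized once as a FIFO packet buffer and once as a simple instruction window.

```python
# Illustrative sketch only: BufferTemplate and its `select` parameter are
# hypothetical, chosen to mimic the idea of an algorithmic parameter.

class BufferTemplate:
    def __init__(self, size, select):
        self.size = size
        self.select = select      # algorithmic parameter: which entry leaves
        self.entries = []

    def insert(self, entry):
        if len(self.entries) >= self.size:
            return False          # full: caller must retry later
        self.entries.append(entry)
        return True

    def remove(self):
        index = self.select(self.entries)
        return None if index is None else self.entries.pop(index)


def fifo(entries):
    """Always release the oldest entry."""
    return 0 if entries else None


def oldest_ready(entries):
    """Release the oldest entry whose operands are ready."""
    for i, e in enumerate(entries):
        if e["ready"]:
            return i
    return None


router_buffer = BufferTemplate(size=4, select=fifo)
for packet in ["p0", "p1"]:
    router_buffer.insert(packet)

window = BufferTemplate(size=4, select=oldest_ready)
window.insert({"op": "load", "ready": False})
window.insert({"op": "add", "ready": True})

first_packet = router_buffer.remove()
issued = window.remove()
```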

To further improve reuse, LSE allows users to build new module templates based on the interconnection and customization of instances of existing module templates. To make the resulting hierarchical module template flexible, the LSS language has powerful syntax by which users can specify the hierarchical module template’s parameters and port connections relative to the sub-module-instances’ interconnections and customizations.
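The idea of a hierarchical template can be sketched in Python as a function whose parameters configure its sub-instances and whose external ports alias ports of those sub-instances. The dictionary layout and the names `queue_template`, `arbiter_template`, and `router_template` are hypothetical and do not reflect actual LSS syntax.

```python
# Illustrative sketch only: the dictionary layout and function names are
# hypothetical and do not reflect actual LSS syntax.

def queue_template(name, depth):
    return {"name": name, "kind": "queue", "depth": depth,
            "ports": ["in", "out"]}


def arbiter_template(name, inputs):
    return {"name": name, "kind": "arbiter",
            "ports": [f"in{i}" for i in range(inputs)] + ["out"]}


def router_template(name, num_inputs, buffer_depth):
    """Hierarchical template: its parameters configure its sub-instances."""
    queues = [queue_template(f"{name}.q{i}", buffer_depth)
              for i in range(num_inputs)]
    arb = arbiter_template(f"{name}.arb", num_inputs)
    # Internal wiring: each queue output feeds one arbiter input.
    wires = [(q["name"], "out", arb["name"], f"in{i}")
             for i, q in enumerate(queues)]
    # External ports are aliases for ports of sub-instances.
    ports = {f"in{i}": (q["name"], "in") for i, q in enumerate(queues)}
    ports["out"] = (arb["name"], "out")
    return {"name": name, "kind": "router",
            "children": queues + [arb], "wires": wires, "ports": ports}


router = router_template("r0", num_inputs=3, buffer_depth=8)
```

Changing `num_inputs` or `buffer_depth` reshapes the internal structure without the user of `router_template` touching the sub-instances directly.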

By providing hierarchical structural composition, customization of components, and a communication contract with default control semantics, LSE allows construction of module templates that can be reused in many contexts. For example, a single module template can be instantiated to model a processor’s instruction window, its reorder buffer, and the I/O buffers in a packet router [25, 26].

2.2 Levels of Abstraction and Iterative Refinement

One of LSE’s great strengths is the ability of LSE module instances to interoperate. Even module instances with different levels of abstraction can interoperate. As a result, it is possible to mix components with different levels of detail in the same LSS. For example, a model of an interconnect network may have connected to it a statistical packet generator used to simulate network traffic. The statistical packet generator can later be replaced with a network interface controller for a microprocessor simply by swapping out the packet generator instance. In this way, the same interconnect model can be used with an abstract statistical model, as well as a detailed microprocessor model.
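The substitution described above can be mimicked in a few lines of Python. All class names here are hypothetical; the sketch only shows that when two traffic sources expose the same interface (here a `next_packet` method, an assumption of this example), the interconnect model does not change when one replaces the other.

```python
import random

# Illustrative sketch only: all class names are hypothetical.

class StatisticalGenerator:
    """Abstract traffic source: injects a packet with fixed probability."""
    def __init__(self, rate, seed=0):
        self.rate = rate
        self.rng = random.Random(seed)

    def next_packet(self, cycle):
        return {"cycle": cycle} if self.rng.random() < self.rate else None


class NetworkInterface:
    """More detailed source: forwards packets queued by a processor model."""
    def __init__(self, pending):
        self.pending = list(pending)

    def next_packet(self, cycle):
        return self.pending.pop(0) if self.pending else None


class Interconnect:
    """Unchanged network model; it only sees the source's interface."""
    def __init__(self, source):
        self.source = source
        self.delivered = []

    def run(self, cycles):
        for cycle in range(cycles):
            packet = self.source.next_packet(cycle)
            if packet is not None:
                self.delivered.append(packet)
        return len(self.delivered)


abstract = Interconnect(StatisticalGenerator(rate=1.0)).run(10)
detailed = Interconnect(NetworkInterface([{"id": 1}, {"id": 2}])).run(10)
```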

Full system abstraction is also possible with LSE. Each module template can provide default semantics when some of its ports are left unconnected. This means that users can specify a partial system and rely on the default behavior to fill in omitted details, thus forming an abstract model of the entire system. We use this feature extensively while building processor microarchitectural models. The typical design process starts by first specifying simple fetch and issue logic. Then, once satisfied with this behavior, we add a pipeline specification, speculation control logic, predictors, and memory hierarchies in turn. At each stage in this refinement process, the specification is compilable into a working simulator.
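A small Python sketch of this refinement style follows. The `FetchStage` class, its port names (`stall`, `predict_next`), and the default behaviors are hypothetical; they stand in for a module template that supplies default semantics for ports the user has not yet connected.

```python
# Illustrative sketch only: the port names and defaults are hypothetical.

class FetchStage:
    # Behavior assumed for each input port when nothing is connected to it.
    DEFAULTS = {"stall": lambda pc: False,
                "predict_next": lambda pc: pc + 4}

    def __init__(self, **connected):
        self.ports = dict(self.DEFAULTS)
        self.ports.update(connected)   # user connections override defaults
        self.pc = 0

    def step(self):
        fetched = self.pc
        if not self.ports["stall"](fetched):
            self.pc = self.ports["predict_next"](fetched)
        return fetched


# First iteration: nothing connected, yet the model already runs.
simple = FetchStage()
trace_simple = [simple.step() for _ in range(3)]

# Later refinement: plug in a toy predictor that redirects at address 4.
refined = FetchStage(predict_next=lambda pc: 64 if pc == 4 else pc + 4)
trace_refined = [refined.step() for _ in range(3)]
```

Both versions are runnable models; the second simply replaces one default with a more detailed behavior.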

2.3 Relation to Other System Specification Approaches

Other systems do not provide a domain independent modeling system with a component contract that allows for reusable component libraries. Hardware description languages, while capable of full system description, do not provide enough support for component customization to build reusable components. Thus, a user has no ability to affect the internal timing of pre-built components. Furthermore, HDLs still require the user to specify all control explicitly, making specification of new systems quite involved. Since the control interface is ad-hoc, pre-existing components may not be able to interact in the desired way if the correct control signals were not exposed.

SystemC [19] is superior to HDLs since it has better support for inheritance and a more advanced type system allowing better abstraction. However, SystemC does not provide any mechanism for simplifying the specification of control, or providing default control semantics. As a result, specifying a model in SystemC can still be difficult. In addition, since the SystemC libraries do not provide a component communication contract, the control interface is, once again, ad-hoc. Thus, the same mapping problem exists for SystemC, making standardization on a common component library unlikely. Note, however, that LSE could bring its benefits to SystemC to solve these problems by wrapping it. This is an option being explored.

While not exclusively a hardware modeling tool, the Ptolemy framework does allow users to model hardware by composing concurrently executing components [4]. However, unlike LSE, SystemC, and HDLs, Ptolemy allows each model to specify the model of computation (MoC) that governs the semantics of communication and execution. This flexibility, however, comes at a price. When using different models of computation, work must be done to ensure that the models can be composed. Sometimes this process is easy and automatic; at other times it may require solving difficult problems [17]. Techniques allow some MoCs to interact [15, 12], but they do not cover all MoCs, leaving the possibility of incompatible components. Furthermore, for hardware modeling, this flexibility is unnecessary since most hardware can easily be specified using a single model of computation with little loss of clarity or specification ease. MoC flexibility also comes at a simulation performance price too steep for applications of interest.

LSE fixes its MoC to a reactive model of computation. This has several advantages. First, power users only have to learn one set of computation and communication semantics, easing the learning curve (though most users need not be concerned with these details). Second, since all components use the same model of computation, the MoC does not preclude reuse of components. It should be noted, however, that the standard MoC must be carefully chosen to avoid interfering with reusability [25]. Third, by carefully selecting the model of computation it is possible to analyze the LSS for optimization [22].

Other domain specific approaches for simulator construction have been proposed. These approaches [18, 21, 23, 13] gain most of their benefit from domain specific assumptions and thus are not suitable for general system-level simulation. Other approaches have been proposed [10, 16] that could extend to full system simulation. However, these approaches lack the necessary features to allow construction of interoperable components and highly reusable component libraries.

Perhaps the most important shortfall of many of these systems is availability. Many of these systems are proprietary or not publicly available. Through the support of the National Science Foundation’s Next Generation Software program, we have been able to release LSE Version 1.0 without restriction, ensuring that it will remain an open, standardized collaborative framework.

3 Component Libraries

The core of LSE is its libraries of components consisting of LSE modules. These pre-defined modules enable rapid modeling of complex systems through seamless composition. We classify the various components into the following libraries based on a functional partition:

Primitive Component Library (PCL): This consists of primitive building blocks that are likely to be used across a wide range of applications. Examples include arbiters and memory arrays.

Uni-processor Library (UPL): This consists of the micro-architectural elements of general purpose and application specific processors. Examples include instruction decoders and branch prediction units.

Communication Component Library (CCL): This consists of building blocks of communication fabrics. Examples include buses and routers.

Network Interface Library (NIL): This consists of components that serve as interfaces across network boundaries and in between networks and processors. An example is a format converter that sits between an Ethernet and a PCI bus.

Multi-processor Library (MPL): This consists of components used in multi-processor architectures. Examples include cache-coherence engines for implementing shared memory systems and DMA controllers for implementing message passing.

The above classification is a functional one. It is driven by the need for organization of what would otherwise be a single vast and difficult to manage library. This classification helps in both the library development stage as well as the library deployment stage. During library development, it is the natural partition of tasks among specialists in the various functional domains. During deployment, it serves as a catalog to help search for the appropriate match in the building of complex systems.

It should be noted that this classification does not create any usage boundaries. Components in one library can freely be used in other libraries. The primitives in PCL are likely to be used in all the other libraries. Similarly, MPL is likely to build on all the other libraries in constructing top-level models of complex multi-processor systems.

We now illustrate the use of these component libraries in assembling models for a diverse range of systems. Figure 2 sketches several systems that our proposed simulation framework will support. Each system can be composed in a plug-and-play fashion using the LSE modules defined in our suite of component libraries. By defining all these modules with LSE, modules can inter-operate seamlessly across component libraries. Thus, simulation of new complex systems can leverage the modules in these component libraries for substantial productivity gains. For instance, a chip multi-processor (see Figure 2(a)) will consist of general-purpose processor (GP) modules from UPL, interface modules (NI) from NIL, and network fabric modules provided by CCL, glued with multiprocessor modules from MPL.

Many of these libraries will share modules with similar semantics, with components carefully defined for reuse. For instance, in a sensor network node (see Figure 2(b)), which is composed of a general-purpose processor (GP) and a digital signal processor (DSP) from UPL, linked with a bus from CCL, and interfacing to a wireless radio component from CCL through a radio interface from NIL, the GP and DSP will share many common modules within UPL. The various libraries will also share many modules of PCL, such as the memory array primitive component which can double as bus queuing buffers for CCL as well as caches in UPL.

Our goal of reusing the components led to careful generalization of modules, so the same module can be parameterized and plugged into substantially different systems. Figure 2(c) demonstrates how similar modules used to simulate a chip multiprocessor can now be extended to simulate systems of a totally different scale: a petaflops multi-processor grid-in-a-box, with many GP modules from UPL, sophisticated network interface controllers from NIL, interconnected with high-speed electrical or optical fabrics from CCL, and glued with MPL modules such as cache coherence controllers.

The hierarchical and iterative refinement features of LSE are especially critical when we consider complex systems-of-systems such as that in Figure 2(d). Here, we envision small sensor nodes peppered around an area, collecting and communicating data wirelessly back to coarser-grain nodes with chip multiprocessors that analyze and coordinate groups of sensors. Finally, analyzed data is aggregated back to a base camp where there are petaflops grids-in-a-box that perform computationally intensive tasks for coordinating and controlling the nodes in the field. With LSE, we can compose such a complex system hierarchically from the subsystems built with components of the various libraries. It also allows users to work at different levels of abstraction, so a network architect can iteratively define the wireless network component of CCL and perform detailed studies while keeping the rest of the system at a high level of abstraction.

We will next detail the various component libraries and the challenges faced in each, and discuss the current status and future work.

3.1 Primitive Component Library (PCL)

As we built early versions of UPL and CCL, we found clear building blocks that are common across many libraries, such as queues and arbiters. These primitives can be readily leveraged while building the functional component libraries, saving development time, maximizing reuse, and easing debugging. An arbiter is an example of a primitive that is readily used across various component libraries. For instance, the same arbiter module can be used in CCL to control access to network buffers and links, and in UPL to regulate access to synchronization locks. The PCL has been released with the support of the National Science Foundation along with LSE Version 1.0.
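To make the arbiter example concrete, the Python sketch below instantiates one generic arbiter twice, once for a shared network link and once for a lock. The round-robin policy and the class name `RoundRobinArbiter` are assumptions of this sketch; it is a hypothetical stand-in, not the PCL arbiter module.

```python
# Illustrative sketch only: this round-robin arbiter is a hypothetical
# stand-in, not the PCL arbiter module.

class RoundRobinArbiter:
    def __init__(self, num_requesters):
        self.n = num_requesters
        self.last = self.n - 1    # so requester 0 has priority first

    def grant(self, requests):
        """Return the index of the granted requester, or None."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None


# Use 1: four router input buffers competing for one output link.
link_arbiter = RoundRobinArbiter(num_requesters=4)
link_grants = [link_arbiter.grant([True, False, True, True]) for _ in range(4)]

# Use 2: two processors competing for a synchronization lock.
lock_arbiter = RoundRobinArbiter(num_requesters=2)
lock_grants = [lock_arbiter.grant([True, True]) for _ in range(3)]
```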

3.2 Uniprocessor Library (UPL)

The Uniprocessor Library (UPL) contains all the building blocks for standard microprocessor models. UPL includes basic buffering and queuing structures that can be customized to model the main processor pipeline including functional units, re-order buffers, instruction windows, and the corresponding interconnections. It also includes many modules that, when composed hierarchically, can provide complex components such as realistic cache configurations.


[Figure 2 panels: (a) Chip multi-processor (GP/NI pairs attached to an on-chip network); (b) Sensor network node (sensor nodes, each with a GP and a DSP on a bus with an NI, linked by a wireless network); (c) Grids-in-a-box (GP/NI pairs attached to a board-to-board interconnect); (d) A complex system of systems (sensor nodes, chip multi-processors, and petaflops grids linked by wireless networks).]

Figure 2. A diverse range of systems that LSE aims to support through a suite of component libraries.

We have built an extensive range of components in UPL, and have successfully modeled IA-64 and Alpha processors. These models are being prepared for release along with the UPL.

3.3 Communication Component Library (CCL)

Systems are becoming increasingly interconnected, and a simulation infrastructure that supports diverse communication fabrics is critically needed. Orion [26], a CCL, was proposed to address this need, targeting the communication components of a wide array of systems, ranging from on-chip networks in chip multi-processors, to electrical and optical chip-to-chip and board-to-board fabrics in petaflops grids-in-a-box, to wireless fabrics in sensor networks.

The challenges of Orion lie mainly in three areas: modeling of traffic workloads, development of component building blocks that are generalizable to different domains, and component attribute models that cover key design parameters in diverse applications.

An early version of Orion was developed, focusing on wired interconnection networks, supporting fabrics ranging from on-chip buses and networks for SoCs to chip-to-chip electrical backplanes for petaflop grids [26]. Now, in addition to dynamic power, Orion characterizes leakage power [7] as well as the thermal impact of networks. Besides being used to model networks in multiprocessor systems, both on-chip and chip-to-chip, the basic components of Orion have been found to be applicable to interconnected distributed caches [3], as well as heterogeneous multi-core SoCs [27], demonstrating its generality. We have also proposed and investigated various abstractions of different traffic patterns in mobile sensor networks. We plan to further extend Orion to a wider variety of communication fabrics, developing new attribute models for the design metrics that are significant in diverse application domains.
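As a rough illustration of what a component attribute model for power might compute, the Python sketch below uses the textbook CMOS relations: dynamic power as activity factor times switched capacitance times supply voltage squared times clock frequency, and leakage power as supply voltage times leakage current. These are standard first-order formulas, not Orion's actual models; the class `ComponentPowerModel` and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentPowerModel:
    """Hypothetical first-order attribute model (not Orion's actual model)."""

    switched_capacitance_f: float  # effective switched capacitance, in farads
    activity_factor: float         # average fraction of capacitance switched per cycle
    leakage_current_a: float       # total leakage current, in amperes

    def dynamic_power_w(self, vdd_v: float, clock_hz: float) -> float:
        # P_dyn = alpha * C * Vdd^2 * f
        return (self.activity_factor * self.switched_capacitance_f
                * vdd_v ** 2 * clock_hz)

    def leakage_power_w(self, vdd_v: float) -> float:
        # P_leak = Vdd * I_leak
        return vdd_v * self.leakage_current_a

    def total_power_w(self, vdd_v: float, clock_hz: float) -> float:
        return self.dynamic_power_w(vdd_v, clock_hz) + self.leakage_power_w(vdd_v)


# Placeholder values chosen only to exercise the formulas.
buffer_model = ComponentPowerModel(
    switched_capacitance_f=1e-12,
    activity_factor=0.5,
    leakage_current_a=1e-4,
)
buffer_total_w = buffer_model.total_power_w(vdd_v=1.0, clock_hz=1e9)
```

Keeping the attribute model separate from the component's behavior is what would let the same buffer or link module be re-characterized for a different technology or design metric.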

3.4 Multiprocessor Component Library (MPL)

Multiprocessor architectures form one of the most difficult and important simulation domains for high-performance systems engineering. Modules from PCL and CCL form the foundation of a multiprocessor system simulator. The additional complexities in multiprocessor system simulation stem from managing data replication, ordering, and communication. The MPL includes the modular components required for implementing a structural specification of a multiprocessor. These modules include DMA controllers (for simulating low-overhead message-passing systems), pluggable cache coherence controllers (including bus-based snooping for small scale multiprocessors and point-to-point coherence transactions for scalable systems), and pluggable memory ordering controllers to restrict the reordering allowed by the processor according to desired constraints. Many of these components are sufficiently flexible that they can be deployed in a variety of contexts and systems, ranging from chip multiprocessors to tightly-coupled systems to message-passing grids.

A hierarchical and component-based multiprocessor simulator based on LSE serves as an ideal platform for research into multiprocessor simulation methodologies. For example, investigation of speed-enhancing techniques is essential because of the need to support large-scale applications. Here, the reuse enabled by LSE eases the development of sampling versions of hierarchical modules, in addition to allowing the incorporation and study of acceleration techniques developed for the PCL. The support for multiple levels of abstraction in LSE also allows for simulation acceleration by integrating a detailed simulator of some portions with analytical representations of other system components. Such abstraction may increase the applicability of workload-driven analytical models proposed for multiprocessor performance evaluation [24].

We are currently porting the RSIM simulator released by Pai et al. to LSE as a base platform for developing component-based multiprocessor simulators. RSIM's modular open-source design has enabled external users to add substantial functionality; our approach here is instead to break the simulator into Liberty components along the lines of its current modules and to formalize the interactions between modules according to the Liberty model of computation. Our prior experience with porting SimpleScalar to Liberty should help guide our development efforts in these regards.
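The idea of mixing a detailed model of one portion with an analytical representation of another can be sketched as two interchangeable implementations of the same interface. The Python below is a minimal illustration under stated assumptions, not LSE code: `DetailedLRUCache`, `AnalyticalCache`, the shared `access` method, and all parameters are hypothetical, and the analytical model is simply the expected latency, hit time plus miss rate times miss penalty.

```python
from collections import OrderedDict
from typing import Iterable


class DetailedLRUCache:
    """Hypothetical detailed model: tracks resident blocks with LRU replacement."""

    def __init__(self, capacity_blocks: int, block_bytes: int,
                 hit_cycles: int, miss_cycles: int) -> None:
        self.capacity_blocks = capacity_blocks
        self.block_bytes = block_bytes
        self.hit_cycles = hit_cycles
        self.miss_cycles = miss_cycles
        self._resident = OrderedDict()  # block number -> True, oldest first

    def access(self, address: int) -> float:
        block = address // self.block_bytes
        if block in self._resident:
            self._resident.move_to_end(block)   # mark as most recently used
            return float(self.hit_cycles)
        self._resident[block] = True
        if len(self._resident) > self.capacity_blocks:
            self._resident.popitem(last=False)  # evict the least recently used
        return float(self.hit_cycles + self.miss_cycles)


class AnalyticalCache:
    """Hypothetical abstract model: every access costs the expected latency."""

    def __init__(self, miss_rate: float, hit_cycles: int, miss_cycles: int) -> None:
        self.expected_cycles = hit_cycles + miss_rate * miss_cycles

    def access(self, address: int) -> float:
        return self.expected_cycles


def total_memory_cycles(addresses: Iterable[int], memory) -> float:
    """The rest of the system sees only access(); either model can be plugged in."""
    return sum(memory.access(address) for address in addresses)


trace = [0, 64, 0, 128, 64, 0]
detailed_cycles = total_memory_cycles(
    trace, DetailedLRUCache(capacity_blocks=2, block_bytes=64,
                            hit_cycles=1, miss_cycles=20))
analytical_cycles = total_memory_cycles(
    trace, AnalyticalCache(miss_rate=0.5, hit_cycles=1, miss_cycles=20))
```

Because both classes expose the same interface, the choice of abstraction level for this component is a configuration decision rather than a rewrite of the surrounding simulator.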

3.5 Network Interface Component Library (NIL)

Network interfaces bridge processors and fabrics, as well as multiple networks, and are realized both as ASICs and, more recently, as programmable network interfaces. These devices translate between the formats understood on the external network and the local interconnect; the most common realization is a network interface card (NIC) that translates between Ethernet and PCI formats, implementing the arbitration policies of the PCI bus and the medium access policies of Ethernet. Network interfaces have stringent space and power requirements, combined with little data reuse. Consequently, the primary techniques used in general purpose processors to improve performance (large caches and faster clocks) are inapplicable. In contrast, multiprocessor microarchitectures are particularly appealing for these programmable network interfaces, since parallelism can allow for performance without increasing clock speed.

Additionally, these devices have a heterogeneous set of components, including DMA and MAC assist logic. The process of simulating these devices is further complicated by the asynchronous nature of the I/O interactions with both the network and the host. Some previous studies have attempted performance simulation for these devices using conventional processor simulators [8]. However, no work to date has provided a realistic simulation that accounts for the special hardware features or software tasks supported by these systems.

We are currently developing a network interface simulator, with an initial target of properly modeling the MIPS-based Tigon-2 programmable network interface chipset at a level of detail sufficient to simulate the firmware that supports its deployment as a Gigabit Ethernet interface [1, 2]. This simulator development consists of two parallel tracks. One track focuses on bringing up a uniprocessor sufficient to run the desired firmware, adding support for the various hardware assists and memory-mapped registers needed and collecting the I/O traces of host and network traffic that will later drive the simulation. The second track leverages the MPL to implement a scalable parallel programmable network interface and also works toward more aggressively parallelizing the target code. Such simulation efforts will both allow the architectural exploration needed to reach next-generation Ethernet speeds and facilitate the development of realistic models of other I/O devices. Clearly, development of the programmable network interface in NIL will leverage modules of UPL and MPL.
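Driving a simulation from recorded host and network traces amounts to merging two timestamped event streams and feeding them to the model in time order. The Python sketch below illustrates only that mechanism; it is not the simulator described here, and the `TraceEvent` record, the `NicModel` class, and the trace contents are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True, order=True)
class TraceEvent:
    """Hypothetical trace record; ordering compares time_ns first."""

    time_ns: int
    source: str       # "host" or "network"
    size_bytes: int


class NicModel:
    """Hypothetical stand-in for the simulated network interface."""

    def __init__(self) -> None:
        self.bytes_to_network = 0
        self.bytes_to_host = 0
        self.log: List[Tuple[int, str]] = []

    def handle(self, event: TraceEvent) -> None:
        if event.source == "host":
            self.bytes_to_network += event.size_bytes  # host send goes out on the wire
        elif event.source == "network":
            self.bytes_to_host += event.size_bytes     # received frame goes to the host
        else:
            raise ValueError(f"unknown trace source: {event.source}")
        self.log.append((event.time_ns, event.source))


def replay(host_trace: Iterable[TraceEvent],
           network_trace: Iterable[TraceEvent],
           nic: NicModel) -> None:
    """Replay two traces, each already sorted by time, in global time order."""
    for event in heapq.merge(host_trace, network_trace):
        nic.handle(event)


host_trace = [TraceEvent(10, "host", 1500), TraceEvent(40, "host", 64)]
network_trace = [TraceEvent(5, "network", 64), TraceEvent(20, "network", 1500)]
nic = NicModel()
replay(host_trace, network_trace, nic)
```

Replaying recorded traces keeps the asynchronous arrival pattern of real host and network traffic without requiring a full host or network model in the loop.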

4 Conclusions

Unlike traditional simulators, the Liberty Simulation Environment (LSE) automatically constructs simulators from system descriptions that closely resemble the structure of hardware. The well-defined component communication interfaces of LSE allow for the reuse and hierarchical configuration of components across systems, easing the exploration of a complex and diverse set of systems. The LSE system has been released, and component libraries that help users get the most out of LSE are to be released. These libraries target domains including microprocessors, multiprocessors, networks, and programmable network interfaces, using both domain-independent and domain-specific modules.

By enabling varying levels of abstraction and a unified component connection framework, LSE provides ideal support for both education and technology transfer. Students using LSE will be able to focus on system design concepts and structural composition rather than the syntactic and functional composition of more common simulation schemes. Researchers may more easily release their modules built with LSE both for collaboration in academia and for technology transfer to industry, since the well-defined interfaces of LSE enable these modules to be composed into other LSE configurations with ease.


Acknowledgments

We thank Kees Vissers, Timothy Kam, and Frederica Darema for their comments at various points throughout the development of LSE. This work has been supported by the National Science Foundation (NGS-0305617). Opinions, findings, conclusions, and recommendations expressed throughout this work are not necessarily the views of the National Science Foundation.

References

[1] Alteon Networks. Tigon/PCI Ethernet Controller, Aug. 1997. Revision 1.04.

[2] Alteon WebSystems. Gigabit Ethernet/PCI Network Interface Card: Host/NIC Software Interface Definition, July 1999. Revision 12.4.13.

[3] B. M. Beckmann and D. Wood. Transmission line caches. In Proceedings of the International Symposium on Microarchitecture, pages 43–55, December 2003.

[4] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt. Ptolemy: A framework for simulating and prototyping heterogeneous systems. International Journal in Computer Simulation, 4:155–182, 1994.

[5] D. Burger and T. M. Austin. The SimpleScalar tool set version 2.0. Technical Report 97-1342, Department of Computer Science, University of Wisconsin-Madison, June 1997.

[6] H. W. Cain, K. M. Lepak, B. A. Schwartz, and M. H. Lipasti. Precise and accurate processor simulation. In Proceedings of the Fifth Workshop on Computer Architecture Evaluation using Commercial Workloads, February 2002.

[7] X. Chen and L.-S. Peh. Leakage power modeling and optimization of interconnection networks. In Proceedings of the International Symposium on Low Power and Energy Design, pages 90–95, August 2003.

[8] P. Crowley, M. Fiuczynski, J.-L. Baer, and B. Bershad. Characterizing Processor Architectures for Programmable Network Interfaces. In Proceedings of the 14th International Conference on Supercomputing, pages 54–65, May 2000.

[9] R. Desikan, D. Burger, and S. W. Keckler. Measuring experimental error in microprocessor simulation. Proceedings of the 28th International Symposium on Computer Architecture, July 2001.

[10] J. Emer, P. Ahuja, E. Borch, A. Klauser, C.-K. Luk, S. Manne, S. S. Mukherjee, H. Patil, S. Wallace, N. Binkert, R. Espasa, and T. Juan. Asim: A performance model framework. IEEE Computer, 0018-9162:68–76, February 2002.

[11] J. Gibson, R. Kunz, D. Ofelt, M. Horowitz, J. Hennessy, and M. Heinrich. FLASH vs. (simulated) FLASH: Closing the simulation loop. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 49–58, November 2000.

[12] A. Girault, B. Lee, and E. A. Lee. Hierarchical finite state machines with multiple concurrency models. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 18(6):742–760, June 1999.

[13] A. Halambi, P. Grun, V. Ganesh, A. Khare, N. Dutt, and A. Nicolau. EXPRESSION: A language for architecture exploration through compiler/simulator retargetability. In Proceedings of the European Conference on Design, Automation and Test (DATE), March 1999.

[14] C. J. Hughes, V. S. Pai, P. Ranganathan, and S. V. Adve. Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors. IEEE Computer, 35(2):40–49, Feb. 2002.

[15] E. A. Lee and A. Sangiovanni-Vincentelli. Comparing models of computation. In Proceedings of ICCAD, November 1996.

[16] P. Mishra, N. Dutt, and A. Nicolau. Functional abstraction driven design space exploration of heterogeneous programmable architectures. In Proceedings of the International Symposium on System Synthesis (ISSS), pages 256–261, October 2001.

[17] P. Murthy. Modeling and design of reactive systems, 1997. Presentation. URL: http://ptolemy.eecs.berkeley.edu/presentations/97/rasspfinal.pdf.

[18] S. Onder and R. Gupta. Automatic generation of microarchitecture simulators. In Proceedings of the IEEE International Conference on Computer Languages, pages 80–89, May 1998.

[19] Open SystemC Initiative (OSCI). Functional Specification for SystemC 2.0, 2001. http://www.systemc.org.

[20] V. S. Pai, P. Ranganathan, and S. V. Adve. RSIM Reference Manual, Version 1.0. Electrical and Computer Engineering Department, Rice University, Aug. 1997. Technical Report 9705.

[21] S. Pees, A. Hoffmann, V. Zivojnovic, and H. Meyr. LISA – machine description language for cycle-accurate models of programmable DSP architectures. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), pages 933–938, 1999.

[22] D. Penry and D. I. August. Optimizations for a simulator construction system supporting reusable components. In Proceedings of the 40th Design Automation Conference, June 2003.

[23] C. Siska. A processor description language supporting retargetable multi-pipeline dsp program development tools. In Proceedings of the 11th International Symposium on System Synthesis (ISSS), Dec. 1998.

[24] D. J. Sorin, V. S. Pai, S. V. Adve, M. K. Vernon, and D. A. Wood. Analytic Evaluation of Shared-Memory Systems with ILP processors. In Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 380–391, June 1998.

[25] M. Vachharajani, N. Vachharajani, D. A. Penry, J. A. Blome, and D. I. August. Microarchitectural exploration with Liberty. In Proceedings of the 35th International Symposium on Microarchitecture, pages 271–282, November 2002.

[26] H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik. Orion: A Power-Performance Simulator of Interconnection Networks. In Proceedings of the 35th International Symposium on Microarchitecture, November 2002.

[27] T. T. Ye, L. Benini, and G. D. Micheli. Packetized on-chip interconnect communication analysis for mpsoc. In Proceedings of Design Automation and Test in Europe, pages 344–349, March 2003.
