
The MicroGrid: A scientific tool for modeling Computational Grids

H.J. Song, X. Liu, D. Jakobsen, R. Bhagwan, X. Zhang∗, K. Taura∗∗ and A. Chien
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
E-mail: [email protected]

The complexity and dynamic nature of the Internet (and the emerging Computational Grid) demand that middleware and applications adapt to the changes in configuration and availability of resources. However, to the best of our knowledge there are no simulation tools which support systematic exploration of dynamic Grid software (or Grid resource) behavior.

We describe our vision and initial efforts to build tools to meet these needs. Our MicroGrid simulation tools enable Globus applications to be run in arbitrary virtual grid resource environments, enabling broad experimentation. We describe the design of these tools, and their validation on micro-benchmarks, the NAS parallel benchmarks, and an entire Grid application. These validation experiments show that the MicroGrid can match actual experiments within a few percent (2% to 4%).

1. Introduction

The explosive growth of the Internet and its use in computing, communication, and commerce have made it an integral and critical infrastructure of our society. The Internet's increasing capability has created excitement about a vision for the next-generation Internet which enables seamless integration of computing, storage, and communication into the Computational Grid. While demonstrations on large scale testbeds using software infrastructures such as Globus [6] and Legion [7] highlight the potential of computational grids to enable large-scale resource pooling and sharing (compute, communicate, storage, information) in heterogeneous environments, significant challenges in the design of middleware software, application software, and even Grid hardware configuration remain. For instance, Internet/Grid environments exhibit extreme heterogeneity of configuration, performance, and reliability. Consequently, software must be flexible and adaptive to achieve either robustness or even occasionally good performance. To put it directly, we have no systematic way to study the dynamics of such software to evaluate its effectiveness, robustness, or impact on Grid system stability. We believe that one critical challenge facing the computer systems community is to understand the decidedly non-linear dynamics of Internet/Grid scale systems. The current practice is to perform actual test-bed experiments (at moderate scale) or simplistic simulations that are not validated.

∗ Corresponding author. E-mail: [email protected].
∗∗ Also affiliated to University of Tokyo, Japan.

At UCSD, we are pursuing a research agenda which explores how to simulate and model large-scale Grid structures¹: applications, services, and resource infrastructure². Our objective is to develop a set of simulation tools called the MicroGrid that enable systematic design and evaluation of middleware, applications, and network services for the Computational Grid. These tools will provide an environment for scientific and repeatable experimentation. It is our hope that the MicroGrid will catalyze the development of experimental methodologies for robustly extrapolating grid simulation results and for rational grid design and management. To achieve these goals, at a minimum, Grid simulation tools must support realistic grid software environments, modeling of a wide variety of resources, and scalable performance.

We have designed an initial set of MicroGrid tools which allow researchers to run Grid applications on virtual Grid resources, allowing the study of complex dynamic behavior. In this paper, we describe our initial design, implementation, and validation of the MicroGrid tools. First, these tools enable the use of Globus applications without change by virtualizing the execution environment, providing the illusion of a virtual Grid. Thus, experimentation with a wide variety of existing Grid applications is feasible. Second, these tools manage heterogeneous models in the virtual grid, using a global virtual time model to preserve simulation accuracy. Third, the MicroGrid provides basic resource simulation models for computing, memory, and networking. We describe the implementation of these elements, and validate them at the following three levels.

¹ The ultimate simulation goals are networks of 10 to 100 million entities.

² The development of these tools is part of the NSF funded Grid Application Development Software (GrADS) project, led by Ken Kennedy at Rice University (http://hipersoft.cs.rice.edu/grads/).

Scientific Programming 8 (2000) 127–141. ISSN 1058-9244 / $8.00 2000, IEEE. Reprinted with permission from Proceedings of IEEE Supercomputing 2000, 4–10 November 2000, Dallas, Texas, USA.

– Micro-benchmarks: Individual resource model tests which show these models to be accurate.

– Parallel benchmarks: Use of the NAS Parallel Benchmark suite which shows the combined accuracy of the MicroGrid for modeling system and application performance.

– An Application: A complex application program (CACTUS PDE solver) which further validates the utility and fidelity of the MicroGrid models.

Experiments with MicroGrids on the NPB Class A data sets ultimately match within 2 to 4% while maintaining high execution efficiency. In addition, we perform an internal validation showing that the execution at each time step within a MicroGrid simulation closely follows that in an actual execution, differing by only 3 to 8%. This internal validation is achieved using the Autopilot [17] tools, and running them within the MicroGrid environment. Finally, experiments with an entire application (CACTUS) match closely as well, within 5 to 7%.

These validations form an important first step in building a set of tools which can support large scale experimentation with software for the Computational Grid. However, while we are pleased with our initial progress, significant challenges remain. To achieve the larger vision of Grid simulation, significant advances in scalability, precision of modeling, and network traffic modeling must be achieved.

The remainder of this paper is organized as follows. Section 2 describes the elements of the MicroGrid and their implementation in our prototype. Experiments and validation of our basic system can be found in Section 3. Section 4 discusses related work, and Section 5 summarizes our results and points out new directions.

2. MicroGrid architecture

2.1. Overview

The basic functionality of the MicroGrid is to allow Grid experimenters to simulate their applications on a virtual Grid environment. The simulation tool should be made flexible to allow accurate experimentation on heterogeneous physical resources, as shown in Fig. 1. Our MicroGrid implementation supports Grid applications which use the Globus Grid middleware infrastructure.

The basic architecture of the MicroGrid is shown in Fig. 2, and each element in the figure corresponds to one of the major challenges in constructing a high-fidelity virtual Grid. These challenges are as follows:

Virtualization: The application must perceive only the virtual Grid resources (host names, networks), independent of the physical resources being utilized.

Global Coordination: To provide a coherent global simulation of potentially different numbers of varying virtual resources on heterogeneous physical resources, global coordination of simulation progress is required.

Resource Simulation: Each virtual resource (host, cpu, network, disk, etc.) must be modeled accurately as an element of the overall simulation.

Our approach towards overcoming these challenges is discussed in the following subsections.

2.2. Virtualization

To provide the illusion of a virtual Grid environment, the MicroGrid intercepts all direct uses of resources or information services made by the application. In particular, it is necessary to mediate all operations which identify resources by name, either to use them or to retrieve information about them.

2.2.1. Virtualizing resources

In general, the MicroGrid needs to virtualize processing, memory, networks, disks, and any other resources being used in the system. However, since the operating system in modern computer systems effectively virtualizes each of these resources (providing unique namespaces and seamless sharing), the major challenge is to virtualize host identity. In the MicroGrid, each virtual host is mapped to a physical machine using a mapping table from virtual IP address to physical IP address. All relevant library calls are intercepted and mapped from virtual to physical space using this table. These library calls include:

– gethostname()
– bind, send, receive (e.g. socket libraries)
– process creation³

Fig. 1. Illustration of a user experimenting with a MicroGrid system to explore a range of virtual resources and executing on a range of physical resources.

Fig. 2. MicroGrid simulator diagram: the local resource simulations provide a virtual Grid resource environment and the network simulator captures the interaction between local virtualized resources.

By intercepting these calls, a program can run transparently on a virtual host whose hostname and IP address are virtual. The program can only communicate with processes running on other virtual Grid hosts. Many other program actions which utilize resources (such as memory allocation) only name hosts implicitly, and thus do not need to be changed. A user of the MicroGrid will typically be logged in directly on a physical host and submit jobs to a virtual Grid. Thus, the job submission must cross from the physical resources domain into the virtual resources domain. For the Globus middleware, our current solution is to run all gatekeeper, jobmanager and client processes on virtual hosts. Thus jobs are submitted to virtual servers through the virtual Grid resource's gatekeeper. We can run any socket-based application on the virtual Grid as the MicroGrid completely virtualizes the socket interface.

³ We currently capture processes created through the Globus resource management mechanisms, but not those created via other mechanisms.
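The mapping-table idea can be illustrated with a minimal sketch. It is not the MicroGrid interception library: the addresses, host names, and function names below are hypothetical, and the sketch ignores details such as port translation when several virtual hosts share one physical machine.

```python
# Hypothetical mapping table: virtual IP address -> physical IP address.
VIRTUAL_TO_PHYSICAL = {
    "10.0.0.1": "192.168.1.11",
    "10.0.0.2": "192.168.1.12",
}

# Hypothetical virtual host names, keyed by virtual IP address.
VIRTUAL_HOSTNAMES = {
    "10.0.0.1": "vhost1.vgrid.example",
    "10.0.0.2": "vhost2.vgrid.example",
}


def to_physical(virtual_ip):
    """Translate a virtual address; reject hosts outside the virtual Grid."""
    try:
        return VIRTUAL_TO_PHYSICAL[virtual_ip]
    except KeyError:
        raise LookupError(f"{virtual_ip} is not a virtual Grid host") from None


def virtual_gethostname(current_virtual_ip):
    """Stand-in for an intercepted gethostname(): report the virtual name."""
    return VIRTUAL_HOSTNAMES[current_virtual_ip]


def connect_target(virtual_ip, port):
    """Address an intercepted connect/send would actually use."""
    return (to_physical(virtual_ip), port)
```

A lookup that fails mirrors the restriction that a program can only communicate with other virtual Grid hosts.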

2.2.2. Virtualizing information services

Information services are critical for resource discovery and intelligent use of resources in Computational Grids. Since the MicroGrid currently supports Globus, this problem amounts to virtualization of the Globus Grid Information Service (GIS).

Desirable attributes of a virtualized GIS include:

– Compatibility: virtualized information should be used as before by all programs

– Identification and Grouping: easy identification and organization of virtual Grid entries should be provided

– Use of identical information servers: there should be no incompatible change in the entries

Our approach achieves all of these attributes by extending the standard GIS LDAP records with fields containing virtualization-specific information. Specifically, we extend records for compute and network resources. Extension by addition ensures subtype compatibility of the extended records (a la Pascal, Modula-3, or C++). The added fields are designed to support easy identification and grouping of the virtual Grid entries (there may be information on many virtual Grids in a single GIS server). Finally, all of these records are placed in the existing GIS servers; no additional servers or daemons are needed. Figure 3 shows an example of the extensions to the basic host and network GIS records.
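Extension by addition can be sketched with plain dictionaries standing in for LDAP records. The attribute names (`hn`, `virtual_grid`, `cpu_speed_mips`) are hypothetical placeholders, not the fields shown in Fig. 3; the point is only that a reader written for the base record keeps working on the extended one, and that the added fields allow grouping by virtual Grid.

```python
def extend_record(record, **virtual_fields):
    """Return a copy of `record` with extra fields; never overwrite existing ones."""
    extended = dict(record)
    for key, value in virtual_fields.items():
        if key in extended:
            raise ValueError(f"field {key!r} already exists in the base record")
        extended[key] = value
    return extended


def legacy_host_name(record):
    """A pre-existing client that only knows the base field."""
    return record["hn"]


def entries_for_grid(records, grid_name):
    """Select the entries that belong to one virtual Grid."""
    return [r for r in records if r.get("virtual_grid") == grid_name]


# Hypothetical base record and two virtual hosts derived from it.
base = {"hn": "host1.example.org"}
v1 = extend_record(base, virtual_grid="gridA", cpu_speed_mips=200)
v2 = extend_record(base, virtual_grid="gridB", cpu_speed_mips=50)
```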

2.3. Global coordination

Achieving a balanced simulation across distributed resources requires global coordination. Based on the desired virtual resources and physical resources employed (CPU capacity and network bandwidth/latency), the virtual time module can determine the maximum feasible simulation rate.

The method for calculating the maximum feasible simulation rate is explained here, starting with a few definitions. The simulation rate (SR) is defined for each resource type r as the value

$$\mathrm{SR}_r = \frac{\text{specification of physical resource } r}{\sum \text{specification of the virtual resources of type } r \text{ mapped to this physical resource}}$$

The specifications are parameters such as CPU speed, network bandwidth, reciprocal of propagation delay, etc.⁴ The significance of the SR can be explained in the following way. Suppose a process running on the physical resource takes x time to complete. Then, simplistically, it can be said that the same process would take x ∗ SR time to complete on the virtual resource.

The smallest value of SR over all the resources represents the fastest rate at which the simulation can be run in a functionally correct manner, and is therefore termed the maximum feasible simulation rate. No resource should be allowed to work "faster" than this rate, even if it could, since this would lead to incorrect results. This global coordination mechanism for the rate of simulation over all available resources ensures accurate performance analysis of the processes run on the MicroGrid.

⁴ The parameters need to be expressed such that a higher value signifies faster execution. For example, if the CPU speed of a physical resource is 100 MIPS and the CPU speed of the virtual resource is 200 MIPS, the SR is 0.5.
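The SR definition can be worked through numerically. The sketch below reproduces the 100 MIPS / 200 MIPS example and adds a hypothetical network resource; it takes the smallest per-resource SR as the coordinated rate, on the reasoning that no resource may run ahead of the one that is hardest to simulate. That choice is our reading of the coordination rule, not code from the MicroGrid.

```python
def simulation_rate(physical_spec, virtual_specs):
    """SR for one physical resource: its capacity over the summed virtual demand."""
    total_virtual = sum(virtual_specs)
    if total_virtual <= 0:
        raise ValueError("at least one virtual resource with positive spec is required")
    return physical_spec / total_virtual


# CPU example from the text: a 100 MIPS physical CPU hosting one 200 MIPS virtual CPU.
cpu_sr = simulation_rate(100.0, [200.0])

# Hypothetical network example: a 100 Mb/s physical link carrying
# two virtual links of 10 Mb/s and 40 Mb/s.
net_sr = simulation_rate(100.0, [10.0, 40.0])

# Assumed coordination rule: every resource is held to the smallest SR.
feasible_rate = min(cpu_sr, net_sr)

# A process that needs 8 s on the physical CPU corresponds to x * SR virtual seconds.
virtual_seconds = 8.0 * cpu_sr
```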

Virtualizing Time: Many Grid programs monitor their progress by calling system routines for time (e.g. gettimeofday()), and if their correct behavior is to be maintained, the illusion of virtual time must be provided. Using the chosen simulation rate, we have implemented a virtualization library which returns appropriately adjusted times to the system routines to provide the illusion of a virtual machine at full speed.
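A minimal sketch of such time adjustment is given below, assuming that virtual time advances at the simulation rate relative to wall-clock time from the moment the clock is created. The class name and the injectable `real_clock` parameter are illustrative choices, not part of the MicroGrid library.

```python
import time


class VirtualClock:
    """Scale elapsed wall-clock time by a fixed simulation rate."""

    def __init__(self, rate, real_clock=time.time):
        if rate <= 0:
            raise ValueError("simulation rate must be positive")
        self._rate = rate
        self._real_clock = real_clock
        self._real_start = real_clock()
        # Virtual and real time coincide at the start of the simulation.
        self._virtual_start = self._real_start

    def now(self):
        """Virtual counterpart of a gettimeofday()-style call."""
        elapsed_real = self._real_clock() - self._real_start
        return self._virtual_start + elapsed_real * self._rate
```

With a rate of 0.5, ten wall-clock seconds appear to the application as five virtual seconds, which matches the x ∗ SR relation above.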

2.4. Resource simulation

Within the MicroGrid simulation, each of the Grid resources must also be simulated accurately, provide real-time performance feedback to the simulation, and be simulated at the rate at which virtual time is allowed to progress. While ultimately many resources may be critical, we initially focus on two resource types: computing and communication.

2.4.1. Computing resource simulation

For Grid applications, many of which are compute intensive, accurate modeling of CPU speed is critical. Therefore, our first simulation module focuses on simulating the execution speed of various processors. For each virtual host, the speed of its processor is stored as a GIS attribute on the host record (see Fig. 3). Given the virtual host CPU speed, the physical processor speed can be used to calculate the simulation rate, which in this case yields an actual CPU fraction which should be allocated for this Grid simulation. This CPU fraction is then divided across each process on a virtual host. The resulting fractions are then enforced by the local MicroGrid CPU scheduler.
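The CPU fraction calculation can be sketched as follows. The formula (simulation rate times virtual speed over physical speed) and the equal split across processes are assumptions made for illustration; the text does not state how the fraction is divided.

```python
def cpu_fraction(virtual_mips, physical_mips, simulation_rate):
    """Share of the physical CPU needed to emulate one virtual host."""
    fraction = simulation_rate * virtual_mips / physical_mips
    if fraction > 1.0:
        raise ValueError("physical CPU is too slow for this virtual host at this rate")
    return fraction


def per_process_fraction(host_fraction, n_processes):
    """Assumed policy: split the host's share equally among its processes."""
    if n_processes < 1:
        raise ValueError("a virtual host needs at least one process")
    return host_fraction / n_processes
```

For instance, a 100 MIPS virtual host on a 400 MIPS physical CPU at a simulation rate of 1.0 needs a quarter of the CPU, and a 200 MIPS virtual host on a 100 MIPS CPU at a rate of 0.5 needs all of it.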

The local MicroGrid CPU scheduler is a scheduler daemon which uses signals [8] to allocate the local physical CPU capacity to local MicroGrid tasks. The current scheduler uses a round-robin algorithm (see Fig. 4), and a quantum of 10 milliseconds as supported by the Linux timesharing scheduler.
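One way to picture quantum-based enforcement of CPU fractions is a credit-based round-robin planner, sketched below. This is not the algorithm of Fig. 4, which is not reproduced here; it is a hypothetical stand-in that only decides which task would own each quantum, where a real daemon would presumably stop and continue processes with signals.

```python
def plan_quanta(fractions, n_quanta):
    """Choose the task that runs in each quantum (None means the quantum is idle).

    `fractions` maps task name -> target CPU fraction; the fractions must sum to <= 1.
    """
    if sum(fractions.values()) > 1.0:
        raise ValueError("target fractions exceed the physical CPU")
    order = list(fractions)
    credit = {task: 0.0 for task in order}
    next_index = 0
    schedule = []
    for _ in range(n_quanta):
        # Every task earns its target share of each quantum.
        for task in order:
            credit[task] += fractions[task]
        chosen = None
        # Round-robin scan, starting after the task that ran most recently.
        for offset in range(len(order)):
            index = (next_index + offset) % len(order)
            candidate = order[index]
            if credit[candidate] >= 1.0:
                chosen = candidate
                credit[candidate] -= 1.0
                next_index = (index + 1) % len(order)
                break
        schedule.append(chosen)
    return schedule
```

Over many quanta each task's share of the schedule approaches its target fraction, while short windows can deviate by a quantum or so, which is one way a 10 millisecond quantum can introduce error for tightly synchronized programs.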

A challenge for this scheduler is to achieve exactly the desired simulation rate, in contrast to many multimedia systems, where the objective is to achieve a minimum CPU fraction for interactive response.

Fig. 3. Example virtual GIS host and GIS network records.

Fig. 4. The local CPU scheduler algorithm.

2.4.2. Network simulation

In Grid applications, network behavior is often a critical element of performance [13,22]. As such, the basic requirements for network simulation include precise modeling of arbitrary network structure and online simulation (conveyance of the communication traffic to the right destination with the right delay). Ultimately, scalability to the extreme (tens of millions of network elements) is an important requirement. These requirements for network simulation provide one of the key challenges for the ultimate viability of the MicroGrid for large-scale modeling.

We have explored VINT/NSE, PDNS, and DaSSF, and found that no existing network simulator fully satisfies the MicroGrid's requirements. For our initial MicroGrid prototype, we have chosen to modify the real-time version, VINT/NSE (an online simulator), and have integrated it into the MicroGrid. Because DaSSF scales better, we are currently exploring its use for our next MicroGrid.

The VINT/NSE simulation system allows definition of an arbitrary network configuration. Our MicroGrid system reads the desired network configuration files and passes NSE a network configuration built from the virtual network information in the GIS. The network simulator (NSE) is then connected to our virtual communication infrastructure, and thereby mediates all communication. NSE delivers the communications to each destination according to the network topology at the expected time.
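The notion of online delivery (right destination, right delay) can be illustrated with a small priority queue. This is not NSE and does not use its interfaces; the store-and-forward delay model, the class name, and the link parameters are simplifying assumptions.

```python
import heapq


def path_delay(size_bytes, links):
    """Store-and-forward delay over `links`, each (bandwidth in bytes/s, latency in s)."""
    return sum(latency + size_bytes / bandwidth for bandwidth, latency in links)


class DeliveryQueue:
    """Hold messages until their simulated arrival time."""

    def __init__(self):
        self._heap = []
        self._sequence = 0  # tie-breaker so equal times keep send order

    def send(self, send_time, destination, payload, links):
        deliver_at = send_time + path_delay(len(payload), links)
        heapq.heappush(self._heap, (deliver_at, self._sequence, destination, payload))
        self._sequence += 1
        return deliver_at

    def due(self, now):
        """Pop every message whose arrival time has been reached."""
        delivered = []
        while self._heap and self._heap[0][0] <= now:
            deliver_at, _, destination, payload = heapq.heappop(self._heap)
            delivered.append((deliver_at, destination, payload))
        return delivered
```

A mediator built this way would call `due()` as virtual time advances and hand each payload to the destination's socket only once its arrival time has passed.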

While this system works well, as shown in Section 3, VINT/NSE still presents several significant challenges for use in a MicroGrid system:

– NSE is a best-effort simulator, delivering the messages in close to real-time. If it cannot achieve real-time, it simply gives a warning and continues as best it can.

– NSE requires an unpredictable amount of computational effort to simulate a network and traffic pattern. As a result, determining a safe global simulation rate is difficult.

– NSE performs detailed simulation, with high overhead.

– NSE is a sequential simulator, and therefore does not scale up to large simulations well.

3. Experiments and validation

We are executing a series of experiments designed primarily to validate the MicroGrid simulation tools. These experiments can be categorized as follows:

1. Micro-benchmarks of MicroGrid simulation models

2. Benchmark MPI Programs (NAS Parallel Benchmarks)

3. Application programs (CACTUS)

These experiments test successively larger parts of the MicroGrid simulation, and with successively more complex application behavior. With a solid validation of the tools, the range of interesting experiments is virtually unbounded.

3.1. Experimental environment and methodology

All of the experiments were run on a cluster of 533 MHz 21164 Compaq Alpha machines with 1 GB main memory each, connected by a 100 Mbit Ethernet. The software used is Globus 1.1 run on top of the MicroGrid layer. For each type of benchmark and experiment, we report different metrics.

3.2. Micro-benchmarks

We have performed extensive compute node simulation and network simulation experiments. We use some micro-benchmarks to validate the accuracy of the models.

3.2.1. Memory capacity modeling

The following test is performed with the goal of verifying the scheduler's ability to enforce the memory limitation as specified when assigning a process to a virtual machine.

A MicroGrid process is initialized on the scheduler; that process constantly allocates memory until it generates an out-of-memory error. The test is repeated with various memory limits for the process, and the maximum amount of memory successfully allocated is logged for each repeat of the test. Figure 5 shows the result of the tests where memory limitations from 1 KB to 1 MB are specified.

As shown in Fig. 5, there is a clear linear correlation between the memory limit and the amount of memory accessible by the process. In each case, the process could allocate about 1 KB less than the specified memory limitation. This is due to memory overhead for the process.

3.2.2. Processor speed modeling

This test focuses on how precisely the MicroGrid processor scheduler maintains the processing model in the presence of CPU competition on the physical machine. This test is done in several steps: for each step a virtual machine of a given speed is created on the scheduler, a process performing a fixed computation is scheduled on that virtual machine, and after the process completes, the virtual machine is deleted. This test is repeated with various speeds for the virtual machines and with competition in the form of CPU intensive and IO intensive processes running in parallel with the MicroGrid scheduler. The process scheduled on the virtual machine is a reference process performing a fixed amount of CPU-intensive work (i.e. the process is never blocked); its execution speed is compared with that of the same process running as the only process on the physical CPU.

The three versions of the processor microbenchmark are:

– No Competition: This is used as a reference. During this test the scheduler is running as the only process on the CPU.

– CPU Competition: In parallel with the scheduler daemon, a computationally intense process is executing. The computationally-intensive process does floating-point divisions continuously.

– IO Competition: In parallel with the scheduler daemon, an IO intensive process is executing. The IO intensive process used during this test continuously flushes a 1 MB buffer to disk.

Fig. 5. Memory microbenchmark. [Plot of maximum allocated memory against specified memory limit [KB], both axes from 0 to 1000.]

Fig. 6. Processor microbenchmark. [Plot of fraction of CPU [%] against specified CPU fraction [%], from 10 to 100, for the No Competition, IO Competition, and CPU Competition cases.]

From Fig. 6 it is seen that the MicroGrid scheduler is able to schedule the reference process according to the specified speed for the virtual machine across a wide range of speeds up to 95% of the total CPU capacity. Both the operating system and the scheduler's CPU usage form this upper boundary. For the IO competition and CPU competition tests, the results show that above a specified CPU speed of 40%, the virtual machine does not deliver the specified CPU fraction. This inadequacy is caused by the characteristics of the time-sharing scheduling policy in Linux.

The following test is performed with the goal of characterizing the stability of the scheduled quantum size. We modified the MicroGrid scheduler daemon to collect performance data, logging the time slice size assigned to each process. The test consists of three sessions, producing about 9000 samples, corresponding to about 90 seconds of test. The process that actually runs on the MicroGrid during this test is an inactive process that constantly sleeps.

Figure 7 shows that with no competition, when the MicroGrid scheduler is running as the only process on the CPU, the CPU time allocated to a process is almost the same as the virtual time specified by the user. This test also shows that computationally intensive processes running in parallel with the MicroGrid scheduler did not have any influence on the performance of the scheduler. Extreme IO competition did affect the resulting quantum size, but not severely.

3.2.3. NSE network modeling

In this experiment we use MPI bandwidth and latency benchmarks to test the accuracy of the simulated network performance. These benchmarks run on two virtual nodes connected by virtual networks (a 100 Mb Ethernet). The results are compared with outputs from the real system. We can see that the simulated network has characteristics similar to those of the real system.

Fig. 7. Distribution of quanta sizes, normalized to a mean of unity. [Histogram of frequency against normalized time slice. No competition: mean = 1.000, dev = 0.002. CPU competition: mean = 1.01, dev = 0.015. IO competition: mean = 0.978, dev = 0.027.]

Fig. 8. NSE network modeling. [Two panels, latency (us) and bandwidth (MB/s), each plotted against message size (bytes, 4 to 262144) for Ethernet and Mgrid.]

3.3. Benchmarks

We used the NPB (NAS Parallel Benchmarks) 2.3 applications [19] to perform a more extensive validation of the MicroGrid tools, encompassing not only the compute node and network models, but also their integration into a unified simulation.

3.4. Total run time

3.4.1. Machine configurations

Our first goal was to validate the MicroGrid simulator for the NAS Parallel Benchmarks across a range of machine configurations. To do so, we ran all of the benchmarks across a range of processor and network configurations matching those for which performance data is available on the NPB web site (http://www.nas.nasa.gov/Software/NPB/). We studied the virtual Grid configurations shown in Fig. 9.

A bar graph comparing the actual run times with those simulated on the MicroGrid is shown in Fig. 10. As shown in the figure, the MicroGrid matches IS, LU, and MG within 2%. For EP and BT, the match is slightly worse, but still quite good, within 4%. Because the simulation tracks the performance differences across configurations fairly well, we consider the MicroGrid validated for these applications.

In searching to explain the slight mismatches in performance, microscopic analysis of these programs' behavior revealed that the MicroGrid's current 10 millisecond scheduling quantum was introducing modeling error when processes are synchronized at shorter intervals.

To test this hypothesis, we explored a range of quantum sizes for each of the benchmark applications, and used the Class S (small) data sets, which would exacerbate the inaccuracies introduced by the quanta size. The results from these experiments are shown in Fig. 11, and demonstrate that varying the time quanta can significantly improve the modeling precision.

Fig. 9. Virtual grid configurations studied.

Fig. 10. NPB benchmark performance (Class A), comparing physical machine runs with equivalent MicroGrid simulations. [Two panels, "NPB for Alpha Cluster" and "NPB for HPVM", each showing execution time for EP, BT, LU, MG, and IS, with series Pgrid and Mgrid (vc).]

For the MG, BT, LU, and EP benchmarks on the Class S data sets, the best quantum sizes for matching are 2.5 ms, 5 ms, 2.5 ms, and 10 ms respectively. Without exception, the benchmarks that synchronize frequently match better with shorter time quanta. The matches with the best time quanta in each case are quite close, with differences of 12%, 0.6%, 0.4%, and 1.3% respectively. Clearly, for codes that communicate and synchronize frequently, scheduling with smaller time quanta is an important feature of future MicroGrids.

3.4.2. Additional experiments

The MicroGrid can also be used to extrapolate likely performance on systems not directly available, or those of the future. For example, a MicroGrid simulation can be used to explore the future implications of technology scaling at different rates (network, memory, and processor speed) for application or system software. For example, Fig. 12 explores the impact of faster processors, holding network performance constant (1 Mbps and 50 millisecond latency). For these benchmarks, significant speedups can be achieved solely based on increases in processor speed.
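A simple analytic model gives a feel for what such a processor-scaling experiment measures. The sketch below assumes, purely for illustration, that the run time splits into a compute part that shrinks in proportion to processor speed and a communication part that stays fixed, with no overlap between the two; the MicroGrid itself executes the application directly and makes no such assumption. The function name and its parameters are hypothetical.

```python
def normalized_runtime(compute_s, comm_s, cpu_speedup):
    """Run time at cpu_speedup relative to the 1x run time.

    Assumes compute time scales as 1/cpu_speedup and communication time is
    unaffected (network performance held constant).
    """
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    baseline = compute_s + comm_s
    if baseline <= 0:
        raise ValueError("total baseline run time must be positive")
    return (compute_s / cpu_speedup + comm_s) / baseline
```

Under these assumptions, a code that spends 8 s computing and 2 s communicating at 1x would run at 0.6, 0.4, and 0.3 of its original time at 2x, 4x, and 8x processor speed: the fixed communication cost bounds the achievable speedup.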

The MicroGrid can also be used to study systems with complex network topologies and structures (as will be common in many Grid systems). For example, a MicroGrid simulation can model the effect of spreading an application over a wide-area network testbed such as the United States National Science Foundation’s vBNS testbed [21] (see Fig. 13). We study the possibility of executing the NAS parallel benchmarks over a fictional vBNS distributed cluster testbed, varying the speed of the WAN network links. Our experiment uses 4-process NPB jobs with two processes at UCSD (on the CSE department LAN) and two processes at UIUC (on the CS department LAN). Thus, the path from one cluster to another traverses LAN, OC3, and OC12 links as well as several routers (finer details such as switches and hubs are not modelled). In the experiment, we vary a bottleneck link from 10 Mb/s to OC12, producing the NPB benchmark performance shown in Fig. 14.

The results (see Fig. 14) show that the performance of the NAS parallel benchmarks distributed over a wide-area coupled cluster is only mildly sensitive to network bandwidth. With the exception of EP, the latency effects dominate, producing poor performance for nearly all network bandwidths. This confirms the commonly held perspective that Grid applications need to be latency tolerant to achieve good performance.
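The dominance of latency for small messages can be seen from the usual first-order cost model, in which a transfer takes one latency plus the message size divided by the bandwidth. The sketch below uses this textbook model with a hypothetical function name; the 30 ms latency and the message sizes in the usage note are assumed values, not parameters of the vBNS experiment.

```python
def transfer_time_s(message_bytes, latency_s, bandwidth_bps):
    """First-order message cost: one latency plus serialization time."""
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + message_bytes * 8 / bandwidth_bps


def bandwidth_sensitivity(message_bytes, latency_s, slow_bps, fast_bps):
    """Ratio of transfer time on the slow link to that on the fast link."""
    slow = transfer_time_s(message_bytes, latency_s, slow_bps)
    fast = transfer_time_s(message_bytes, latency_s, fast_bps)
    return slow / fast
```

With an assumed 30 ms latency, a 1000-byte message takes less than 3% longer at 10 Mb/s than at 622 Mb/s, while a 100 MB transfer takes more than ten times longer. Codes that exchange many small messages therefore gain little from extra bandwidth.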


[Figure: bar chart of total runtime (in sec) for MG, BT, LU, and EP (Class S) on the physical grid and on the MicroGrid with scheduling slices of 2.5 ms, 5 ms, 10 ms, and 30 ms.]

Fig. 11. The effect of varying the scheduling quanta length on MicroGrid modeling accuracy (NPB Class S benchmarks).

[Figure: bar chart of the normalized runtime ratio for MG, BT, LU, and EP with 1x, 2x, 4x, and 8x virtual CPU speed.]

Fig. 12. Total run times varying only the virtual CPU speed.

The MicroGrid can be run at a variety of actual speeds, yet yield identical results in virtual Grid time. This capability is useful in sharing simulation resources, or in interfacing the MicroGrid simulations with external resources for which we may not be able to control rates precisely. For example, slowing the processor and network simulations can be used to make a slow disk seem much faster, or an external sensor data rate much higher (to model a future higher speed sensor). The results in Fig. 15 show how the fidelity of simulation varies as the speed of simulation varies over an order of magnitude.

3.5. Applications

The largest validation tests must include full-blown application programs. We have studied and validated the MicroGrid simulation on the CACTUS code (http://www.cactuscode.org/), a flexible parallel PDE solver.

Cactus [1] is an open source problem solving environment designed for scientists and engineers. It originated in the academic research community, where it was developed and used over many years by a large international collaboration of physicists and computational scientists.

To validate the MicroGrid, we again use a model of DEC Alpha machines in a cluster. We present run times for CACTUS running WaveToy on this testbed and compare to runs on the MicroGrid with appropriate processor and network speed settings. These results (see Fig. 16) show an excellent match, within 5 to 7%. Thus, the MicroGrid tools have been validated on micro-benchmarks, a full-blown benchmark suite, and an entire application program, demonstrating their accurate modeling capabilities for full program runs.

3.6. Internal performance

A more demanding validation of the MicroGrid matches the internal application behavior and timing of each run with that on an actual machine. To make this comparison, we used identical performance tools (the Autopilot system [17]) and monitored the NAS Parallel Benchmarks. Autopilot enables the tracking of the values of certain types of program variables over the execution of the program. The resulting data, normalized over execution time of the programs, are shown in Fig. 17.

Fig. 13. Our fictional distributed cluster testbed uses the vBNS and models the network using the MicroGrid simulator.

The graph shows the changing value of a periodic function of counter variables in EP, BT and MG with time for class A data sizes for the Alpha cluster (using 100% CPU time) and the MicroGrid (using 4% CPU time). The simulation rate is therefore 0.04. The x-axis shows the time, with one sample of the variables being made every 1 second for the Alpha cluster, and every 25 seconds for the MicroGrid to take into account the simulation rate. We see that the traces for both the physical system and the simulation follow the same structure, and match fairly well in time. We calculated the “skew” in the graphs for each benchmark as the root mean square percentage difference recorded at each sample time. This value was found to be 3.08% for EP, 2.02% for BT and 8.33% for MG. The closeness of the internal matching provides stronger evidence that the MicroGrid precisely matches the modeled Grid system’s behavior.
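The two calculations in this comparison can be sketched as follows. The text does not give the exact formula for the skew, so the sketch assumes that the percentage difference at each sample is taken relative to the physical value and that physical samples are nonzero; the function names are hypothetical.

```python
import math


def wall_sample_interval_s(virtual_interval_s, simulation_rate):
    """Wall-clock sampling period that corresponds to a virtual-time period.

    A simulation rate of 0.04 means that virtual time advances at 4% of
    wall-clock speed.
    """
    if simulation_rate <= 0:
        raise ValueError("simulation rate must be positive")
    return virtual_interval_s / simulation_rate


def rms_percent_difference(physical, simulated):
    """Root mean square of the per-sample percentage differences.

    Assumes both traces are sampled at matching virtual times and that no
    physical sample is zero.
    """
    if len(physical) != len(simulated) or not physical:
        raise ValueError("traces must be non-empty and of equal length")
    squares = [((s - p) / p * 100.0) ** 2 for p, s in zip(physical, simulated)]
    return math.sqrt(sum(squares) / len(squares))
```

With a simulation rate of 0.04, a 1 second virtual sampling period corresponds to a 25 second wall-clock period, as used above. Two traces that differ by +10% and -10% at their two samples have a skew of 10%.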

4. Related work

While there are many related efforts, we know of no other Grid, distributed system, or parallel system modeling efforts that seek this level of modeling of Computational Grids: validated, general purpose direct simulation.

Early papers on the Globus MDS described a feature called “resource images” which are similar to our virtual Grid entries. However, to our knowledge, these images were never used. The most closely related project is the Ninf/Bricks project [20] which has focused on evaluating scheduling strategies based on synthetic models of applications and innovative network traffic models. However, to date Bricks has not used Grid applications as a direct driver.

There are a wide variety of simulation tools for parallel systems; however, they focus primarily on architectural simulations [4,14,16,18] and program debugging [5,12]. These tools typically do not model system heterogeneity, irregular network structure, a complex Grid software environment, or the traffic modeling required by the network environments.

[Figure: bar chart "NPB under network degrade", plotting execution time (s) for LU, BT, MG, and EP as the bandwidth of one vBNS link drops from 622 Mb/s to 155 Mb/s to 10 Mb/s.]

Fig. 14. Performance of NPB over the vBNS distributed cluster testbed, varying the network speed of the major WAN links.

[Figure: bar chart of normalized runtime for MG, BT, LU, and EP at 1x, 2x, 4x, and 8x system emulation rates.]

Fig. 15. Total run times varying emulation rates.

There are a wide variety of network simulation tools, including those for parallel systems [10,11,15] and those for Internet style networks [2,3,9]. The parallel system tools typically do not model the detailed structure in complex heterogeneous network environments. Those for Internet systems typically do not support live traffic simulation. Also, since Internet workloads are typically large numbers of WWW users on low-bandwidth links, these simulators do not focus on scaling with high intensity workloads and large dynamic range (six orders of magnitude) in single user network demand.

5. Summary and future work

We are pursuing a research agenda which explores how to simulate and model large-scale Grid structures. To this end, we have developed a set of simulation tools called the MicroGrid that enable experimenters to run arbitrary Grid applications on arbitrary virtual Grid resources, allowing the study of complex dynamic behavior. We describe our initial design, implementation, and validation of the MicroGrid tools on micro-benchmarks, a well-known parallel benchmark suite (the NAS parallel benchmarks), and a full-blown application program, the CACTUS system for parallel partial differential equation solvers. These studies show that the MicroGrid approach is feasible and the basic implementation is validated. In addition, we perform an internal validation showing that the execution at each time step within a MicroGrid simulation closely follows that in an actual execution. This internal validation is achieved using the Autopilot tools, and running them within the MicroGrid environment.

[Figure: two bar charts of CACTUS execution time at grid sizes (one edge) of 50 and 250, comparing the physical grid with the MicroGrid.]

Fig. 16. CACTUS runs on a physical cluster and on the MicroGrid tools modeling that same cluster performance.

[Figure: three line plots, "Performance of EP", "Performance of BT", and "Performance of MG", showing variable value versus sample number for the physical grid and the MicroGrid.]

Fig. 17. Comparing autopilot data from a physical system and a MicroGrid simulation.

While we have made tangible progress, significant challenges remain. For some applications, using smaller scheduling quanta, enabled by a real-time scheduling subsystem, can improve the fidelity of our simulations. We are pursuing the development of a new scheduler based on real-time priorities. In the near term, we plan to support scaling to dozens of machines, dynamic mapping of virtual resources, and dynamic virtual time. In the longer term, we plan to address questions of extreme scalability: how to get to 100 million simulated nodes, exploring a range of simulation speed and fidelity, and understanding how to extrapolate from a small set of Grid simulations to a much broader space of network environment and application behavior. We will also pursue the use of the MicroGrid tools with a much larger range of applications, exploiting the growing range of Globus applications that are becoming available. We also invite the participation of other research groups to extend and enhance the MicroGrid simulation infrastructure.

Acknowledgements

The research described is supported in part by DARPA through the US Air Force Research Laboratory Contract F30602-99-1-0534. It is also supported by NSF EIA-99-75020 and supported in part by funds from the NSF Partnerships for Advanced Computational Infrastructure: the Alliance (NCSA) and NPACI. Support from Microsoft, Hewlett-Packard, Myricom Corporation, Intel Corporation, and Packet Engines is also gratefully acknowledged.

References

[1] G. Allen, T. Goodale, G. Lanfermann, T. Radke and E. Seidel, The cactus code, a problem solving environment for the grid, 2000.

[2] S. Bajaj, L. Breslau, D. Estrin, K. Fall, S. Floyd, P. Haldar, M. Handley, A. Helmy, J. Heidemann, P. Huang, S. Kumar, S. McCanne, R. Rejaie, P. Sharma, K. Varadhan, Y. Xu, H. Yu and D. Zappala, Improving simulation for network research, Technical Report 99-702, University of Southern California, 1999, pp. 99–702, http://netweb.usc.edu/vint/.

[3] J. Cowie, H. Liu, J. Liu, D. Nicol and A. Ogielski, Towards Realistic Million-Node Internet Simulations, in: Proceedings of the 1999 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA’99), 1999, http://www.cs.dartmouth.edu/~jasonliu/projects/ssf/.

[4] G.T.E. Tam, J. Rivers and E.S. Davidson, mlcache: A Flexible Multi-Lateral Cache Simulator, University of Michigan Department of Electrical Engineering and Computer Science, CSE-TR-363-98, 1998.

[5] M. Feng and C.E. Leiserson, Efficient Detection of Determinacy Races in Cilk Programs, in: Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA), 1997, pp. 1–11.

[6] I. Foster and C. Kesselman, Globus: A metacomputing infrastructure toolkit, International Journal of Supercomputing Applications, (1997), http://www.globus.org.

[7] A.S. Grimshaw, W.A. Wulf and the Legion team, The Legion vision of a worldwide virtual computer, Communications of the ACM (1997), http://legion.virginia.edu.

[8] C. Lin, H. Chu and K. Nahrstedt, A Soft Real-time Scheduling Server on the Windows NT, in: Proceedings of the Second USENIX Windows NT Symposium, 1998.

[9] D.M. Nicol, J. Cowie and A.T. Ogielski, Modeling the Global Internet, Computing in Science & Engineering 1(1) (1999), pp. 42–50, http://www.cs.dartmouth.edu/~jasonliu/projects/ssf/.

[10] L. Snyder, K. Bolding and M. Fulgham, The Case for Chaotic Adaptive Routing, University of Washington, UW-CSE-94-02-04, 1994.

[11] J.H. Kim and A. Chien, Network Performance Under Bimodal Traffic Loads, Journal of Parallel and Distributed Computing 28(1) (1995).

[12] LAM/MPI Parallel Computing, Home Page, at http://mpi.nd.edu/lam/.

[13] N. Miller and P. Steenkiste, Collecting Network Status Information for Network-Aware Applications, in: Infocom’00, 2000, http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmcl/www/remulac/index.html.

[14] V.S. Pai, P. Ranganathan and S.V. Adve, RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors, in: Proceedings of the Third Workshop on Computer Architecture Education, February 1997, http://www.ece.rice.edu/~rsim/.

[15] S. Prakash, Performance Prediction of Parallel Programs, University of California Los Angeles, 1996, http://pcl.cs.ucla.edu/projects/sesame/.

[16] S.K. Reinhardt, M.D. Hill, J.R. Larus, A.R. Lebeck, J.C. Lewis and D.A. Wood, The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers, in: Proceedings of the 1993 ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, 1993, pp. 48–60, http://www.cs.wisc.edu/~wwt/.

[17] R.L. Ribler, J.S. Vetter, H. Simitci and D.A. Reed, Autopilot: Adaptive Control of Distributed Applications, Proceedings of the 7th IEEE Symposium on High-Performance Distributed Computing, 1998, http://www-pablo.cs.uiuc.edu/Project/Autopilot/AutopilotOverview.htm.

[18] M. Rosenblum, S.A. Herrod, E. Witchel and A. Gupta, Complete Computer Simulation: The SimOS Approach, IEEE Parallel and Distributed Technology (1995), http://simos.stanford.edu/.

[19] W. Saphir, R.V. der Wijngaart, A. Woo and M. Yarrow, New Implementation and Results for the NAS Parallel Benchmarks 2, NASA Ames Research Center, http://www.nas.nasa.gov/Software/NPB/.

[20] A. Takefusa, S. Matsuoka, H. Nakada, K. Aida and U. Nagashima, Overview of a Performance Evaluation System for Global Computing Scheduling Algorithms, Proceedings of 8th IEEE International Symposium on High Performance Distributed Computing (HPDC8), 1999, pp. 97–104, http://ninf.etl.go.jp/.

[21] The United States National Science Foundation’s vBNS. Very high performance backbone network service. MCI WorldCom, Inc. http://www.vnbs.net/.

[22] R. Wolski, Dynamically forecasting network performance using the network weather service, Journal of Cluster Computing (1998), http://nws.npaci.edu/NWS/.
