
Post on 21-Dec-2015


Page 1: POLITEHNICA University of Bucharest California Institute of Technology National Center for Information Technology Ciprian Mihai Dobre Corina Stratan MONARC

POLITEHNICA University of Bucharest
California Institute of Technology
National Center for Information Technology

Ciprian Mihai Dobre
Corina Stratan

MONARC 2 - distributed systems simulation

Page 2

The Goals of the Project

• To perform realistic simulation and modelling of large scale distributed computing systems, customised for specific large scale HEP applications.

• To provide a design framework to evaluate the performance of a range of possible computer systems, as measured by their ability to provide the physicists with the requested data in the required time, and to optimise the cost.

• To narrow down the region of this parameter space in which viable computing models can be chosen by any of the future LHC-era experiments.

• To offer a dynamic and flexible simulation environment.

Page 3

LHC Computing: Different from Previous Experiment Generations

One of the four LHC detectors (CMS)

Online system: multi-level trigger to filter out background and reduce data volume
40 MHz (40 TB/sec) -> level 1 - special hardware
75 kHz (75 GB/sec) -> level 2 - embedded processors
5 kHz (5 GB/sec) -> level 3 - PCs
100 Hz (100-1000 MB/sec) -> data processing: offline analysis, selection

Raw recording rate 0.1 – 1 GB/sec; 3 - 8 PetaBytes / year
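As a quick check of the numbers above, the Java sketch below recomputes the data rate after each trigger level and the rejection factor of each level. It assumes a uniform event size of ~1 MByte (the figure quoted on the regional center hierarchy slide); the class and method names are illustrative only.

```java
// Illustrative sketch: data rate and rejection factor along the trigger chain.
// Assumption: every event is ~1 MByte (uniform size), as quoted elsewhere in the deck.
class TriggerChain {
    static final double EVENT_SIZE_BYTES = 1e6;

    // Data rate in bytes/sec for an event rate in Hz.
    static double dataRate(double eventRateHz) {
        return eventRateHz * EVENT_SIZE_BYTES;
    }

    // How many input events a level discards for each event it keeps.
    static double rejectionFactor(double inputHz, double outputHz) {
        return inputHz / outputHz;
    }

    public static void main(String[] args) {
        String[] stage = {"collisions", "after level 1", "after level 2", "after level 3"};
        double[] rateHz = {40e6, 75e3, 5e3, 100};
        for (int i = 0; i < rateHz.length; i++) {
            System.out.printf("%-14s %10.0f Hz -> %.3g bytes/sec%n",
                    stage[i], rateHz[i], dataRate(rateHz[i]));
            if (i > 0) {
                System.out.printf("   rejection factor of this level: %.1f%n",
                        rejectionFactor(rateHz[i - 1], rateHz[i]));
            }
        }
        System.out.printf("overall rejection: %.0f%n", rejectionFactor(rateHz[0], rateHz[3]));
    }
}
```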

Page 4

Off-Line LHC Computing: Data Analysis

Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment
Scale: ~100 times more processing power; Petabytes per year of data

CMS: 1800 Physicists, 150 Institutes, 32 Countries

A VERY LARGE SCALE DISTRIBUTED SYSTEM THAT HAS TO PROVIDE (NEAR) REAL-TIME DATA ACCESS FOR ALL THE PARTICIPANTS

Page 5

Regional Center Hierarchy (Worldwide Data Grid)

Experiment -> Online System: ~PBytes/sec (bunch crossing every 25 nsecs; event is ~1 MByte in size)
Tier 0 +1: Offline Farm, CERN Computer Center (fed by the online system at 100–1000 MBytes/sec)
Tier 1: France Center, FNAL Center, Italy Center, UK Center
Tier 2: Tier2 Centers
Tier 3: Institutes (~0.25 TIPS each), physics data cache
Tier 4: Workstations
Inter-tier link speeds shown in the diagram: ~2.4 Gbits/sec, ~0.6 - 2.5 Gbits/sec, ~622 Mbits/sec, 100 - 1000 Mbits/sec

Physicists work on analysis “channels”.
Processing power: ~200,000 of today’s fastest PCs
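To get a feel for these link speeds, the Java sketch below computes how long a dataset takes to cross a link. It assumes the full nominal bandwidth is available to a single transfer, with no protocol overhead or sharing; the 1 TB dataset size is a hypothetical example.

```java
// Hypothetical helper: ideal transfer time over a link, ignoring protocol
// overhead and link sharing (both simplifying assumptions).
class TierTransfer {
    // Seconds to move 'bytes' over a link of 'bitsPerSecond'.
    static double seconds(double bytes, double bitsPerSecond) {
        return bytes * 8 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double dataset = 1e12;                      // hypothetical 1 TB dataset
        double[] linkBps = {2.4e9, 622e6, 100e6};   // speeds taken from the diagram
        for (double bps : linkBps)
            System.out.printf("1 TB over %.0f Mbit/s: %.1f hours%n",
                    bps / 1e6, seconds(dataset, bps) / 3600);
    }
}
```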

Page 6

Simulation Models

The simulation model abstracts the components of the real system and their interactions; it must be equivalent to the simulated system.

Simulation models:
• continuous time - the system is described by a set of differential equations
• discrete time - the state changes only at certain moments in time

In MONARC: one of the discrete time models (Discrete Event Simulation - DES); the events represent important activities in the system and are managed with the aid of an internal clock.
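A minimal Java sketch of a discrete event simulation kernel follows: an internal clock plus a time-ordered event queue, where the clock jumps directly from one event to the next because nothing changes in between. The names (DesKernel, schedule, run) are hypothetical and do not reflect MONARC's actual engine API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical minimal DES kernel: internal clock + time-ordered event queue.
class DesKernel {
    private static final class Event {
        final double time;      // simulated time at which the event fires
        final long seq;         // insertion order, breaks ties deterministically
        final Runnable action;
        Event(double time, long seq, Runnable action) {
            this.time = time; this.seq = seq; this.action = action;
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Event e) -> e.time).thenComparingLong(e -> e.seq));
    private double clock = 0.0;   // the internal simulation clock
    private long nextSeq = 0;

    double now() { return clock; }

    // Schedule an action 'delay' simulated seconds from now.
    void schedule(double delay, Runnable action) {
        if (delay < 0) throw new IllegalArgumentException("negative delay");
        queue.add(new Event(clock + delay, nextSeq++, action));
    }

    // Process events in time order; the clock jumps straight to each event time.
    void run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            clock = e.time;
            e.action.run();
        }
    }

    public static void main(String[] args) {
        DesKernel sim = new DesKernel();
        List<String> log = new ArrayList<>();
        sim.schedule(5, () -> log.add("t=" + sim.now() + " job finished"));
        sim.schedule(1, () -> {
            log.add("t=" + sim.now() + " job submitted");
            // Events may schedule further events relative to the current time.
            sim.schedule(2, () -> log.add("t=" + sim.now() + " data transfer done"));
        });
        sim.run();
        log.forEach(System.out::println);
    }
}
```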

Page 7

A Global View for Modelling

[Diagram: layered modelling framework]
Simulation Engine
Basic Components: LAN, WAN, DB, CPU, Scheduler, Job, Catalog
Specific Components: Analysis, Distributed Scheduler, MetaData, Jobs
Computing Models
MONITORING: REAL Systems, Testbeds

Page 8

Regional Center Model

[Diagram: a REGIONAL CENTER receives Jobs from Activities through its Job Scheduler; the FARM contains CPU units, each running several AJob objects and attached through a LinkPort to the LAN; DBServers with a DB Index are also attached to the LAN through LinkPorts; the LAN connects to the WAN]

Page 9

The Simulation Engine

• Provides the multithreading mechanism for the simulation.
• The entities with time-dependent behavior are mapped onto “active objects”.
• The simulation engine manages the active objects and the events.
• Thread reusability (thread pool).

[Diagram: Engine: Scheduler, Task, Event, EventQueue, WorkerThread Pool; simulated entities: Activity, JobScheduler, Farm, CPUUnit, AJob, Job]
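The thread pool idea can be sketched in Java with java.util.concurrent. This is an illustration under assumptions, not MONARC's engine: events due at the same simulated time run in parallel on a fixed pool of reused worker threads, and the clock advances only after all of them complete. PooledEngine and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical engine: events due at the same simulated time run in parallel on a
// reusable pool of worker threads; the clock advances only when all of them finish.
class PooledEngine {
    private final TreeMap<Long, List<Callable<Void>>> pending = new TreeMap<>();
    private final ExecutorService pool;   // worker threads are reused across time steps
    private volatile long clock = 0;      // simulated time, read by running tasks

    PooledEngine(int workers) { pool = Executors.newFixedThreadPool(workers); }

    long now() { return clock; }

    // May be called from inside running tasks, hence synchronized.
    synchronized void schedule(long time, Runnable task) {
        if (time < clock) throw new IllegalArgumentException("cannot schedule in the past");
        pending.computeIfAbsent(time, t -> new ArrayList<>())
               .add(() -> { task.run(); return null; });
    }

    void run() {
        try {
            while (true) {
                Map.Entry<Long, List<Callable<Void>>> step;
                synchronized (this) { step = pending.pollFirstEntry(); }
                if (step == null) break;
                clock = step.getKey();
                // invokeAll blocks until every event of this time step has completed.
                pool.invokeAll(step.getValue());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        PooledEngine engine = new PooledEngine(4);
        Set<Integer> finished = ConcurrentHashMap.newKeySet();
        for (int cpu = 0; cpu < 8; cpu++) {
            final int id = cpu;
            engine.schedule(10, () -> finished.add(id));  // 8 CPU units finish at t=10
        }
        engine.run();
        System.out.println(finished.size() + " events ran at t=" + engine.now() + " on 4 reused threads");
    }
}
```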

Page 10

Multitasking Processing Model

Concurrent running tasks share resources (CPU, memory, I/O).

“Interrupt” driven scheme: for each new task, or when a task finishes, an interrupt is generated and all “processing times” are recomputed.

It provides:
• Handling of concurrent jobs with different priorities.
• An efficient mechanism to simulate multitask processing.
• An easy way to apply different load balancing schemes.
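The interrupt-driven scheme can be sketched as processor sharing in Java. Assumptions: one CPU whose power is split among the running tasks in proportion to a priority weight, with work measured in abstract units; SharedCpu and its methods are hypothetical. Remaining work is updated and finish times are recomputed only at arrivals and completions, never at fixed time steps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical processor-sharing CPU driven by arrival/completion "interrupts".
class SharedCpu {
    static final class Task {
        final String name;
        final double weight;   // priority weight: the CPU share is proportional to it (assumption)
        double remaining;      // work units still to process
        Task(String name, double work, double weight) {
            this.name = name; this.remaining = work; this.weight = weight;
        }
    }

    private final double power;                      // work units per second
    private final List<Task> running = new ArrayList<>();
    private double clock = 0;
    final Map<String, Double> finishTimes = new LinkedHashMap<>();

    SharedCpu(double power) { this.power = power; }

    // Advance simulated time to 'until', handling every completion interrupt on the way.
    private void advanceTo(double until) {
        while (!running.isEmpty()) {
            double totalWeight = 0;
            for (Task t : running) totalWeight += t.weight;
            // Recompute when each task would finish at the current shares.
            Task first = null;
            double dt = Double.POSITIVE_INFINITY;
            for (Task t : running) {
                double finishIn = t.remaining / (power * t.weight / totalWeight);
                if (finishIn < dt) { dt = finishIn; first = t; }
            }
            double step = Math.min(dt, until - clock);
            for (Task t : running)
                t.remaining = Math.max(0, t.remaining - step * power * t.weight / totalWeight);
            clock += step;
            if (step < dt) return;             // reached 'until' before the next completion
            running.remove(first);             // completion interrupt
            finishTimes.put(first.name, clock);
        }
        if (until != Double.POSITIVE_INFINITY) clock = Math.max(clock, until);
    }

    // Arrival interrupt: bring all running tasks up to date, then add the new one.
    void submit(double at, String name, double work, double weight) {
        if (weight <= 0) throw new IllegalArgumentException("weight must be positive");
        advanceTo(at);
        running.add(new Task(name, work, weight));
    }

    // Run until every submitted task has finished.
    void finishAll() { advanceTo(Double.POSITIVE_INFINITY); }

    public static void main(String[] args) {
        SharedCpu cpu = new SharedCpu(1.0);   // 1 work unit per second
        cpu.submit(0, "A", 10, 1);
        cpu.submit(4, "B", 2, 1);             // B arrives while A is running
        cpu.finishAll();
        System.out.println("finish times: " + cpu.finishTimes);
    }
}
```

In the example, A runs alone until t=4; A and B then get half of the CPU each, so B finishes at t=8 and A, with 4 units left, finishes alone at t=12.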

Page 11

Engine tests

Processing a TOTAL of 100 000 simple jobs on 1, 10, 100, 1000, 2 000, 4 000, and 10 000 CPUs (number of CPUs = number of parallel threads):

[Figure: time [s] versus number of threads, log-log scale; curves for 2X2.4 GHz, Linux; 2X450MHz, Solaris; 2X3GHz, Windows]

more tests: http://monalisa.cacr.caltech.edu/MONARC/

Page 12

Job Scheduling

• Dynamically loadable modules for each regional center
• Basic job scheduler: assigns the jobs to CPUs from the local farm
• More complex schedulers: allow job migration between regional centers

[Diagram: Site A: CPU FARM and its JobScheduler, a dynamically loadable module]
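A pluggable per-center scheduler might look like the following Java sketch. The JobScheduler interface, the least-loaded policy, and RegionalCenter are hypothetical; the point is that the center instantiates its scheduler from a class name at runtime via reflection, so policies can be swapped without changing the center.

```java
import java.util.Arrays;

// Hypothetical plug-in contract for a regional center's local scheduler.
interface JobScheduler {
    // Return the index of the local CPU that should run a job of 'work' units.
    int assign(double work, double[] queuedWork);
}

// Basic policy: the local CPU with the least queued work.
class LeastLoadedScheduler implements JobScheduler {
    @Override
    public int assign(double work, double[] queuedWork) {
        int best = 0;
        for (int i = 1; i < queuedWork.length; i++)
            if (queuedWork[i] < queuedWork[best]) best = i;
        return best;
    }
}

class RegionalCenter {
    private final double[] queuedWork;   // pending work per CPU of the local farm
    private final JobScheduler scheduler;

    RegionalCenter(int cpus, String schedulerClassName) {
        queuedWork = new double[cpus];
        try {
            // Dynamic loading: instantiate whichever class the configuration names.
            scheduler = (JobScheduler) Class.forName(schedulerClassName)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("cannot load scheduler " + schedulerClassName, e);
        }
    }

    // Ask the plugged-in scheduler for a CPU and queue the job there.
    int submit(double work) {
        int cpu = scheduler.assign(work, queuedWork.clone());
        queuedWork[cpu] += work;
        return cpu;
    }

    public static void main(String[] args) {
        RegionalCenter siteA = new RegionalCenter(3, LeastLoadedScheduler.class.getName());
        for (double w : new double[] {5, 3, 4, 1})
            System.out.println("job of " + w + " units -> CPU " + siteA.submit(w));
        System.out.println("queued work per CPU: " + Arrays.toString(siteA.queuedWork));
    }
}
```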

Page 13

Centralized Scheduling

[Diagram: a GLOBAL Job Scheduler above Site A and Site B, each with its own CPU FARM and JobScheduler]

Page 14

Distributed Scheduling - market model

[Diagram: sites (Site A, Site B, ...) each with a CPU FARM and a JobScheduler; the schedulers exchange a Request, COST offers, and a DECISION]
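The Request / COST / DECISION exchange can be sketched in Java as follows. The cost function (estimated completion time, plus an input-data transfer time for remote sites) is an assumption chosen for illustration, and the class names and the third site in the example are hypothetical.

```java
import java.util.List;

// Hypothetical market-model exchange: Request -> COST offers -> DECISION.
class MarketScheduling {
    static final class Site {
        final String name;
        final double queuedWork;   // work units already waiting at this site
        final double cpuPower;     // work units per second
        Site(String name, double queuedWork, double cpuPower) {
            this.name = name; this.queuedWork = queuedWork; this.cpuPower = cpuPower;
        }
        // COST reply to a Request: estimated time until the job completes here (assumption).
        double cost(double jobWork, double transferSeconds) {
            return (queuedWork + jobWork) / cpuPower + transferSeconds;
        }
    }

    // DECISION: the requesting site picks the cheapest offer, itself included.
    static Site decide(Site local, List<Site> remotes, double jobWork, double transferSeconds) {
        Site best = local;
        double bestCost = local.cost(jobWork, 0);   // no data transfer if run locally
        for (Site s : remotes) {
            double c = s.cost(jobWork, transferSeconds);
            if (c < bestCost) { best = s; bestCost = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        Site a = new Site("Site A", 900, 10);
        Site b = new Site("Site B", 100, 10);
        Site c = new Site("Site C", 400, 20);
        Site chosen = decide(a, List.of(b, c), 100, 30);
        System.out.println("job from Site A goes to " + chosen.name);
    }
}
```

With these numbers Site A would finish the job locally after 100 s, Site B quotes 50 s and Site C 55 s, so the job goes to Site B.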

Page 15

Example: simple distributed scheduling

Very simple scheduling algorithm, based on searching the center with the minimum load

We simulated the activity of 4 regional centers

When all the centers are heavily loaded, the number of job transfers grows unnecessarily
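A Java sketch of the minimum-load policy follows. The margin parameter is a hypothetical refinement, not something the slides describe: with margin 0 it is the plain minimum-load search, while a positive margin keeps a job at its home center unless another center is lighter by more than that many jobs, one simple way to suppress transfers that gain little when every center is busy.

```java
// Hypothetical sketch: minimum-load placement with an optional migration margin.
class MinLoadScheduling {
    // Index of the center that should run a job submitted at 'home'.
    // margin = 0 gives the plain "minimum load" search.
    static int choose(int[] load, int home, int margin) {
        int best = home;
        for (int i = 0; i < load.length; i++)
            if (load[i] < load[best]) best = i;
        return (load[home] - load[best] > margin) ? best : home;
    }

    // Submit jobs round-robin from each center and count how many are transferred.
    static int countTransfers(int[] initialLoad, int jobs, int margin) {
        int[] load = initialLoad.clone();
        int transfers = 0;
        for (int j = 0; j < jobs; j++) {
            int home = j % load.length;
            int target = choose(load, home, margin);
            if (target != home) transfers++;
            load[target]++;
        }
        return transfers;
    }

    public static void main(String[] args) {
        int[] busy = {100, 101, 102, 103};   // 4 regional centers, all heavily loaded
        System.out.println("plain min-load transfers: " + countTransfers(busy, 40, 0));
        System.out.println("with a margin of 5 jobs:  " + countTransfers(busy, 40, 5));
    }
}
```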

Page 16

Network Model

[Diagram: each regional center Farm is connected through LinkPorts to its LAN, and the LANs are interconnected through WAN links. Simulated local traffic; simulated inter-regional traffic; simulated network components]

Page 17

LAN/WAN Simulation Model

[Diagram: LANs of Nodes joined by Links, each LAN connected through a ROUTER to the Internet Connections]

“Interrupt” driven simulation: for each new message an interrupt is created, and for all the active transfers the speed and the estimated time to complete the transfer are recalculated.

Continuous flow between events! An efficient and realistic way to simulate concurrent transfers having different sizes / protocols.
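The recalculation step can be sketched in Java for a single shared link. Assumptions: bandwidth is shared max-min fairly, and each transfer may have a rate cap standing in for protocol limits; LinkShare and its methods are hypothetical, not MONARC's network classes.

```java
// Hypothetical sketch of the per-interrupt recomputation on one shared link.
class LinkShare {
    // Max-min fair "water-filling": equal shares, but never above a transfer's cap;
    // capacity left over by capped transfers is redistributed to the others.
    static double[] maxMinRates(double capacity, double[] caps) {
        int n = caps.length;
        double[] rate = new double[n];
        boolean[] fixed = new boolean[n];
        double left = capacity;
        int open = n;
        while (open > 0) {
            double share = left / open;
            boolean capped = false;
            for (int i = 0; i < n; i++) {
                if (!fixed[i] && caps[i] <= share) {
                    rate[i] = caps[i]; fixed[i] = true; left -= caps[i]; open--; capped = true;
                }
            }
            if (!capped) {   // nobody is limited by its cap: split the rest equally
                for (int i = 0; i < n; i++) if (!fixed[i]) rate[i] = share;
                break;
            }
        }
        return rate;
    }

    // Estimated seconds to completion for each active transfer.
    static double[] eta(double[] remaining, double[] rate) {
        double[] t = new double[rate.length];
        for (int i = 0; i < t.length; i++) t[i] = remaining[i] / rate[i];
        return t;
    }

    public static void main(String[] args) {
        double capacity = 100;   // MB/s on the link
        double[] caps = {20, Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY};
        double[] rates = maxMinRates(capacity, caps);
        double[] times = eta(new double[] {100, 400, 800}, rates);
        for (int i = 0; i < rates.length; i++)
            System.out.printf("transfer %d: %.1f MB/s, ETA %.1f s%n", i, rates[i], times[i]);
        // Interrupt: transfer 0 completes at t = 5 s; recompute for the survivors.
        double[] after = maxMinRates(capacity, new double[] {caps[1], caps[2]});
        double[] left = {400 - 5 * rates[1], 800 - 5 * rates[2]};
        double[] times2 = eta(left, after);
        System.out.printf("after t=5 s: ETAs %.1f s and %.1f s%n", times2[0], times2[1]);
    }
}
```

Here the capped transfer gets 20 MB/s and the other two split the remaining 80 MB/s; when the first one completes at t=5 s, the interrupt recomputes the survivors' rates to 50 MB/s each.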

Page 18

Network Model

The TCP/IP layers are closely followed:
Network Access Layer: LinkPort, LAN, WAN
Internet Layer: Message
Transport Layer: Protocol (TCPProtocol, UDPProtocol)
Application Layer: NetworkJob

Page 19

Data Model

[Diagram: a Client queries the Database Index and reaches, through a LinkPort, the Database Server, which holds several Databases made of DContainers and is backed by Mass Storage. Legend: Mapping, Task, Database Entity]

Page 20

Data Model

[Diagram: a Generic Data Container (Size, Event Type, Event Range, Access Count, INSTANCE) can be served by an FTP Server Node, DB Server, NFS Server, or Custom Data Server, stored as a FILE, Data Base, or Network FILE, and registered in the META DATA Catalog and Replication Catalog; Export / Import]

Page 21

Data Model

[Diagram: a Data Processing JOB sends a Data Request to the META DATA Catalog / Replication Catalog, selects from the options (the Data Containers holding the requested data), and turns them into a List Of IO Transactions]
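The lookup a data processing job performs can be sketched in Java as below. The container fields mirror the Generic Data Container attributes listed on the previous slide (size, event type, event range, access count), but the classes, the range-overlap rule, and the TAG/ESD event types and server names in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical catalog: resolves a data request into the containers (IO transactions) to read.
class DataCatalog {
    static final class DataContainer {
        final String eventType;
        final long firstEvent, lastEvent;   // inclusive event range
        final double sizeMB;
        final String server;                // where this instance is stored
        int accessCount = 0;
        DataContainer(String eventType, long first, long last, double sizeMB, String server) {
            this.eventType = eventType; this.firstEvent = first; this.lastEvent = last;
            this.sizeMB = sizeMB; this.server = server;
        }
    }

    private final List<DataContainer> containers = new ArrayList<>();

    void register(DataContainer c) { containers.add(c); }

    // Every container of the requested type whose event range overlaps [first, last].
    List<DataContainer> request(String eventType, long first, long last) {
        List<DataContainer> io = new ArrayList<>();
        for (DataContainer c : containers) {
            boolean overlaps = c.firstEvent <= last && first <= c.lastEvent;
            if (c.eventType.equals(eventType) && overlaps) {
                c.accessCount++;   // keep the access statistics up to date
                io.add(c);
            }
        }
        return io;
    }

    public static void main(String[] args) {
        DataCatalog catalog = new DataCatalog();
        catalog.register(new DataContainer("TAG", 0, 999, 100, "DBServer-1"));
        catalog.register(new DataContainer("TAG", 1000, 1999, 100, "DBServer-2"));
        catalog.register(new DataContainer("ESD", 0, 999, 500, "DBServer-1"));
        for (DataContainer c : catalog.request("TAG", 500, 1500))
            System.out.println("read events " + c.firstEvent + "-" + c.lastEvent + " from " + c.server);
    }
}
```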

Page 22

Activities: Arrival Patterns

A flexible mechanism to define the stochastic process of how users perform data processing tasks.

Dynamic loading of “Activity” tasks, which are threaded objects and are controlled by the simulation scheduling mechanism.

Physics Activities Injecting “Jobs”

Each “Activity” thread generates data processing jobs:

for (int k = 0; k < jobs_per_group; k++) {
    Job job = new Job(this, Job.ANALYSIS, "TAG", 1, events_to_process);
    farm.addJob(job);  // submit the job
    sim_hold(1000);    // wait 1000 s
}

[Diagram: several Activity objects inject Jobs into the Regional Centre Farm]

These dynamic objects are used to model the users' behavior.
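An Activity with a concrete arrival pattern can be sketched in Java. The exponential inter-arrival times (a Poisson process) are an assumed example of a user-defined stochastic process, the simulated clock is a plain variable standing in for sim_hold, and Farm is a hypothetical stand-in that only records submission times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical Activity: injects jobs with exponentially distributed gaps (Poisson arrivals).
class PoissonActivity {
    // Stand-in for the regional centre farm: only records when jobs arrive.
    static final class Farm {
        final List<Double> submitTimes = new ArrayList<>();
        void addJob(double time, long eventsToProcess) { submitTimes.add(time); }
    }

    // Submit 'jobs' jobs with a mean spacing of 'meanGapSeconds' simulated seconds.
    static void run(Farm farm, int jobs, double meanGapSeconds, long events, long seed) {
        Random rng = new Random(seed);
        double clock = 0;
        for (int k = 0; k < jobs; k++) {
            farm.addJob(clock, events);   // submit the job at the current simulated time
            // Plays the role of sim_hold(...): wait an exponentially distributed time.
            // 1 - nextDouble() lies in (0, 1], so the logarithm is always finite.
            clock += -meanGapSeconds * Math.log(1 - rng.nextDouble());
        }
    }

    public static void main(String[] args) {
        Farm farm = new Farm();
        run(farm, 5, 1000, 10_000, 42);
        System.out.println("submission times (s): " + farm.submitTimes);
    }
}
```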

Page 23

Output of the simulation

[Diagram: components of the Simulation Engine (Node, DB, Router, User C) publish results; Output Listener Filters feed them to Log Files, EXCEL, and GRAPHICS]

Any component in the system can generate generic result objects. Any client can subscribe with a filter and will receive the results it is interested in. This is a VERY SIMILAR structure to the one in MonALISA; we will soon integrate the output of the simulation framework into MonALISA.
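The subscribe-with-filter mechanism can be sketched in Java with java.util.function predicates. ResultBus, Result, and the parameter names are hypothetical; a list and standard output stand in for the log file and graphics clients.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical publish/subscribe bus for simulation results.
class ResultBus {
    // A generic result object produced by any simulated component.
    static final class Result {
        final String source;      // e.g. a node, DB, or router
        final String parameter;   // what was measured
        final double value;
        Result(String source, String parameter, double value) {
            this.source = source; this.parameter = parameter; this.value = value;
        }
        @Override public String toString() { return source + "/" + parameter + "=" + value; }
    }

    // One subscription: the client's filter plus where matching results go.
    private static final class Subscription {
        final Predicate<Result> filter;
        final Consumer<Result> listener;
        Subscription(Predicate<Result> filter, Consumer<Result> listener) {
            this.filter = filter; this.listener = listener;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    void subscribe(Predicate<Result> filter, Consumer<Result> listener) {
        subscriptions.add(new Subscription(filter, listener));
    }

    // Deliver a result only to the clients whose filter accepts it.
    void publish(Result r) {
        for (Subscription s : subscriptions)
            if (s.filter.test(r)) s.listener.accept(r);
    }

    public static void main(String[] args) {
        ResultBus bus = new ResultBus();
        List<Result> logFile = new ArrayList<>();                    // stands in for a log file
        bus.subscribe(r -> r.parameter.equals("cpu_load"), logFile::add);
        bus.subscribe(r -> r.source.startsWith("Router"),
                      r -> System.out.println("plot: " + r));        // stands in for graphics
        bus.publish(new Result("Node1", "cpu_load", 0.8));
        bus.publish(new Result("Router1", "traffic_MBps", 120));
        bus.publish(new Result("DB1", "cpu_load", 0.3));
        System.out.println("log file: " + logFile);
    }
}
```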

Page 24

Conclusions

http://monalisa.cacr.caltech.edu/MONARC

Modelling and understanding current systems, their performance and limitations, is essential for the design of large scale distributed processing systems. This will require continuous iterations between modelling and monitoring.

Simulation and Modelling tools must provide the functionality to help in designing complex systems and evaluate different strategies and algorithms for the decision making units and the data flow management.

For future development: efficient distributed scheduling algorithms, data replication, more complex examples.