
Page 1: Distributed Computing and Analysis Lamberto Luminari Italo – Hellenic School of Physics 2004 Martignano - May 20, 2004


Outline

Introduction
– General remarks

Distributed computing
– Principles
– Projects
– Computing facilities: testbeds and production infrastructures

Database Systems
– Principles

Distributed analysis
– Requirements and issues

General remarks

Schematic approach
– For the purpose of clarity, differences among possible alternatives are stressed: in reality, solutions are often a mix or a compromise.
– Only the main features of relevant items are described: the presentation is not meant to be exhaustive.

HEP (LHC) oriented presentation
– Examples are mainly taken from the HEP world
– Projects with HEP community involvement are preferred
– Options chosen by LHC

Distributed Computing

Distributed computing

What is it:
– processing of data and objects across a network of connected systems;
– hardware and software infrastructure that provides pervasive (and inexpensive) access to computational capabilities.

A long story:
– mainframes more and more expensive;
– cluster technology;
– RISC machines very powerful.

What makes it appealing now:
– CPU power!
– Storage capacity!!
– Network bandwidth!!!

... but distributed computing is not a choice, rather a necessity or an opportunity.

Network performance

Advantages of distributed computing

Scalability and flexibility:
– in principle, distributed computing systems are infinitely scalable: simply add more units and get more computing power. Moreover, you can add or remove specific resources and adapt the system to your needs.

Efficiency:
– private resources are usually poorly used: pooling them greatly increases their exploitation.

Reliability:
– failure of a single component has little effect on overall performance.

Load balancing and averaging:
– distributing tasks according to the availability of resources optimizes the behavior of the whole system and minimizes the execution time;
– load peaks arising from different user communities rarely coincide, so the use of resources is averaged (and optimized) over long periods.
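The load-balancing point can be illustrated with a minimal sketch. It assumes that task durations are known in advance and that all workers are identical, neither of which is stated in the slides; the function name `balance` and the sample durations are hypothetical. Each task goes to whichever worker currently has the least work.

```python
import heapq

def balance(task_costs, n_workers):
    """Assign each task to the currently least-loaded worker (greedy)."""
    # Heap of (current_load, worker_id): the least-loaded worker is always on top.
    heap = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(n_workers)}
    # Placing the longest tasks first tends to give a more even split.
    for task_id, cost in sorted(enumerate(task_costs), key=lambda t: -t[1]):
        load, w = heapq.heappop(heap)
        assignment[w].append(task_id)
        heapq.heappush(heap, (load + cost, w))
    # The total execution time is set by the busiest worker.
    makespan = max(load for load, _ in heap)
    return assignment, makespan

costs = [8, 7, 6, 5, 4, 3, 2, 1]      # hypothetical task durations
assignment, makespan = balance(costs, 3)
serial_time = sum(costs)
```

With these sample durations the three workers finish after 13 time units, against 36 for a single machine running everything in sequence.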

Disadvantages of distributed computing

Difficult integration and coordination:
– many heterogeneous computing systems have to be integrated;
– data sets are split over different storage systems;
– many users have to cooperate and share resources.

Unpredictability:
– the quantity of available resources may fluctuate widely;
– computing units may become unavailable or unreachable suddenly and for long periods, making the completion time of the tasks running there unpredictable.

Security problems:
– distributed systems are prone to intrusion.

Applications and distributed computing

Suitable:
– high compute to data ratio;
– batch processes;
– loosely coupled tasks;
– statistical evaluations dependent on random trials;
– data mining through distributed filesystems or databases.

Unsuitable:
– real time;
– interactive processes;
– strongly coupled;
– sequential.
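As an illustration of a "suitable" workload, here is a small sketch of a statistical evaluation based on random trials: a Monte Carlo estimate of pi split into independent tasks. The thread pool only stands in for a set of remote machines, and the task count, trial count and function names are assumptions chosen for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_hits(seed, n_trials):
    """One independent task: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)          # private generator, so tasks share no state
    hits = 0
    for _ in range(n_trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(n_tasks=8, n_trials=20_000):
    # Each task receives only a seed and returns a single integer:
    # a high compute-to-data ratio and no inter-task communication.
    with ThreadPoolExecutor(max_workers=4) as pool:
        hits = list(pool.map(count_hits, range(n_tasks), [n_trials] * n_tasks))
    return 4.0 * sum(hits) / (n_tasks * n_trials)

pi_estimate = estimate_pi()
```

Because the tasks never talk to each other, losing or rerunning one of them does not affect the others, which is what makes this kind of job easy to distribute.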

Distributed computing architectures

Peer-to-peer:
– flat organization of components, with similar functionalities, talking to each other;
– suitable for:
    independent tasks or poor inter-task communication;
    access to sparse data organized in a non-hierarchical way.

Client-server:
– components with different functionalities and roles:
    processing unit (client) provided with a lightweight agent able to perform simple operations: detect system status and notify it to the server, ask (or wait) for tasks, accept and send data, execute processes according to priorities or in spare cycles, ...
    dedicated unit (server) provided with complex software able to: take or send computing requests, monitor the status of the jobs sent to the clients, receive the results and assemble them, possibly in a database. It also takes care of security and access policy, and stores statistics and accounting data.
– suitable for:
    complex architectures and tasks.
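The client-server division of roles described above can be sketched in a few lines. This is an in-process toy with no networking, security or accounting; the class names, the client names and the "sum a list" workload are all hypothetical.

```python
from collections import deque

class Server:
    """Dedicated unit: hands out tasks, tracks job status, assembles results."""
    def __init__(self, tasks):
        self.pending = deque(tasks)        # (task_id, payload) pairs not yet assigned
        self.status = {tid: "queued" for tid, _ in tasks}
        self.results = {}

    def request_task(self, client_name):
        if not self.pending:
            return None
        task_id, payload = self.pending.popleft()
        self.status[task_id] = f"running on {client_name}"
        return task_id, payload

    def submit_result(self, task_id, value):
        self.results[task_id] = value
        self.status[task_id] = "done"

class Client:
    """Processing unit with a lightweight agent: ask for work, run it, send the result."""
    def __init__(self, name):
        self.name = name

    def work_once(self, server):
        task = server.request_task(self.name)
        if task is None:
            return False
        task_id, payload = task
        server.submit_result(task_id, sum(payload))   # the "process" is just a sum
        return True

server = Server([(i, list(range(i + 1))) for i in range(5)])
clients = [Client("pc-a"), Client("pc-b")]
# Every client polls once per round until no client receives a task.
while any([c.work_once(server) for c in clients]):
    pass
```

The clients stay simple and interchangeable, while all bookkeeping lives on the server, which mirrors the split between lightweight agent and complex server software.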

Multi-tier computing systems

Components with different levels of service, arranged in tiers:
– computing centers (multi-processors, PC farms, data storage systems);
– clusters of dedicated machines;
– individual, general use PCs.

Different functionalities for each tier:
– amount of CPU power installed and data stored;
– quality and schedule of user support;
– level of reliability and security.


Distributed computing models

Clusters:
– groups of homogeneous, tightly coupled components, sharing file systems and peripheral devices (e.g., Beowulf);

Pools of desktop PCs:
– loosely interconnected private machines (e.g., Condor);

Grids:
– heterogeneous systems of (mainly dedicated) resources (e.g., LCG).

Comparison of computing models

Condor is a specialized workload management system for compute-intensive jobs. It provides:
– a job queueing mechanism;
– scheduling policy;
– priority scheme;
– resource monitoring;
– resource management.

Users submit their serial or parallel jobs to Condor, which places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

Unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. Condor is able to transparently produce a checkpoint and migrate a job to a different machine.

Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine.
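The queue, priority and checkpoint-and-migrate ideas can be pictured with a toy model. This is not Condor's interface: `ToyScheduler`, its methods, the machine names and the notion of a job as a count of "steps" are all invented for illustration.

```python
import heapq

class ToyScheduler:
    """Toy priority queue of jobs; NOT Condor's actual interface."""
    def __init__(self):
        self._queue = []      # entries: (-priority, submit_order, job)
        self._order = 0

    def submit(self, name, total_steps, priority=0):
        job = {"name": name, "total": total_steps, "done": 0, "ran_on": []}
        self._push(job, priority)
        return job

    def _push(self, job, priority):
        heapq.heappush(self._queue, (-priority, self._order, job))
        self._order += 1

    def run_on(self, machine, idle_steps):
        """Run the best queued job for as long as the machine stays idle."""
        if not self._queue:
            return None
        neg_prio, _, job = heapq.heappop(self._queue)
        steps = min(idle_steps, job["total"] - job["done"])
        job["done"] += steps              # the checkpoint: progress made so far
        job["ran_on"].append(machine)
        if job["done"] < job["total"]:
            # The owner reclaimed the machine: requeue from the checkpoint.
            self._push(job, -neg_prio)
        return job

sched = ToyScheduler()
long_job = sched.submit("simulation", total_steps=10, priority=5)
short_job = sched.submit("histograms", total_steps=2, priority=1)
sched.run_on("desk-01", idle_steps=4)   # simulation runs 4 steps, then is evicted
sched.run_on("desk-02", idle_steps=6)   # resumes at step 4 on another machine
sched.run_on("desk-01", idle_steps=3)   # the lower-priority job gets its turn
```

The point of the sketch is that the work done before eviction is not lost: the job restarts on a different machine from its recorded progress rather than from zero.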

[Figure: resources, data, network]

Distributed computing environment

DCE standards:
– A distributed computing network may include many different systems. The Distributed Computing Environment (DCE), formulated by The Open Group, formalizes the technologies needed to make the components communicate with each other, such as remote procedure calls and middleware. DCE runs on all major computing platforms and is designed to support distributed applications in heterogeneous hardware and software environments.

DCE provides a complete infrastructure, with services, interfaces, protocols, encoding rules for:
– authentication and security (Kerberos, Public Key certificate);
– object interoperability across different platforms (CORBA: Common Object Request Broker Architecture);
– directories (with global name and cell name) for distributed resources;
– time services (including synchronization);
– distributed file systems;
– Remote Procedure Call;
– Internet/Intranet communications.

Grid computing specifications

The Global Grid Forum (GGF) is the primary organization whose purpose is to define specifications about Grid Computing. It is a forum for information exchange and collaboration among people who are doing Grid research, designing and building Grid software, deploying Grids, using Grids, spanning technology areas: scheduling, data handling, security…

The Globus Toolkit (developed at Argonne Nat. Lab. and Univ. of Southern California) is an implementation of these standards, and has become a de facto standard for grid middleware because of some attractive features:
– an object-oriented approach, which allows developers of specific applications to take just what meets their needs, to introduce tools one at a time and to make programs increasingly "Grid-enabled";
– the toolkit software is "open-source": this allows developers to freely make and add improvements.

Globus toolkit

Practically all major Grid projects are being built on protocols and services provided by the Globus Toolkit, a software "work-in-progress" which is being developed by the Globus Alliance, which involves primarily Ian Foster's team at Argonne National Laboratory and Carl Kesselman's team at the University of Southern California in Los Angeles.

The toolkit provides a set of software tools to implement the basic services and capabilities required to construct a computational Grid, such as security, resource location, resource management, and communications.

Globus includes programs such as:
– Computing Element: receives job requests and delivers them to the Worker Nodes, which will perform the real work. The Computing Element provides an interface to the local batch queuing systems. A Computing Element can manage one or more Worker Nodes:

Globus Toolkit

The Globus toolkit provides a set of software tools to implement the basic services and capabilities required to construct a computational Grid, such as security, resource location, resource management, and communications:
– GRAM (Globus Resource Allocation Manager), to convert a request for resources into commands that local computers can understand;
– GSI (Grid Security Infrastructure), to provide authentication of the user and work out that person's access rights;
– MDS (Monitoring and Discovery Service), to collect information about resources (processing capacity, bandwidth capacity, type of storage, etc.);
– GRIS (Grid Resource Information Service), to query resources for their current configuration, capabilities, and status;
– GIIS (Grid Index Information Service), to coordinate arbitrary GRIS services;
– GridFTP, to provide a high-performance, secure and robust data transfer mechanism;
– Replica Catalog, a catalog that allows other Globus tools to look up where on the Grid other replicas of a given dataset can be found;
– Replica Management system, which ties together the Replica Catalog and GridFTP technologies, allowing applications to create and manage replicas of large datasets.
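The relationship between GRIS and GIIS in the list above (per-resource information providers registered with an index that answers queries across them) can be sketched as follows. The classes reuse the two acronyms as names only for readability; they are toy stand-ins with invented methods and attributes, not the Globus API.

```python
class GRIS:
    """Per-resource information provider (toy stand-in, not the Globus API)."""
    def __init__(self, name, **attributes):
        self.name = name
        self.attributes = attributes

    def query(self):
        # Report the resource's current configuration and status.
        return dict(self.attributes)

class GIIS:
    """Index that registers several GRIS instances and answers aggregate queries."""
    def __init__(self):
        self.registered = []

    def register(self, gris):
        self.registered.append(gris)

    def find(self, **minimum):
        """Names of resources whose attributes meet every requested minimum."""
        return [
            g.name for g in self.registered
            if all(g.query().get(k, 0) >= v for k, v in minimum.items())
        ]

index = GIIS()
index.register(GRIS("farm-1", cpus=64, disk_tb=5))      # hypothetical resources
index.register(GRIS("farm-2", cpus=8, disk_tb=50))
matches = index.find(cpus=16)
```

A client only needs to know the index; the individual resources can join or leave without the client changing its query.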

OGSA: the future?

Grid projects

… and many others!

Grid projects

• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL
• Grid2003
• …

• UK – GridPP
• Netherlands – DutchGrid
• Germany – UNICORE, Grid project
• France – Grid funding approved
• Italy – INFN Grid
• Eire – Grid project
• Switzerland – Network/Grid project
• Hungary – DemoGrid
• Norway, Sweden – NorduGrid
• …

• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, …)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)
• EGEE
• …

Middleware projects relevant for HEP

EDG
– European Data Grid (EU project)

EGEE
– Enabling Grids for E-science in Europe (EU project)

Grid2003
– joint project of the U.S. Grid projects iVDGL, GriPhyN and PPDG, and the U.S. participants in the LHC experiments ATLAS and CMS.


LCG hierarchical information service

Replica management


A Job Submission Example

[Diagram: job flow between UI (JDL), Resource Broker, Job Submission Service, Logging & Book-keeping, Information Service, Data Management Services (LFN->PFN), Compute Element and Storage Element. Arrow labels: Author. & Authen., Job Submit, Job Query, Job Status, Input "sandbox", Output "sandbox", Brokerinfo.]

Job submission steps (1)

Job submission steps (2)

Portals

Why a portal?
• It can be accessed from everywhere and by "everything" (desktop, laptop, PDA, phone).
• It can keep the same user interface independently of the underlying middleware.
• It must be redundantly "secure" at all levels:
    • secure for web transactions,
    • secure for user credentials,
    • secure for user authentication,
    • secure at VO level.
• All available grid services must be incorporated in a logical way, just "one mouse click away".
• Its layout must be easily understandable and user friendly.


Computing facilities (1)

Computing facilities (testbeds or production infrastructures) are made up of one or more nodes. Each node (computer center or cluster of resources) contains a certain number of components, which may be playing different roles. Some are site specific:
– Computing Element: receives job requests and delivers them to the Worker Nodes, which will perform the real work. The Computing Element provides an interface to the local batch queuing systems. A Computing Element can manage one or more Worker Nodes:
    Worker Node: the machine that will actually process data. Typically managed via a local batch system. A Worker Node can also be installed on the same machine as the Computing Element.
– Storage Element: provides storage space to the facility. The storage element may control large disk arrays, mass storage systems and the like; however, the SE interface hides the differences between these systems, allowing uniform user access.
– User Interface: the machine that allows users to access the facility. This is typically the machine the end-user logs into to submit jobs to the grid and to retrieve the output from those jobs.


Computing facilities (2)

Some other roles are shared by groups of users or by the whole grid:

– Resource Broker: receives users' requests and queries the Information Index to find suitable resources.

– Information Index: resides on the same machine as the Resource Broker, keeps information about the available resources.

– Replica Manager: coordinates file replication from one Storage Element to another. Useful for data redundancy but also to move data closer to the machines which will perform computation.

– Replica Catalog: can reside on the same machine as the Replica Manager, keeps information about file replicas. A logical file can be associated with one or more physical files which are replicas of the same data. Thus a logical file name can refer to one or more physical file names.
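The interplay between Resource Broker, Information Index and Replica Catalog can be illustrated with a minimal Python sketch. It is a toy model, not the actual EDG/LCG middleware: the host names, the `close_storage` attribute and the ranking rule (prefer a Computing Element close to a replica, then the one with most free CPUs) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    free_cpus: int
    close_storage: str  # nearest Storage Element (assumed attribute)


# Information Index: what the broker knows about the available resources.
information_index = [
    ComputingElement("ce.site-a.example", 0, "se.site-a.example"),
    ComputingElement("ce.site-b.example", 12, "se.site-b.example"),
    ComputingElement("ce.site-c.example", 40, "se.site-c.example"),
]

# Replica Catalog: one logical file name -> one or more physical file names.
replica_catalog = {
    "lfn:run1234.raw": [
        "se.site-a.example:/data/run1234.raw",
        "se.site-b.example:/data/run1234.raw",
    ],
}


def broker_match(lfn, cpus_needed):
    """Return a Computing Element with enough free CPUs, or None.

    Elements close to a replica of the input file are preferred
    (hypothetical ranking rule); ties are broken by free CPUs.
    """
    replica_hosts = {pfn.split(":", 1)[0] for pfn in replica_catalog.get(lfn, [])}
    candidates = [ce for ce in information_index if ce.free_cpus >= cpus_needed]
    if not candidates:
        return None
    candidates.sort(
        key=lambda ce: (ce.close_storage not in replica_hosts, -ce.free_cpus)
    )
    return candidates[0]


chosen = broker_match("lfn:run1234.raw", cpus_needed=4)
print(chosen.name)  # ce.site-b.example
```

Site C has more free CPUs, but site B holds a replica of the input file, so in this sketch the job goes where the data already are.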


Computing facilities relevant for HEP

EDG
– Testbed

LCG
– Production infrastructure

EGEE
– Production infrastructure

Grid3
– Production infrastructure operated jointly by the U.S. Grid projects iVDGL, GriPhyN and PPDG, and the U.S. participants in the LHC experiments ATLAS and CMS.


LCG hybrid architecture

Multi-tier hierarchy + Grids


EGEE Timeline

– May 2003: proposal submitted
– July 2003: proposal accepted
– April 2004: start project


Grid3 infrastructure


Virtual Organizations (User Communities)


Multi-VO and one Grid

Grid (shared resources and services)


ATLAS Production System

One VO and multi-Grid


Multi-VO and multi-Grid

[Figure; diagram labels: Shared Resources and Services; VO services and private resources]


HEP Requirements

User requirements:
– Concerning services, the HEP community has already done a lot of work within EDG and LCG. The basic requirements have already been specified as use cases for HEP data processing (HEPCAL report, May 2002). Using the HEPCAL document to provide templates for requirements analysis, the EDG/AWG (Application Working Group) aims at defining requirements for a high level common application layer based on the needs of HEP, Bio-medicine and Earth Sciences. High level APIs for Grid Services have also been defined by the EU funded project Gridlab.

– Concerning resources, the production service must provide a continuous, stable, robust environment and controlled, reliable access to the resources. The agreed sharing policies must be fully implemented and easily changeable.

Besides implementing the user requirements, practical help should be given in interfacing the experiment applications to grid services, and in evaluating the performance of the software deployed within the production environment, as well as in pre-production testbeds.


Security

Security Policy
– The security organizational model, so far often tailored to the needs and characteristics of homogeneous communities, should in the future be based on the service needs of many heterogeneous VOs, which introduces a new complexity into the Grid organizational and security model.

CA Policy
– A European Grid Policy Management Authority is a prerequisite for running a Grid infrastructure both in Europe and worldwide. The Grid Security Infrastructure relies on trusted Certification Authorities (CAs). It is therefore essential that a network of CAs, based on a commonly agreed set of requirements, is established and maintained in Europe.


VO management

As more and more communities join common production infrastructures, VO management is becoming crucial.

Current technology offers support for rather static and large communities. The assignment of access rights is separated into two parts: local resource administrators grant rights to the VO as a whole, while VO administrators grant them to individual members of the community.

In the future there will be a need for small (even only two people), short-lived (of the order of a few days) and unforeseen (dynamically discovered) VOs. The goal would be to provide a very fine-grained authorization and access control mechanism, based on global standards where applicable.
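The two-part assignment of access rights described for current technology can be sketched in a few lines of Python. This is an illustration only, not the interface of any real VO management service: the resource names, VO members and action names are all hypothetical.

```python
# Rights granted by local resource administrators to each VO as a whole
# (hypothetical resources and actions).
resource_acl = {
    "ce.site-a.example": {"atlas": {"submit"}, "cms": {"submit"}},
    "se.site-a.example": {"atlas": {"read", "write"}, "cms": {"read"}},
}

# Rights granted by each VO administrator to individual members.
vo_members = {
    "atlas": {"alice": {"submit", "read", "write"}, "bob": {"read"}},
    "cms": {"carol": {"submit", "read", "write"}},
}


def is_allowed(user, vo, resource, action):
    """An action is allowed only if BOTH levels grant it."""
    vo_rights = resource_acl.get(resource, {}).get(vo, set())
    user_rights = vo_members.get(vo, {}).get(user, set())
    return action in vo_rights and action in user_rights


print(is_allowed("alice", "atlas", "se.site-a.example", "write"))  # True
print(is_allowed("bob", "atlas", "se.site-a.example", "write"))    # False
print(is_allowed("carol", "cms", "se.site-a.example", "write"))    # False
```

Bob is refused by his VO administrator, Carol by the site: her VO granted her write access, but the resource administrator gave the cms VO read access only.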


Resource allocation and usage

Resource allocation and reservation
– In order to meet the needs of all the different Grid users, mechanisms will be required to control and balance usage of the resources (including networks) by highly demanding applications, and to categorise and prioritise jobs so that they can receive the required level of service.

– In particular, users should be able to allocate resources both immediately and in advance. Allocations must be restricted to authenticated users acting within authorized roles, the services available must be determined by policies agreed with the user organisations, and the aggregate services made available to VOs must be monitored to ensure adherence to the agreements.

Resource usage and accounting
– A major issue is the control of usage of resources, once access to them has been established. This includes interfaces to traditional Usage Control mechanisms such as quotas and limits, and also the extraction and recording of usage for Budgeting, Accounting and Auditing purposes.

– The usage quotas may be owned either by individuals or by VOs, and specified in either site-specific or Grid-wide protocols. This will include the ability to enforce quotas across a set of distributed resources.
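As an illustration of enforcing one quota across a set of distributed resources, here is a minimal Python sketch. The class, the CPU-hour unit and the site names are assumptions made for the example; a real Grid accounting service would also have to cope with concurrency and with usage reports arriving late from remote sites.

```python
class QuotaExceeded(Exception):
    """Raised when a charge would take the owner over the Grid-wide limit."""


class GridQuota:
    """Grid-wide CPU-hour quota for one owner (an individual or a VO)."""

    def __init__(self, owner, limit_cpu_hours):
        self.owner = owner
        self.limit = limit_cpu_hours
        self.usage_by_site = {}  # accounting record: site -> CPU hours used

    @property
    def used(self):
        # The quota is checked against the sum over ALL sites.
        return sum(self.usage_by_site.values())

    def charge(self, site, cpu_hours):
        """Record usage at one site, refusing it if the total would exceed the limit."""
        if self.used + cpu_hours > self.limit:
            raise QuotaExceeded(
                f"{self.owner}: {self.used} + {cpu_hours} > {self.limit} CPU hours"
            )
        self.usage_by_site[site] = self.usage_by_site.get(site, 0.0) + cpu_hours


quota = GridQuota("atlas", limit_cpu_hours=100.0)
quota.charge("site-a", 60.0)
quota.charge("site-b", 30.0)
try:
    quota.charge("site-c", 20.0)  # 90 + 20 exceeds the Grid-wide limit
except QuotaExceeded as err:
    print("refused:", err)
print(quota.usage_by_site)
```

The per-site dictionary doubles as the usage record needed for accounting and auditing, while the limit is applied to the total.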


Organizational issues

The need for resource sharing gives rise to a set of organisational issues to be faced, analysed and solved. Indeed, when a given organisation makes its own resources available on line:

– Each organisation has its own decision and management independence: the resources to be shared with other organisations should not jeopardize such independence.

– Each organisation has its own access policies. It is not true that everybody in the Grid can use everything, but it is true that new generations of network and grid technologies allow new sharing models to be defined. Each organisation should be able to decide, for each individual piece of data and each individual resource, which organisations have the right to access or use it.

– Each organisation has its own security policies: university security policies are usually completely different from those of a physics laboratory that works in close co-operation with government and the army. In order to guarantee real resource sharing among different kinds of organisations, it is necessary to ensure the maximum level of flexibility in the management of the above mentioned issues.


Requirements in LCG

Requirements are set by Experiments in the SC2 + Requirements and Technical Assessment Groups (RTAGs):

On applications:
– data persistency
– software support process
– mathematical libraries
– detector geometry description
– Monte Carlo generators
– applications architectural blueprint
– detector simulation

On Fabrics:
– mass storage requirements

On Grid technology and deployment area:
– Grid technology use cases
– Regional Center categorization


HEPCAL

LCG RTAG: Common Use Cases for a HEP Common Application Layer

Requirements are given as a set of use cases free of implementation details

GENERAL USE CASES:
– Obtain Grid Authorization
– Revoke Grid Authorization
– Grid Login
– Browse Grid Resources


HEPCAL

DATA MANAGEMENT USE CASES:
– Data Set (DS) Metadata Update
– DS Metadata Access
– DS Registration
– Virtual DS Declaration
– Virtual DS Materialization
– DS Upload
– Catalogue Creation
– DS Access
– DS transfer to non-Grid storage
– DS Replica Upload
– DS Access Cost Evaluation
– DS Replication
– Physical DS Instance Deletion
– DS Deletion
– Catalogue Deletion
– Read from Remote DS
– DS Verification
– DS Browsing
– Browse Expt Database


HEPCAL

JOB MANAGEMENT USE CASES:
– Job Catalogue Update
– Job Catalogue Query
– Job Submission
– Job Output Access or Retrieval
– Job Error Recovery
– Job Control
– Steer Job Submission
– Job Resource Estimation
– Job Environment Modification
– Job Splitting
– Production Job
– Analysis
– DS Transformation
– Job Monitoring
– Conditions Publishing
– Software Publishing
– Simulation Job
– Exp’t Software Dev for Grid


HEPCAL

VO MANAGEMENT USE CASES:
– Configuring the VO:
  Configuring the DS metadata catalogue (either initially or reconfiguring).
  Configuring the job catalogue (either initially or reconfiguring).
  Configuring the user profile (if this is possible at all on a VO basis).
  Adding or removing VO elements, e.g. computing elements, storage elements, etc.
  Configuring VO elements, including quotas, privileges etc.
– Managing the Users:
  Add and remove users to/from the VO.
  Modify the user information (privileges, quotas, priorities…) either for single users or for subgroups of users within a VO.
– VO wide resource reservation
  The Grid should provide a tool to estimate the time-to-completion given as input an estimate of the resources needed by the job. This is needed in particular to estimate the access cost.
  There should be use cases for releasing reserved resources, and system use cases for what to do in case a user does not submit a job for which resources are reserved.
– VO wide resource allocation to users or groups/users of a VO
– Software (or condition set) publishing, i.e. making it available on the Grid


Database Systems


Database Systems

Database:
– one or more large, structured sets of persistent data, usually associated with software to update and query the data. A simple database might be a single file containing many records, each of which contains the same set of fields, where each field has a certain fixed width. A database is one component of a database management system.

Database Management System (DBMS):
– a set of programs (functions) that manages the large, structured sets of persistent data which make up the database, and provides access to the data for multiple, concurrent users whilst maintaining the integrity of the data. The DBMS is in charge of all the functionalities related to the database: access, security, storage…
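The "simple database" mentioned above, a single file of records with fixed-width fields, can be sketched in Python with the standard `struct` module. The record layout (run number, particle name, energy) is a made-up example, and an in-memory buffer stands in for the file on disk.

```python
import io
import struct

# Hypothetical record layout: 4-byte unsigned run number,
# 16-byte name field, 8-byte floating point energy.
# "<" selects little-endian byte order with no padding between fields.
RECORD = struct.Struct("<I16sd")


def write_record(f, run, name, energy):
    """Append one fixed-width record; short names are padded with zero bytes."""
    f.write(RECORD.pack(run, name.encode("ascii"), energy))


def read_record(f, index):
    """Jump straight to record number `index`: every record has the same size."""
    f.seek(index * RECORD.size)
    run, raw_name, energy = RECORD.unpack(f.read(RECORD.size))
    return run, raw_name.rstrip(b"\x00").decode("ascii"), energy


db = io.BytesIO()  # stands in for a file opened in binary mode
write_record(db, 1, "muon", 45.5)
write_record(db, 2, "electron", 12.25)

print(RECORD.size)         # 28 bytes per record
print(read_record(db, 1))  # (2, 'electron', 12.25)
```

Because all fields have a fixed width, the position of any record is a simple multiplication. What such a file lacks (concurrent access, security, integrity checks) is what the DBMS adds.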


Database Management Systems

A DBMS provides:
– security facilities to prevent unauthorized users from accessing the system, using names and passwords to identify operators, programs and individual machines, and sets of privileges assigned to them; these privileges can include the ability to read, write and update data in the database;

– lock facilities to maintain data integrity; locks are applied to reads and writes of chunks of data, so that only one user at a time can alter data, or users can be prevented from accessing data that is being changed. These requirements are referred to as ACID (Atomicity, Consistency, Isolation and Durability):

  Atomicity: all the parts of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. This ensures that the system holds no erroneous data, and no data that fails to correspond to other data as it should.

  Consistency: the database is transformed from one valid state to another valid state. A transaction is legal only if it obeys user-defined integrity constraints. Illegal transactions are not allowed and, if an integrity constraint cannot be satisfied, the database is rolled back to its previous valid state and the user is informed that the transaction has failed.

  Isolation: the results of a transaction are invisible to other transactions until the transaction is complete.

  Durability: once a transaction has been committed (completed), its results are permanent and can survive future system and media failures.
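Atomicity and consistency can be tried out with the `sqlite3` module from the Python standard library. The sketch below is only an illustration under stated assumptions: the `account` table, the names and the `CHECK` constraint are invented for the example, and an in-memory database is used, so durability is not demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity constraint
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates are committed together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the user is informed that the transaction failed


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


print(transfer(conn, "alice", "bob", 500), balances(conn))
# False {'alice': 100, 'bob': 50}
print(transfer(conn, "alice", "bob", 30), balances(conn))
# True {'alice': 70, 'bob': 80}
```

In the failed transfer the credit to bob has already been executed when the debit violates the constraint. The rollback undoes it, so the database returns to its previous valid state.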


Database Systems

Databases are based on many different models, each of which is designed with a specific problem, industry or set of functions in mind. Here we look at the main types:

– Relational Databases: data are structured in a series of tables, which have columns representing the variables and rows that contain specific instances of data. Currently the most widespread model.

– Object Oriented Databases: information is stored as a persistent object, and not as a row in a table. The user defines objects and the operations which can be executed on them.

– Object Relational Databases: relational systems to which object oriented functions are added. They allow data to be manipulated in the form of objects, as well as providing the traditional relational interface.

– Distributed Databases: data are stored on two or more computers, called nodes, which are connected over a network across a country, continent or planet.

– Multimedia Databases: model for storing several different types of file, i.e. text, audio, video and images, in a single database.

– Network Databases: data are organized in a network of linked records. A very early form of database, fast but not very adaptable, which is little used at present.

– Hierarchical Databases: data are stored as records, linked by parent-child relationships. Mostly used in the past on mainframes.


Relational Database Systems

The Relational Model is one of the oldest models used for creating a database, and the one that is used by the majority of businesses today. It was first outlined in a paper published by Ted Codd in 1970. The relational model is based on Set Theory and Predicate Logic:

– set theory allows data to be structured in a series of tables, which have columns representing the variables and rows that contain specific instances of data. These tables are organized using normalization, a process (derived from Normal Forms theory) that reduces the occurrence of repeated data by breaking it into smaller pieces and creating new tables (e.g., a separate table for the personal data of a customer).

– predicate logic is the basis of the query language, i.e. the set of commands that allow data to be inserted, retrieved, modified or deleted according to specified criteria. Data can also be joined into new tables, either virtually or physically.

The current standard for relational databases is the Structured Query Language (SQL), maintained by the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI). Version 2 of the language (SQL-92) is the one most widely implemented; Version 3 (SQL:1999) has been published and is being adopted gradually.

– The most widely used relational database systems are produced by Oracle Corporation, Microsoft, Sybase and IBM, but there are many other RDBMSs, designed either as general-purpose systems or for specific applications. MySQL and PostgreSQL are among those used in HEP.
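The two ideas on this slide, normalization and a predicate-based query language, can be sketched with Python's built-in sqlite3 module. The table and column names (customer, purchase) and the data are hypothetical and chosen only for illustration; this is a minimal sketch, not a schema taken from any HEP application.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Normalization: customer data live in one table, and each purchase
# refers to the customer by key instead of repeating name and city.
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    );
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL,
        price       REAL NOT NULL
    );
""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Rossi", "Roma"), (2, "Papadopoulos", "Athens")])
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)",
                 [(1, 1, "disk", 80.0), (2, 1, "cable", 5.0), (3, 2, "disk", 80.0)])

# Query language: rows are selected by a predicate (price > 10) and the
# two tables are joined on the fly, without storing the joined table.
rows = conn.execute("""
    SELECT c.name, p.item, p.price
    FROM purchase AS p JOIN customer AS c ON c.id = p.customer_id
    WHERE p.price > 10
    ORDER BY c.name
""").fetchall()

# Modify and delete use the same "according to specified criteria" pattern.
conn.execute("UPDATE customer SET city = 'Lecce' WHERE name = 'Rossi'")
conn.execute("DELETE FROM purchase WHERE price < 10")
remaining = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
city = conn.execute("SELECT city FROM customer WHERE name = 'Rossi'").fetchone()[0]
```

Because the customer's city is stored once, the UPDATE touches a single row however many purchases refer to that customer, which is the practical benefit of normalization.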


Object Oriented Database Systems

ODBMSs were introduced to overcome many restrictions imposed by the relational model on certain types of data (mainly in the case of huge amounts of data or complex structures). Their main advantage is the degree of low-level control of the system they give the programmer, who decides how the data are to be stored and manipulated:

– information is stored as a persistent object (and not as a row in a table). This makes it more efficient in terms of storage space requirements and ensures that users can only manipulate data in the ways the programmer has specified. It also saves on the disk space needed for queries: instead of resources having to be allocated for the results, the space required is already there in the objects themselves.

Because of the specific low-level methods used in an ODBMS, it is very difficult for third parties to produce add-on products. Whilst relational databases can benefit from software produced by other vendors, users of ODBMSs have to produce additional software in house, contract other firms, or collaborate with other organizations using the same system.

– The first commercial object oriented DBMSs became available in the mid-1980s. By the early 1990s a range of ODBMSs was available from a variety of vendors. Objectivity/DB is the most widely used in the HEP community.
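The idea of storing a persistent object rather than a row can be illustrated with Python's standard shelve module. This is only a stand-in: shelve is a simple object store, not an ODBMS, and it does not reproduce the Objectivity/DB interface. The Track class and the key "track-1" are hypothetical.

```python
import os
import shelve
import tempfile


class Track:
    """Hypothetical persistent object: data plus the operations allowed on it."""

    def __init__(self, px, py, pz):
        self.px, self.py, self.pz = px, py, pz

    def momentum(self):
        # Operation defined by the programmer on the stored object.
        return (self.px ** 2 + self.py ** 2 + self.pz ** 2) ** 0.5


def store_and_reload():
    # A temporary directory keeps the example free of side effects.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tracks")
        # Store the object itself, not a row of columns.
        with shelve.open(path) as db:
            db["track-1"] = Track(3.0, 4.0, 12.0)
        # Reopen the store and get back a full object, methods included.
        with shelve.open(path) as db:
            return db["track-1"]


track = store_and_reload()
print(track.momentum())
```

The reloaded object comes back with its methods, so the caller works with it through the operations the programmer defined instead of assembling it from query results.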


Distributed Database Systems

Distributed databases have the common characteristic that they are stored on two or more computers, called nodes, connected over a network. They are classified as homogeneous or heterogeneous:

– homogeneous databases: use the same DBMS software and have the same applications on each node. They have a common schema (a file specifying the structure of the database), and can have varying degrees of local autonomy. They can be based on any DBMS which supports this function, but it is not possible to have more than one DBMS type in the system. To be efficient, they need high-bandwidth network connections and a lot of processing power.

– heterogeneous databases: have a very high degree of local autonomy. Each node in the system has its own local users, applications and data, deals with them itself, and connects to other nodes only for information it does not have. This type of distributed database is often just called a federated system or a federation. It is becoming more popular with organizations, both for its scalability, since extra nodes can be added at reduced cost when necessary, and for the ability to mix software packages. Unlike homogeneous systems, heterogeneous systems can include different database management systems. This makes them appealing to organizations, since they can incorporate legacy systems and data into new systems.
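The federated behaviour described above (each node serves its own data and contacts other nodes only for what it lacks) can be sketched as follows. The Node class, the run table and the node names are hypothetical. For the sake of a self-contained example every node uses an in-memory sqlite3 database and a direct method call replaces the network; in a real federation the nodes could run different DBMSs.

```python
import sqlite3


class Node:
    """Hypothetical federation node with its own local database."""

    def __init__(self, name, runs):
        self.name = name
        self.peers = []
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE run (number INTEGER PRIMARY KEY, events INTEGER)")
        self.db.executemany("INSERT INTO run VALUES (?, ?)", runs)

    def local_lookup(self, number):
        row = self.db.execute(
            "SELECT events FROM run WHERE number = ?", (number,)).fetchone()
        return None if row is None else row[0]

    def lookup(self, number):
        # Serve the request locally when possible ...
        events = self.local_lookup(number)
        if events is not None:
            return self.name, events
        # ... and contact other nodes only for data this node does not hold.
        for peer in self.peers:
            events = peer.local_lookup(number)
            if events is not None:
                return peer.name, events
        return None


rome = Node("rome", [(1, 1000), (2, 1500)])
athens = Node("athens", [(3, 2000)])
rome.peers.append(athens)
athens.peers.append(rome)

local_answer = rome.lookup(1)    # answered by the local node
remote_answer = rome.lookup(3)   # answered by a peer
missing = rome.lookup(9)         # held by no node in the federation
```

Adding a node means creating one more Node and registering it as a peer, which reflects the scalability argument made for federations.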


Beyond standard Database Systems


Distributed Analysis


Within LCG a working group, with representatives from all LHC experiments, is working on a blueprint architecture for grid services: ARDA (A Roadmap to Distributed Analysis). This will serve as a first input to the EGEE Architecture team. The HEPCAL work is continuing in the framework of the LCG/GAG (Grid Applications Group), developing use cases and requirements for the analysis of physics data. This will also give important input to architecture and design work.

GAG reports:

– Hepcal: Systematic descriptions of HEP Grid Use Cases, CERN-LCG-2002-020 (29 May 2002)
lcg.web.cern.ch/LCG/sc2/RTAG4/finalreport.doc
Hepcal-prime: cern.ch/fca/HEPCAL-prime.doc

– Hepcal 2: Analysis Use Cases, CERN-LCG-2003-032 (29 October 2003)
lcg.web.cern.ch/LCG/SC2/GAG/HEPCAL-II.doc


ARDA working group mandate

To review the current Distributed Analysis activities and to capture their architectures in a consistent way.

To confront these existing projects with the HEPCAL II use cases and the user's potential work environments in order to explore potential shortcomings.

To consider the interfaces between Grid, LCG and experiment-specific services:

– Review the functionality of experiment-specific packages, their state of advancement and their role in the experiment

– Identify similar functionalities in the different packages

– Identify functionalities and components that could be integrated in the generic GRID middleware

To confront the current projects with critical GRID areas.

To develop a roadmap specifying wherever possible the architecture, the components and potential sources of deliverables to guide the medium term (2 year) work of the LCG and the DA planning in the experiments.


ARDA Architecture


SEAL Overview

SEAL: Shared Environment for Applications at LHC

SEAL aims to:

– Provide the software infrastructure, basic frameworks, libraries and tools that are common among the LHC experiments

– Select, integrate, develop and support foundation and utility class libraries

– Develop a coherent set of basic framework services to facilitate the integration of LCG and non-LCG software

The scope of the SEAL project is basically the scope of the LCG Applications Area.


PROOF (Parallel ROOT Facility)

Collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group

Part of and based on the ROOT framework
– Makes heavy use of ROOT networking and other infrastructure classes

Currently no external technologies

The PROOF system allows:
– parallel analysis of trees in a set of files
– parallel analysis of objects in a set of files
– parallel execution of scripts
on a cluster of heterogeneous machines
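The pattern behind parallel analysis of a set of files (one partial result per file, merged by a master) can be sketched in plain Python. This is not the PROOF or ROOT API: the FILES dictionary, the file names and the analyse function are hypothetical, and a thread pool inside one process stands in for the worker machines of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a set of files: file name -> list of event values.
FILES = {
    "run1.root": [0.5, 1.5, 2.5, 1.2],
    "run2.root": [0.1, 1.9, 2.2],
    "run3.root": [1.1, 1.3, 2.9, 0.7],
}


def analyse(name):
    """Worker step: fill a partial histogram (bins one unit wide) for one file."""
    return Counter(int(value) for value in FILES[name])


def parallel_analysis(names, workers=3):
    """Master step: distribute files to workers and merge the partial results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(analyse, names):
            total += partial
    return total


histogram = parallel_analysis(sorted(FILES))
```

Since each file is processed independently and histograms add bin by bin, the merged result does not depend on how the files are shared among the workers.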


Useful links

Projects
– EDG (European Data Grid): http://eu-datagrid.web.cern.ch/eu-datagrid/
– GGF (Global Grid Forum): http://www.gridforum.org/
– Globus: http://www.globus.org/
– LCG (LHC Computing Grid): http://lcg.web.cern.ch/LCG/
– Pool (Pool Of persistent Objects for LHC): http://pool.cern.ch/