DAPSYS Tutorial: LCG-2 Overview – Sep 19th, 2004 -1

The LHC Grid

EGEE is funded by the European Union under contract IST-2003-508833

Peter Kacsuk

MTA SZTAKI


Acknowledgement

• This tutorial is based on the work of many people:
  • Fabrizio Gagliardi, Flavia Donno and Peter Kunszt (CERN)
  • the EDG developer team
  • the EDG training team
  • the NeSC training team
  • the SZTAKI training team


What is LHC Grid?

• LHC stands for Large Hadron Collider to be built by CERN

http://lhc-new-homepage.web.cern.ch/lhc-new-homepage/

• The LHC will be put into operation in 2007, with many experiments collecting 5-6 petabytes of data per year

• The LHC Grid was built by CERN to provide storage and computing capacity for processing this huge data set

• The current version of the LHC Grid is called LCG-2
  • It was built on the software developed by the European DataGrid project and by the US GriPhyN project
  • Now LCG-2 is the first EGEE infrastructure


What is LHC Grid?

• The first EGEE infrastructure and the largest functioning Grid:
  • more than 70 sites, over 5,000 CPUs, 4,000 TB
  • 5,000 jobs simultaneously


What is EGEE ? (I)

• EGEE (Enabling Grids for E-science in Europe) is a seamless Grid infrastructure for the support of scientific research, which:

• Integrates current national, regional and thematic Grid efforts, especially in HEP (High Energy Physics)

• Provides researchers in academia and industry with round-the-clock access to major computing resources, independent of geographic location

[Figure: three layers labelled Applications, Grid infrastructure and Geant network]


What is EGEE ? (II)

• 70 leading institutions in 27 countries, federated in regional Grids

• 32 M Euros EU funding (2004-5), O(100 M) total budget

• Aiming for a combined capacity of over 20’000 CPUs (the largest international Grid infrastructure ever assembled)

• ~ 300 dedicated staff


EGEE Community


EGEE infrastructure

• Access to networking services provided by GEANT and the NRENs

• Production Service:
  • in place (based on HEP LCG-2)
  • for production applications
  • MUST run reliably, runs only proven stable, debugged middleware and services

• Will continue adding new sites in EGEE federations

• Pre-production Service:
  • For middleware re-engineering

• Certification and Training/Demo testbeds


What do we expect from the Grid?

• Access to a world-wide virtual computing laboratory with almost infinite resources

• Possibility to organize distributed scientific communities in VOs

• Transparent access to distributed data and easy workload management

• Easy to use application interfaces


What are the characteristics of a Grid system?

• Numerous resources
• Ownership by mutually distrustful organizations & individuals
• Potentially faulty resources
• Different security requirements & policies required
• Resources are heterogeneous
• Geographically separated
• Different resource management policies
• Connected by heterogeneous, multi-level networks


The LCG-2 Architecture

[Figure: layered architecture diagram with the following labels]

• Local Computing: Local Application, Local Database
• Grid:
  • Grid Application Layer: Data Management, Job Management, Metadata Management
  • Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
  • Underlying Grid Services: Computing Element Services, Authorization Authentication & Accounting, Replica Catalog, Storage Element Services, Database Services
• Fabric:
  • Fabric services: Configuration Management, Node Installation & Management, Monitoring and Fault Tolerance, Resource Management, Fabric Storage Management
• Also shown: Logging & Book-keeping


Main Logical Machine Types (Services) in LCG-2

• User Interface (UI)

• Information Service (IS)

• Computing Element (CE)
  • Frontend Node
  • Worker Nodes (WN)

• Storage Element (SE)

• Replica Catalog (RC,RLS)

• Resource Broker (RB)


User Interface

• The initial point of access to the LCG-2 Grid is the User Interface
• This is a machine where
  • LCG users have a personal account
  • The user’s certificate is installed

• The UI is the gateway to Grid services
• It provides a Command Line Interface to perform the following basic Grid operations:
  • submit a job for execution on a Computing Element;
  • list all the resources suitable to execute a given job;
  • replicate and copy files;
  • cancel one or more jobs;
  • retrieve the output of one or more finished jobs;
  • show the status of one or more submitted jobs.

• One or more UIs are available at each site that is part of the LCG-2 Grid


Computing Element (CE)

• Defined as a Grid batch queue and identified by a string of the form <hostname>:<port>/<batch queue name>

• Several queues defined for the same hostname are considered different CEs. For example:

  adc0015.cern.ch:2119/jobmanager-lcgpbs-long
  adc0015.cern.ch:2119/jobmanager-lcgpbs-short

• A Computing Element is built on a homogeneous farm of computing nodes (called Worker Nodes)

• One node acts as a Grid Gate (GG) or front-end to the Grid and runs:
  • a Globus gatekeeper
  • the Globus GRAM (Globus Resource Allocation Manager)
  • the master server of a Local Resource Management System, which can be PBS, LSF or Condor
  • a local Logging and Bookkeeping server

• Each LCG-2 site runs at least one CE and a farm of WNs behind it.
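
The CE identifier format described above can be taken apart mechanically. The following is a minimal sketch, not part of any LCG-2 tool: the names `ComputingElementId` and `parse_ce_id` are made up for illustration. It shows that two queues on the same host yield two distinct CEs.

```python
from typing import NamedTuple


class ComputingElementId(NamedTuple):
    """Illustrative container for the three parts of a CE identifier."""
    hostname: str
    port: int
    queue: str


def parse_ce_id(ce_id: str) -> ComputingElementId:
    """Split '<hostname>:<port>/<batch queue name>' into its parts."""
    endpoint, slash, queue = ce_id.partition("/")
    hostname, colon, port = endpoint.partition(":")
    if not (slash and colon and hostname and port.isdigit() and queue):
        raise ValueError(f"not a CE identifier: {ce_id!r}")
    return ComputingElementId(hostname, int(port), queue)


# The two example CEs from the slide: same host and port, different queues.
long_ce = parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-long")
short_ce = parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-short")
```

Because the queue name is part of the identifier, `long_ce` and `short_ce` compare as different values even though they share a Grid Gate.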


Computing Element

[Figure: a Computing Element. A Grid Gate node runs the gatekeeper and the info service in front of a batch server. The worker nodes are Linux hosts (PIII CPU with 0.5 GB RAM; PIV CPU with 2 GB RAM). In the example, the red queue is assigned to two hosts.]

• Computing Element: entry point into a queue of a batch system
  • Information associated with a computing element is limited only to information relevant to the queue
  • Resource details relate to the system


Storage Element (SE)

• A Storage Element (SE) provides uniform access and services to large storage spaces.

• Each site includes at least one SE

• They use two protocols:
  • GSIFTP for file transfer
  • Remote File Input/Output (RFIO) for file access


Storage Resource Management (SRM)

• Data are stored on disk pool servers or Mass Storage Systems
• Storage resource management needs to take into account:
  • Transparent access to files (migration to/from disk pool)
  • Space reservation
  • File status notification
  • Life time management
• SRM (Storage Resource Manager) takes care of all these details
• SRM is a Grid Service that takes care of local storage interaction and provides a Grid interface to the outside world


Storage Resource Management

• Support for local policy
  • Each storage resource can be managed independently
  • Internal priorities are not sacrificed by data movement between Grid agents
  • Disk and tape resources are presented as a single element

• Reservation on demand and advance reservation
  • Space can be reserved for registering a new file
  • Plan the storage system usage

• File status and estimates for planning
  • Provides info on file status
  • Provides estimates on space availability/usage


A Simple Configuration

[Figure: a User Interface, Resource Broker, Replica Catalog and Information Service, together with Storage Element 1, Storage Element 2, Computing Element 1 and Computing Element 2; “CLOSE” labels mark which Storage Elements are close to which Computing Elements]


LCG-2 local configuration

SZTAKI’s LCG-2 system

• GRID GATE: n31.hpcc.sztaki.hu (512MB, Intel Pentium4 2.53GHz)
  • User Interface
  • Computing Element
  • Storage Element (69GB)
  • Resource Broker
  • Replica Manager
• Worker nodes: n27.hpcc.sztaki.hu and n28.hpcc.sztaki.hu (each 128MB, Genuine Intel PentiumIII Dual Proc. 2x500MHz)
  • Workernode #1
  • Default: Workernode #2


Information System (IS)

• The Information System (IS) provides information about the LCG-2 Grid resources and their status

• The current IS is based on LDAP: a directory service infrastructure which is a specialized database optimized for
  • reading,
  • browsing and
  • searching information.

• The LDAP schema used in LCG-2 implements the GLUE (Grid Laboratory for a Uniform Environment) Schema


How to store Information?

• The LDAP information model is based on entries.
• An entry usually describes an object such as
  • a person,
  • a computer,
  • a server, and so on.
• Each entry contains one or more attributes that describe the entry.
• Each attribute has a type and one or more values.
• Each entry has a name called a Distinguished Name (DN) that uniquely identifies it.
• A DN is formed by a sequence of attributes and values.
• Example: The DN of a particular CE entry would be formed from:
  • an attribute identifying the site (site_ID=cern) and
  • an attribute identifying the CE (CE_ID=lxn1102.cern.ch),
  • so the complete DN would be: CE_ID=lxn1102.cern.ch,site_ID=cern.
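
To make the DN construction concrete, here is a small sketch that builds and splits DNs as plain strings. The helper names (`build_dn`, `parse_dn`, `parent_dn`) are illustrative assumptions, and the code deliberately ignores the escaping rules a real LDAP library would apply.

```python
def build_dn(*pairs):
    """Join (attribute, value) pairs, most specific first, into a DN string."""
    return ",".join(f"{attribute}={value}" for attribute, value in pairs)


def parse_dn(dn):
    """Split a DN back into (attribute, value) pairs (no escaping handled)."""
    return [tuple(part.split("=", 1)) for part in dn.split(",")]


def parent_dn(dn):
    """Drop the most specific component, giving the DN of the enclosing entry."""
    return dn.partition(",")[2]


# The CE entry from the example above.
ce_dn = build_dn(("CE_ID", "lxn1102.cern.ch"), ("site_ID", "cern"))
site_dn = parent_dn(ce_dn)
```

Dropping the leading component of the CE's DN gives the site's DN, which is what lets entries be arranged hierarchically.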


The Directory Information Tree

• Based on their DNs, the entries can be arranged into a hierarchical tree-like structure.

• This tree of directory entries is called the Directory Information Tree (DIT).


Information System (IS)

• The IS is a hierarchical system with 3 levels, from the bottom up:
  • Grid Resource Information Server (GRIS) level (CE and SE level)
  • Grid Index Information Server (GIIS) level (site level)
  • Top, centralized level (Grid level)

• The Globus Monitoring and Discovery Service (MDS) mechanism has been adopted at the GRIS level

• The other two levels use the Berkeley DB Information Index (BDII) mechanism


LCG-2 hierarchical Info system

BDII: Berkeley DB Information Index

GIIS: Grid Index Information Server

GRIS: Grid Resource Information Server

CE: Computing Element

SE: Storage Element


How to collect and store information?

• All services are allowed to enter information into the IS
• The BDII at the top
  • queries every GIIS every 2 minutes and
  • acts as a cache storing information about the Grid status in its LDAP database
• The BDII at the GIIS
  • collects info from every GRIS every 2 minutes and
  • acts as a cache storing information about the site status in its LDAP database

• The GRIS updates information according to the MDS protocol
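
The two caching levels described above can be imitated with a small in-memory model. This is only a sketch under stated assumptions: `CachingIndex` is a hypothetical class, it uses dictionaries instead of LDAP, and it refreshes lazily when queried rather than polling on a timer as a real BDII would.

```python
import time

REFRESH_SECONDS = 120  # the 2-minute interval mentioned above


class CachingIndex:
    """Caches the merged records of its sources and refreshes them when stale."""

    def __init__(self, sources, clock=time.monotonic):
        self.sources = sources    # callables, each returning {name: status}
        self.clock = clock        # injectable so the behaviour can be tested
        self.cache = {}
        self.last_refresh = None

    def query(self):
        now = self.clock()
        stale = self.last_refresh is None or now - self.last_refresh >= REFRESH_SECONDS
        if stale:
            merged = {}
            for source in self.sources:
                merged.update(source())
            self.cache = merged
            self.last_refresh = now
        return dict(self.cache)


# Two resource-level providers (standing in for a CE GRIS and an SE GRIS),
# one site-level index, and one top-level index that only talks to the site.
site_index = CachingIndex([lambda: {"ce01": "free"}, lambda: {"se01": "online"}])
top_index = CachingIndex([site_index.query])
grid_view = top_index.query()
```

A client that reads `top_index` never contacts the resource-level providers directly, and may see information that is up to one refresh interval old at each level.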


How to obtain Information?

• All users can browse the catalogues
• To obtain the information the client should:
  • Ask BDII about possible GIIS/GRIS
  • Directly query GIIS/GRIS
  • Or use BDII cache

• The IS scales to ~1000 sites (MDS much less: ~100)


Data Management

• The Data Management services are provided by
  • the Replica Management System (RMS) of EDG
  • and the LCG Data Management client tools

• In LCG, the data files are replicated:
  • on a temporary basis,
  • to many different sites depending on
  • where the data is needed.

• The users or applications do not need to know where the data is located; they use logical file names

• The Data Management services are responsible for locating and accessing the data.


File Management Motivation

[Figure: Site A with Storage Element A and Site B with Storage Element B, each holding a set of files (File A, File B, File C, File D, File X, File Y); File A and File B appear at both sites]

• Replica Catalog: map logical to site files
• File Transfer
• Replica Manager: ‘atomic’ replication operation, single client interface, orchestrator
• Pre-/Post-processing: prepare files for transfer, validate files after transfer
• Replica Selection: get ‘best’ file
• Replication Automation: data source subscription
• Load balancing: replicate based on usage
• Metadata: LFN metadata, transaction information, access patterns


Data Management Tools

• Tools for
  • Locating data
  • Copying data
  • Managing and replicating data
  • Meta Data management

• In LCG-2 you have
  • Replica Manager (RM)
  • Replica Location Service (RLS)
  • Replica Metadata Catalog (RMC)


Replication Services: Basic Functionality

[Figure: the Replica Manager with the Replica Location Service, the Replica Metadata Catalog and two Storage Elements]

• Files have replicas stored at many Grid sites on Storage Elements.
• Each file has a unique Grid ID (GUID). Locations corresponding to the GUID are kept in the Replica Location Service.
• Users may assign aliases to the GUIDs. These are kept in the Replica Metadata Catalog.
• The Replica Manager provides atomicity for file operations, assuring consistency of SE and catalog contents.


Interactions with other Grid components

[Figure: the Replica Manager connected to the Replica Location Service, Replica Metadata Catalog, Information Service, Resource Broker, User Interface or Worker Node, Storage Elements and the Virtual Organization Membership Service]

• Applications and users interface to data through the Replica Manager either directly or through the Resource Broker. Management calls should never go directly to the SRM.


Simplified Interaction Replica Manager – Storage Resource Manager

[Figure: Replica Manager client, Replica Catalog, SRM and Storage, with arrows numbered 1 to 6]

1. The client asks a catalog to provide the location of a file
2. The catalog responds with the name of an SRM
3. The client asks the SRM for the file
4. The SRM asks the storage system to provide the file
5. The storage system sends the file to the client through the SRM, or
6. directly


Replica Manager (RM)

• High level data management on the Grid, takes care of:
  • Location of data
  • Replication of data
  • Efficient access to data

• Hides the SRM (Storage Resource Manager):
  • Users cannot access the SRM directly, only through the RM

• Coordinates the use of
  • Replica Location Service
  • Replica Metadata Catalog


Interaction of the Replica Manager (RM) with other Grid services

• The RM presents a single interface to the user or other services
• Some of the RM functionalities have been replaced by a new, faster interface: the LCG Data Management client tools.


File References and Replica Catalogs

• The files in the Grid are referenced by different names:
  • Grid Unique IDentifier (GUID)
  • Logical File Name (LFN)
  • Storage URL (SURL)
  • Transport URL (TURL)

• The GUID or LFN refer to files and not replicas, and say nothing about locations

• The SURLs and TURLs give information about where a physical replica is located.

RMC : Replica Metadata Catalog

LRC : Local Replica Catalog


Abstract file names

• GUID
  • A file can always be identified by its GUID
  • GUID is assigned at data registration time
  • GUID is based on the UUID standard to guarantee unique IDs
  • A GUID is of the form: guid:<unique string>
  • All the replicas of a file will share the same GUID

• LFN
  • In order to locate a Grid accessible file, the human user will normally use a LFN
  • LFNs are human-readable strings; they are allocated by the user as GUID aliases
  • The form of an LFN is: lfn:<any alias>


Physical file names

• SURL
  • used by the RMS to find where a replica is physically stored and by the SE to locate the file
  • SURLs are of the form: sfn:<SE hostname>/<local string>
  • where <local string> is used internally by the SE to locate the file.

• TURL
  • TURL gives the necessary information to retrieve a physical replica, including
    • hostname
    • path
    • protocol
    • port (as any conventional URL);
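
The name forms above lend themselves to a short worked example. The sketch below is illustrative only: the helper names, the host `se.example.org`, the path and the port number are assumptions, not values taken from a real site. It generates a GUID from a UUID, splits a SURL of the stated form, and reads the parts of a TURL with the standard URL parser.

```python
import uuid
from urllib.parse import urlsplit


def new_guid():
    """Return a fresh identifier of the form guid:<unique string>."""
    return f"guid:{uuid.uuid4()}"


def parse_surl(surl):
    """Split sfn:<SE hostname>/<local string> into (hostname, local string)."""
    scheme, _, rest = surl.partition(":")
    if scheme != "sfn":
        raise ValueError(f"not a SURL: {surl!r}")
    # Tolerate an optional '//' after the scheme.
    hostname, _, local = rest.lstrip("/").partition("/")
    return hostname, local


def parse_turl(turl):
    """Return (protocol, hostname, port, path) of a conventional URL."""
    parts = urlsplit(turl)
    return parts.scheme, parts.hostname, parts.port, parts.path


guid = new_guid()
se_host, local_string = parse_surl("sfn:se.example.org/data/run42.dat")
turl_parts = parse_turl("gsiftp://se.example.org:2811/data/run42.dat")
```

The GUID carries no location at all, the SURL names a Storage Element, and only the TURL carries everything a transfer client needs.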


Replica Location Service (RLS)

• RLS maintains information about the physical location of the replicas (mapping with the GUIDs).

• It is composed of several Local Replica Catalogs (LRCs) which hold the information of replicas for a single VO.


Replica Metadata Catalog (RMC)

• The RMC stores the mapping between GUIDs and the respective aliases (LFNs)

• Maintains other metadata information (sizes, dates, ownerships...)
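
The division of labour between the two catalogs can be pictured with two dictionaries. This is a toy model, not the RLS or RMC implementation: the class `ReplicaCatalogs`, its method names and all identifiers in the example are invented for illustration.

```python
class ReplicaCatalogs:
    """Toy model: one mapping for aliases (RMC), one for locations (RLS)."""

    def __init__(self):
        self.aliases = {}     # RMC role: LFN -> GUID
        self.locations = {}   # RLS role: GUID -> set of SURLs

    def register_replica(self, guid, surl):
        self.locations.setdefault(guid, set()).add(surl)

    def add_alias(self, guid, lfn):
        self.aliases[lfn] = guid

    def list_replicas(self, name):
        """Accept an LFN or a GUID and return the known SURLs, sorted."""
        guid = self.aliases[name] if name.startswith("lfn:") else name
        return sorted(self.locations.get(guid, ()))


catalogs = ReplicaCatalogs()
file_guid = "guid:00000000-0000-0000-0000-000000000001"  # made-up value
catalogs.register_replica(file_guid, "sfn:se-a.example.org/data/run42.dat")
catalogs.register_replica(file_guid, "sfn:se-b.example.org/store/run42.dat")
catalogs.add_alias(file_guid, "lfn:run42")
replicas = catalogs.list_replicas("lfn:run42")
```

Resolving an LFN is therefore a two-step lookup: alias to GUID, then GUID to replica locations.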


User Interfaces for Data Management

• Users should mainly use the interface of the Replica Manager client:
  • Management commands
  • Catalog commands
  • File Transfer commands

• The RLS and RMC services provide additional user interfaces
  • Mainly for additional catalog operations
  • Additional server administration commands
  • Should mainly be used by administrators
  • Can also be used to check the availability of a service


The Replica Manager Interface – Management Commands

• copyAndRegisterFile args: source, dest, lfn, protocol, streams
  • Copy a file into grid-aware storage and register the copy in the Replica Catalog as an atomic operation.

• replicateFile args: source/lfn, dest, protocol, streams
  • Replicate a file between grid-aware stores and register the replica in the Replica Catalog as an atomic operation.

• deleteFile args: source/seHost, all
  • Delete a file from storage and unregister it.
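
What "atomic" means for copy-and-register can be sketched with in-memory stand-ins. This is not the Replica Manager's code: `copy_and_register`, `CatalogError` and the dictionaries used for storage and catalog are assumptions made for the example. The point is that a failed registration must not leave an unregistered copy behind.

```python
class CatalogError(Exception):
    """Raised when the (toy) catalog refuses a registration."""


def copy_and_register(storage, catalog, data, dest_surl, lfn):
    """Copy data to dest_surl and register it under lfn, or do neither."""
    if dest_surl in storage:
        raise FileExistsError(dest_surl)
    storage[dest_surl] = data              # step 1: copy into storage
    try:
        if lfn in catalog:
            raise CatalogError(f"{lfn} is already registered")
        catalog[lfn] = dest_surl           # step 2: register the copy
    except CatalogError:
        del storage[dest_surl]             # undo step 1 so both stay consistent
        raise


storage, catalog = {}, {}
copy_and_register(storage, catalog, b"events", "sfn:se.example.org/data/run1", "lfn:run1")
```

After a successful call both dictionaries contain the new file; after a failed one neither does, which is the consistency between SE and catalog contents that the slides attribute to the Replica Manager.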


The Replica Manager Interface – Catalog Commands (1)

• registerFile args: source, lfn
  • Register a file in the Replica Catalog that is already stored on a Storage Element.

• unregisterFile args: source, guid
  • Unregister a file from the Replica Catalog.

• listReplicas args: lfn/surl/guid
  • List all replicas of a file.

• registerGUID args: surl, guid
  • Register an SURL with a known GUID in the Replica Catalog.

• listGUID args: lfn/surl
  • Print the GUID associated with an LFN or SURL.


The Replica Manager Interface – Catalog Commands (2)

• addAlias args: guid, lfn
  • Add a new alias to GUID mapping

• removeAlias args: guid, lfn
  • Remove an alias LFN from a known GUID.

• printInfo()
  • Print the information needed by the Replica Manager to screen or to a file.

• getVersion()
  • Get the versions of the replica manager client.


The Replica Manager Interface – File Transfer Commands

• copyFile args: source, dest
  • Copy a file to a non-grid destination.

• listDirectory args: dir
  • List the directory contents on an SRM or a GridFTP server.


Job Management

• The user interacts with the Grid via a Workload Management System (WMS)

• The goal of the WMS is distributed scheduling and resource management in a Grid environment.

• What does it allow Grid users to do?
  • To submit their jobs
  • To execute them on the “best resources”
    • The WMS tries to optimize the usage of resources
  • To get information about their status
  • To retrieve their output


WMS Components

• WMS is currently composed of the following parts:

1. Workload Manager, which is the core component of the system

2. Match-Maker (also called Resource Broker), whose duty is finding the best resource matching the requirements of a job (match-making process).

3. Job Adapter, which prepares the environment for the job and its final description, before passing it to the Job Control Service.

4. Job Control Service (JCS), which finally performs the actual job management operations (job submission, removal...)

5. Logging and Bookkeeping services (LB), which store job information available for users to query
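
The match-making step can be pictured as "filter, then rank". The sketch below is a simplification under loud assumptions: the function `match_make`, the record fields `free_cpus` and `max_minutes`, and the `example.org` hosts are all invented for illustration and do not reflect how the real Match-Maker evaluates a job.

```python
def match_make(computing_elements, requirements, rank):
    """Keep the CEs that satisfy the requirements and return the best-ranked one."""
    suitable = [ce for ce in computing_elements if requirements(ce)]
    return max(suitable, key=rank) if suitable else None


# Hypothetical CE records, as they might be summarised from the Information System.
ces = [
    {"id": "ce1.example.org:2119/jobmanager-lcgpbs-short", "free_cpus": 12, "max_minutes": 60},
    {"id": "ce1.example.org:2119/jobmanager-lcgpbs-long", "free_cpus": 3, "max_minutes": 2880},
    {"id": "ce2.example.org:2119/jobmanager-lcgpbs-long", "free_cpus": 7, "max_minutes": 2880},
]

# A job that needs at least 10 hours of wall-clock time and prefers idle CPUs.
best = match_make(
    ces,
    requirements=lambda ce: ce["max_minutes"] >= 600,
    rank=lambda ce: ce["free_cpus"],
)
```

The short queue has the most free CPUs but is filtered out by the requirement, so the ranking only decides between the two long queues.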


Job Preparation: Let’s think the way the Grid thinks!

• Information to be specified
  • Job characteristics
  • Requirements and Preferences of the computing system
    • Software dependencies
  • Job Data requirements
• Specified using a Job Description Language (JDL)


Job Flow

• a. The user logs in to the UI machine and creates a proxy certificate, which authenticates her in every secure interaction and has a limited lifetime.


Job Flow

• b. The user submits the job from the UI to the WMS.
  • The user can specify in the job description file (JDF) one or more files to be copied from the UI to the RB node; this set of files is called the Input Sandbox.
  • The event is logged in the LB and the status of the job is SUBMITTED.


Job Flow

• c. The WMS, and in particular the Match-Maker component, looks for the best available CE to execute the job. The Match-Maker queries the BDII for the status of CEs and SEs, and the RLS for the location of data. The event is logged in the LB and the status of the job is WAITING.


Job Flow

• d. The WMS Job Adapter prepares the job for submission creating a wrapper script that is passed, together with other parameters, to the JCS for submission to the selected CE. The event is logged in the LB and the status of the job is READY.


Job Flow

• e. The Globus Gatekeeper on the CE receives the request and sends the Job for execution to the LRMS (e.g. PBS, LSF or Condor). The event is logged in the LB and the status of the job is SCHEDULED.


Job Flow

• f. The LRMS handles the job execution on the available local farm worker nodes. User’s files are copied from the RB to the WN where the job is executed. The event is logged in the LB and the status of the job is RUNNING.


Job Flow

• g. While the job runs, Grid files can be accessed on a (close) SE using either the RFIO protocol or local access if the files are copied to the WN local filesystem. So that the job can find out which SE is close, or what the result of the Match-Maker process was, the WMS produces a file with this information and ships it together with the job to the WN. This is known as the .BrokerInfo file. Information can be retrieved from this file using the BrokerInfo CLI or the API library.


Job Flow

• h. The job can produce new output data that can be uploaded to the Grid and made available for other Grid users to use. This can be achieved using the Data Management tools. Uploading a file to the Grid means
  • copying it to a Storage Element and
  • registering its location, metadata and attributes with the RMS.

• During job execution, data files can be replicated between two SEs, again using the Data Management tools.


Job Flow

• i. If the job reaches the end without errors, the output (not large data files, but just the small output files specified by the user in the so-called Output Sandbox) is transferred back to the RB node. The event is logged in the LB and the status of the job is DONE.


Job Flow

• j. At this point, the user can retrieve the output of his/her job from the UI using the WMS CLI or API. The event is logged in the LB and the status of the job is CLEARED.
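
The sequence of statuses named in the job flow steps can be summarised as a simple linear state machine. This is only an illustrative model: the class `Job`, its `log` list (a stand-in for events recorded in the LB) and the job identifier are assumptions, and the sketch covers only the error-free path.

```python
JOB_FLOW = ("SUBMITTED", "WAITING", "READY", "SCHEDULED", "RUNNING", "DONE", "CLEARED")


class Job:
    """Walks a job through the error-free status sequence, logging each step."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.status = JOB_FLOW[0]
        self.log = [self.status]   # stand-in for events recorded in the LB

    def advance(self):
        position = JOB_FLOW.index(self.status)
        if position == len(JOB_FLOW) - 1:
            raise RuntimeError("job is already CLEARED")
        self.status = JOB_FLOW[position + 1]
        self.log.append(self.status)
        return self.status


job_record = Job("example-job-1")   # hypothetical identifier
while job_record.status != "CLEARED":
    job_record.advance()
```

Each call to `advance` corresponds to one of the logged events above, so the log ends up listing every status the job passed through, in order.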


Job Flow Status and Errors

• Status queries from the UI machine:
  • Job status queries are addressed to the LB database.
  • Resource status queries are addressed to the BDII.

• If the site where the job is running goes down, the job is automatically resubmitted to another CE that is equivalent to the previous one with respect to the requirements the user asked for.

• If this resubmission is disabled, the job is marked as aborted.

• Users can find out what happened by simply querying the LB service.


How do I login on the Grid ?

• Distribution of resources: secure access is a basic requirement
  • secure communication
  • security across organisational boundaries
  • single “sign-on” for users of the Grid

• Two basic concepts:
  • Authentication: Who am I?
    • “Equivalent” to a passport, ID card etc.
  • Authorisation: What can I do?
    • Certain permissions, duties etc.


Security in the Grid

• In industry, several security standards exist:
  • Public Key Infrastructure (PKI)
    • PKI keys
    • SPKI keys (focus on authorisation rather than certificates)
    • RSA
  • Secure Socket Layer (SSL)
    • SSH keys
  • Kerberos

• Need for a common security standard for Grid services
  • Above standards do not meet all Grid requirements (e.g. delegation, single sign-on etc.)

• Grid community mainly uses X.509 PKI for the Internet
  • Well established and widely used (also for www, e-mail, etc.)


PKI – Basic overview

• Public Key Infrastructure (based on asymmetric cryptography)
  • One primary advantage: distributing public keys is generally easier than distributing secret keys securely, as required with symmetric keys

[Figure: Entity A (Alice) holds a public key e and a private key d. Entity B (Bob), wishing to send a message m to A, uses A’s public key and the encryption transformation E_e to compute the ciphertext c = E_e(m). A uses her own private key and applies the decryption transformation D_d to recover m = D_d(c). The message direction is from B to A.]
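
The relations c = E_e(m) and m = D_d(c) can be tried out with textbook RSA on deliberately tiny numbers. This is a classroom sketch only: the primes, the exponent and the function names are chosen for illustration, there is no padding, and nothing here is suitable for real security or taken from the Grid middleware.

```python
# Toy key generation for entity A (requires Python 3.8+ for pow(e, -1, phi)).
p, q = 61, 53              # tiny primes, for illustration only
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: A's public key is (n, e)
d = pow(e, -1, phi)        # private exponent: A's private key is (n, d)


def encrypt(m):
    """E_e: what B computes using A's public key."""
    return pow(m, e, n)


def decrypt(c):
    """D_d: what A computes using her own private key."""
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
```

Only `n` and `e` need to be published; B never learns `d`, which is why no secret has to be distributed in advance.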


Digital Certificates

• How can B be sure that A’s public key is really A’s public key and not someone else’s?
  • A third party guarantees the correspondence between public key and owner’s identity, by signing a document which contains the owner’s identity and his public key (Digital Certificate)
  • Both A and B must trust this third party

• Two models:
  • X.509: hierarchical organization;
  • PGP: “web of trust”.


Involved entities

[Figure: the entities involved are the User, the Certificate Authority (CA) and the Resource (site offering services), shown with public key, private key and certificate]


Certificate Request

1. User generates a public/private key pair. The private key is kept encrypted on local disk.

2. User sends the public key to the CA along with proof of identity (certificate request: public key and ID).

3. CA confirms identity, signs the certificate and sends it back to the user. The certificate is the signed public key.


Grid Security Infrastructure (GSI)

• Globus Toolkit™ proposed and implements the Grid Security Infrastructure (GSI)
  • Protocols and APIs to address Grid security needs

• GSI protocols extend standard public key protocols
  • Standards: X.509 & SSL/TLS
  • Extensions: X.509 Proxy Certificates (single sign-on) & Delegation

• Proxy Certificate:
  • Short term, restricted certificate that is derived from a long-term X.509 certificate
  • Signed by the normal end entity cert, or by another proxy
  • Allows a process to act on behalf of a user
  • Its private key is not encrypted and thus must be protected by file system permissions


Delegation

• Proxy creation can be recursive
  • each time a new private key and new X.509 proxy certificate, signed by the original key

• Allows remote process to act on behalf of the user

• Avoids sending passwords or private keys across the network

• The proxy may be a “Restricted Proxy”: a proxy with a reduced set of privileges (e.g. cannot submit jobs).


Virtual organisations

• An EGEE user must belong to a VO
• A VO
  • Controls access to specified CE, SE
  • Usually comprises geographically distributed people
  • Requires the ability to know who has done what, and who will not be allowed to do it again…. Security.

• Current VOs:
  • HEP communities, biology, astronomy,…

• VOMS: enhanced flexibility in VO management


EGEE Pilot Applications (I)

High Energy Physics:

• Have been running large distributed computing systems for many years

• Now focus on computing for LHC, hence LCG (LHC Computing Grid project)

• Several current HEP experiments use grid technology (BaBar, CDF, etc.)

• LHC experiments are currently executing large scale data challenges


EGEE Pilot Applications (II)

• Biomedics
  • Bioinformatics (gene/proteome databases distributions)
  • Medical applications (screening, epidemiology, image databases distribution, etc.)
  • Interactive application (human supervision or simulation)

• BioMed applications deployed and expect to run first job on LCG-2 by September


Who else will benefit from EGEE?

• EGEE Generic Applications Advisory Panel:
  • 4 applications presented
  • 3 applications (comp. chemistry, earth science, astro-particle) recommended for deployment with allocation of NA4 resources

• EU projects: GRACE, Mammogrid and Diligent asking for support

• Expression of interest: Planck/Gaia (astroparticle), SimDat (drug discovery)


How to access EGEE (I)

0) Review information provided on the EGEE website: www.eu-egee.org

1) Establish contact with the EGEE applications group led by Vincent Breton ([email protected])

2) Provide information by completing a questionnaire describing your application

3) Applications selected based on
  • scientific criteria,
  • Grid added value,
  • effort involved in deployment,
  • resources consumed/contributed etc.


How to access EGEE (II)

4) Follow a training session

5) Migrate application to EGEE infrastructure with the support of EGEE technical experts

6) Initial deployment for testing purposes

7) Production usage (contribute computing resources for heavy production demands)


SZTAKI’s role in EGEE

• Education Centre of the Central European Region
  • Developing and providing various training courses
  • Provide user support for applications
  • See the web page: www.lpds.sztaki.hu/egee

• Leader of middleware deployment effort in the SEEGRID project
  • Creating a regional EGEE Grid for South-East Europe

• Promote the establishment of the Hungarian EGEE Grid together with other members of MGKK:
  • Creating a national EGEE Grid for Hungary

• Extend the EGEE Grid with new layers:
  • Mercury monitor www.lpds.sztaki.hu/mercury
  • P-GRADE portal www.lpds.sztaki.hu/pgportal


Conclusions

• EGEE is the first attempt to build a worldwide Grid infrastructure for data intensive applications from many scientific domains

• A large-scale production grid service, LCG-2, is already deployed and being used for HEP and BioMed applications

• Resources and user groups will rapidly expand during the course of the project

• A process has been established for migrating new applications to the EGEE infrastructure

• A training programme has been established with a number of events already held

• Prototype “next generation” grid middleware is being tested now

• See the on-line broker demo at: http://www.hep.ph.ic.ac.uk/~mp801/applet/