Page 1: DILIGENT Project

DILIGENT
A DIgital Library Infrastructure on Grid ENabled Technology

DILIGENT Project

Andrea Manzi
ISTI-CNR, Pisa

Page 2: DILIGENT Project

09/01/2006 NA4 Generic Application Meeting 2

Outline

Project Description

Interaction with EGEE

gLite DILIGENT Infrastructures

gLite Experimentation

Problem Using gLite Services

DILIGENT Requirements

Future plans

Page 3: DILIGENT Project

Project Description

Duration: 36 Months
Start Date: Sept 2004
Person/Months: 1024
Total Costs: 9.5 M € (6.3 M € from EU)

[Figure: pie chart with segments of 15%, 24% and 61%, labelled Technological development, Validation Activities and Innovation Activities]

Objective: Create a Digital Library Infrastructure that will allow members of dynamic virtual research organizations to create on-demand transient digital libraries based on shared computing, storage, multimedia, multi-type content, and application resources

Page 4: DILIGENT Project

Participants

Italian National Research Council – ISTI (Italy, Scientific Co-ordinator)
European Research Consortium for Informatics and Mathematics (France, Administrative Co-ordinator)

European Organization for Nuclear Research (Switzerland)

Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. – IPSI (Germany)
University of Athens (Greece)
University of Basel (Switzerland)
University for Health Informatics and Technology Tyrol (Austria)
University of Strathclyde (United Kingdom)

Engineering Ingegneria Informatica SpA (Italy)
Fast Search & Transfer ASA (Norway)
4D SOFT Software Development Ltd. (Hungary)

European Space Agency – ESRIN (Italy)
Scuola Normale Superiore (Italy)
RAI Radio Televisione Italiana (Italy)

Page 5: DILIGENT Project

[Figure: DILIGENT DL infrastructure diagram. Labels: DLCreation service; Services A to E; simulation, speech recognition, feature extraction, 3D processing; Consumers and Producers; "Implementation of Environmental Conventions" and "Research on Cultural Heritage"]

Page 6: DILIGENT Project

Interaction with EGEE

Coordination with EGEE

Technical interactions:
9 technical meetings (mainly with JRA1)
gLite mailing lists subscription: [email protected], [email protected]
1 training on “Grid Technologies for Digital Libraries”
1 tutorial on “gLite Deployment”

Other interactions:
4 EGEE conferences (Cork, The Hague, Athens, Pisa)

Page 7: DILIGENT Project

Interaction with EGEE

Feedback to EGEE

On EGEE activities:
gLite bugs submission (JRA1)

On DILIGENT project:
status
access to EGEE prototype testbeds (JRA1)
access to EGEE PPS testbed (SA1)
grid related DL requirements (JRA1, NA4)
future plans

Page 8: DILIGENT Project

gLite DILIGENT Infrastructures

DILIGENT has 2 independent infrastructures (gLite v1.4)

Development infrastructure
Testing infrastructure

Infrastructures are geographically distributed, linking 6 sites in Athens, Budapest, Darmstadt, Pisa, Innsbruck and Rome

Running gLite experimentation tests since July 2005

Page 9: DILIGENT Project

Development Infrastructures

Page 10: DILIGENT Project

Testing Infrastructure

[Figure: testing infrastructure diagram. Labels: Job Management Services; Data Management Services; Information Services; Security Services; 4DSOFT, CNR, ENG]

Page 11: DILIGENT Project

gLite Experimentation

Goal:
store/manage collections of objects
run applications organized in DAGs
store the application results for future usage

Tests plan:
Data Upload
Job Submission
Data transfer

Data:
800K XML files of the Reuters corpus (from Aug96 to Aug97)

Application:
Feature extraction tool (JIRE Application)

Implementation of prototypes to test the feasibility of the proposed solutions

Page 12: DILIGENT Project

gLite Experimentation – Data Upload

Two Mass Storage Systems (MSS) were tested: dCache and DPM

dCache:
success rate: 69,06 %
avg. rate: 16,18 s/file
several problems!

DPM:
success rate: 97,26 %
avg. rate: 6,10 s/file

[Figure: "Upload Rate" chart comparing DPM and dCache at the UMIT, UoA, CNR and FhG sites; vertical axis from 0,00 to 30,00]
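To put the reported upload figures in perspective, the following minimal sketch scales the per-file averages to a whole collection. It assumes, purely for illustration, that all 800K files of the corpus are uploaded one at a time at the average rates given on the slide; the function name `extrapolate` and the constant names are hypothetical, and the slide does not say how many files were actually uploaded in the test.

```python
# Figures reported on the slide (decimal commas converted to points).
REPORTED = {
    "dCache": {"success_rate": 0.6906, "seconds_per_file": 16.18},
    "DPM": {"success_rate": 0.9726, "seconds_per_file": 6.10},
}

# Assumption: the full 800K-file corpus is uploaded sequentially.
CORPUS_FILES = 800_000


def extrapolate(files: int, success_rate: float, seconds_per_file: float) -> dict:
    """Scale per-file averages to a collection of `files` items."""
    total_seconds = files * seconds_per_file
    return {
        "days": total_seconds / 86_400,  # 86,400 seconds in a day
        "expected_failures": round(files * (1.0 - success_rate)),
    }


if __name__ == "__main__":
    for backend, figures in REPORTED.items():
        estimate = extrapolate(CORPUS_FILES, **figures)
        print(f"{backend}: about {estimate['days']:.1f} days, "
              f"{estimate['expected_failures']} failed uploads expected")
```

Under these assumptions a sequential upload would take weeks on either backend, and a success rate below 100% would leave thousands of files to be uploaded again.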

Page 13: DILIGENT Project

gLite Experimentation – Job Submission

Jobs using dCache data MSS:

several problems!

Jobs using DPM data MSS:

success rate: 100%

avg. rate: 5,77 s/file

comparable performance using 10 and 100 jobs due to the small number of available worker nodes

[Figure: two charts, "Execution Rate (dCache)" and "Execution Rate (DPM)", each plotting the execution rate against the number of jobs and the number of files]
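The observation that 10 and 100 jobs perform comparably can be illustrated with a simple wave model: once the number of jobs reaches the number of worker nodes, extra jobs only queue behind the running ones. This is a sketch under stated assumptions, not the measurement method used in the tests; the worker-node count (10) and the per-file processing time (50 s) are hypothetical values, since the slide gives neither.

```python
import math


def seconds_per_file(files: int, jobs: int, worker_nodes: int,
                     per_file_seconds: float) -> float:
    """Average wall-clock seconds per file when `jobs` share `worker_nodes`.

    Files are split evenly among the jobs; at most `worker_nodes` jobs run
    at once, so the remaining jobs wait and run in successive waves.
    """
    files_per_job = math.ceil(files / jobs)
    waves = math.ceil(jobs / worker_nodes)
    makespan = waves * files_per_job * per_file_seconds
    return makespan / files


if __name__ == "__main__":
    # Hypothetical setup: 1000 files, 10 worker nodes, 50 s per file.
    for jobs in (1, 10, 100):
        rate = seconds_per_file(1000, jobs, worker_nodes=10, per_file_seconds=50.0)
        print(f"{jobs:>3} jobs: {rate:.2f} s/file")
```

In this model the rate improves from 1 to 10 jobs and then stays flat, which matches the qualitative behaviour described on the slide. Real submissions also pay a per-job scheduling overhead that the model ignores.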

Page 14: DILIGENT Project

gLite Experimentation

DILIGENT vs. PPS infrastructures

Data upload: similar results (for DPM)

Job submission: similar results

DILIGENT dCache not considered (didn't work with 1000 files)

[Figure: two charts comparing the DILIGENT (DIL) and PPS infrastructures: "Execution Rate - 1000 files" for 1, 10 and 100 jobs (DIL DPM, PPS DPM), and "Upload Rate" for 1 and 3 threads (DIL dCache, DIL DPM, PPS DPM)]

Page 15: DILIGENT Project

gLite Experimentation

The experimental DILIGENT DL exploits gLite, storing products on the Grid and processing them on demand. This allows usable end-user manifestations to be produced upon request.

[Figure: architecture diagram of the experimental DL. Labels: User Interface; Process Management; Content Management; Metadata Management; Storage Management; Index and Search Management; Authentication Authorization; Information Service; gLite StorageBroker; gLite JM; gLite SE; gLite WMS; Inf. Service R-GMA; DVOS VOMS]

Page 16: DILIGENT Project

Problem Using gLite Services

gLite deployment:
gLite architecture and configuration are complex
gLite 1.0 was released in April 2005 (since then four new releases were made available)
limited information available (it has been made available gradually)
several bugs were found in deploying and using gLite (many are solved)

Software porting to 64-bit is not complete. Some gLite services (WMS, CE) can’t be deployed on 64-bit machines.

Page 17: DILIGENT Project

Problem Using gLite Services [cont]

Job submission:
Slow job execution phase
Nevertheless, the gLite job management system proved to be reliable: more jobs, same performance

Data upload:
A lot of performance issues using the dCache backend:
simultaneous glite-put/glite-get/glite-rm
large amount of small files
DILIGENT needs a 100% successful upload rate -> DPM
dead links on Fireman when glite-put ends with errors

Page 18: DILIGENT Project

DILIGENT Requirements

DILIGENT aims to run executables that repeat the same operations for each input file belonging to a given collection.

Each single execution takes a few minutes (or less) but it must be repeated hundreds of thousands of times (even millions).

These executables are usually organised in a DAG to deliver a more complex functionality.
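As an illustration of what organising executables in a DAG means in practice, the sketch below derives a valid execution order from declared dependencies using Python's standard `graphlib` module. The step names in `PIPELINE` are hypothetical and do not come from the DILIGENT design; the point is only that each step runs after the steps it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value the set of steps
# that must finish before it can start.
PIPELINE = {
    "extract_features": {"upload_collection"},
    "build_index": {"extract_features"},
    "store_results": {"extract_features", "build_index"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for position, step in enumerate(execution_order(PIPELINE), start=1):
        print(position, step)
```

A cyclic dependency would make `static_order()` raise `graphlib.CycleError`, which is a cheap way to validate a workflow before submitting it.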

Page 19: DILIGENT Project

DILIGENT Requirements [cont]

In order to support this framework, it should be possible:

To query for the maximum number of CPUs concurrently available, so that a DILIGENT high-level service can automatically prepare a DAG in which each node processes a partition of the data collection

To use parametric jobs/automatic partitioning on data: submitting the same computation on a set of n input data should be more efficient than submitting n jobs

To use Condor as LRMS (Local Resource Management System)
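The partitioning requirement can be sketched as follows: given the number of CPUs reported as available, a high-level service splits the collection into that many near-equal chunks, one per DAG node. The function `partition` and the parameter `max_cpus` are hypothetical names; `max_cpus` stands in for the result of the CPU-availability query requested above, and no specific gLite call is assumed.

```python
def partition(items: list, max_cpus: int) -> list[list]:
    """Split `items` into at most `max_cpus` contiguous, near-equal chunks."""
    if max_cpus < 1:
        raise ValueError("max_cpus must be at least 1")
    parts = min(max_cpus, len(items))  # never create empty chunks
    if parts == 0:
        return []
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for index in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    files = [f"doc{n}.xml" for n in range(10)]
    for node, chunk in enumerate(partition(files, max_cpus=3)):
        print(f"DAG node {node}: {len(chunk)} files")
```

Each chunk would then become the input of one job, so a collection of n files costs as many submissions as there are CPUs rather than n.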

Page 20: DILIGENT Project

DILIGENT Requirements [cont]

To support service certificates: it should be possible to obtain a service certificate for a high-level service

To specify a job-specific priority: the same user/service should be able to specify priorities for his/its own jobs

To specify a priority for a user or for a service: the jobs of DILIGENT infrastructural services must be prioritized over end-user service requests
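The two priority requirements can be combined in a single ordering: first by who owns the job (infrastructural service before end user), then by the priority the owner assigned to that job. The sketch below shows this with a binary heap. The class `JobQueue`, the constants and the "lower number means more urgent" convention are all assumptions made for illustration; they do not describe how the gLite WMS schedules jobs.

```python
import heapq
import itertools

# Assumed convention: a lower number means a more urgent job.
INFRASTRUCTURE, END_USER = 0, 1


class JobQueue:
    """Orders jobs by owner class, then by the owner's own job priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def submit(self, name: str, owner_class: int, job_priority: int = 0) -> None:
        entry = (owner_class, job_priority, next(self._counter), name)
        heapq.heappush(self._heap, entry)

    def next_job(self) -> str:
        """Remove and return the name of the most urgent job."""
        return heapq.heappop(self._heap)[-1]


if __name__ == "__main__":
    queue = JobQueue()
    queue.submit("user-search", END_USER)
    queue.submit("index-rebuild", INFRASTRUCTURE, job_priority=5)
    queue.submit("index-update", INFRASTRUCTURE, job_priority=1)
    print([queue.next_job() for _ in range(3)])
```

Placing the owner class first in the tuple means an end-user job can never overtake an infrastructural one, whatever priority the end user requests for it.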

Page 21: DILIGENT Project

DILIGENT Requirements [cont]

To ask for on-disk encryption of data: it should be possible to ask for encryption of the data on disk to prevent data leaks at the storage site level

To dynamically manage VO creation: the creation of a new VO should be supported without deploying and configuring services by hand

To dynamically support user/service affiliation to a VO: the user/service affiliation to a VO should be automated as much as possible

Page 22: DILIGENT Project

Future Plans

Monitor gLite developments and continue the current work of deploying gLite in DILIGENT infrastructures

Continue the ongoing gLite experimentation using DILIGENT and EGEE PPS infrastructures

Continue gridifying the following services needed in the DILIGENT DL experimentation:
Metadata Management
Content Management
Index and Search Management
Process (workflow) Management

Page 23: DILIGENT Project

Tips / Summary

DILIGENT has successfully installed and now maintains its own gLite infrastructures. DILIGENT development infrastructure can join the EGEE infrastructure

An active EGEE-DILIGENT collaboration has been established and this has been key for the achievement of our first goals

DILIGENT has identified a concrete set of open issues that we need to address. The gLite and DL experimentation activities have shown that we are on the right track

Page 24: DILIGENT Project

DILIGENT Web Site http://www.diligentproject.org

DILIGENT Training DL http://diligent-training.isti.cnr.it

Experimental DL http://diligent-dl1.isti.cnr.it

Andrea Manzi [email protected]

Thank you