Databases in the Grid

Post on 15-Jan-2016

Page 1: Databases in the Grid

INFSO-RI-508833

Enabling Grids for E-sciencE

Databases in the Grid

A New Data Source Oriented CE for GRID

Taffoni Giuliano INAF - OATS

Page 2: Databases in the Grid

Overview

• What is a G-DSE

• An overview of the G-DSE

• Some practice

People: Edgardo Ambrosi, Giuliano Taffoni, Andrea Barisani, Claudio Vuerli, Antonia Ghiselli

Page 3: Databases in the Grid

The Database crisis

• I have a DB and I want to USE it from my GRID.

• I have a number of DBs and I want to USE all of them.

• Move the execution to the data, not the data to the code.

• Fully compliant with gLite

Page 4: Databases in the Grid

Grid resource definition

• The Grid's limit: it can only execute binary code or shell scripts and store files.

• DB in the Grid? Extend the existing Globus Resource Manager to provide transparent access to heterogeneous Data Sources (DS) and Data Source Engines (DSE).

Page 5: Databases in the Grid

Blueprint for a Query Element

• The Grid Resource Framework Layer, Information System and Data Model are extended so that a software virtual machine, such as a Data Source Engine, becomes a valid resource instance in the Grid computing model.

• A new Grid component, the G-DSE, is defined. It enables access to a Data Source Engine and its Data Sources, and is fully integrated with the Grid Monitoring and Discovery System and the Resource Broker.

• A new Grid Element, the Query Element, can be built on top of the G-DSE component.

Page 6: Databases in the Grid

Blueprint for a Query Element

• Modify the Job Management component to access a new kind of resource.

• Integrate the "description" of the new resource into the Information System.

• Use the Grid Security Infrastructure.

• No modification on the client or server side: if I can submit a job, I can also submit a query!

• No modification to the Brokering/Workflow systems: if I can target a CE, I can also target a QE.

Page 7: Databases in the Grid

Extending the Grid capabilities

• Provide a proper extension of the Grid to handle a new resource.

• Security (GSI): no need to extend it, just use it!

• First theory (Grid ASM), then… application.

Reference: Zsolt N. Nemeth & Vaidy Sunderam, "A Formal Framework for Defining Grid Systems", 2nd IEEE/ACM (CCGRID'02).

Page 8: Databases in the Grid

Globus G-DSE integration

[Diagram. GRAM side: gatekeeper; JobManager and QueryManager; JobProcess and QueryProcess; scheduler plug-in (PBS/LSF) and query plug-in; DB-specific query driver; RDBMS. GIS side: MDS; GRIS; LDAP/LDIF; Grid providers (SNMP); RDBMS.]

Page 9: Databases in the Grid

Globus4 Integration

[Diagram. GRAM services; Delegation; RFT; GRAM Adapter; Local DB control; query "plug-in"; QueryProcess; GridFTP (two instances); Remote SE.]

Page 10: Databases in the Grid

G-DSE Grid formalization

• New Grid component:

  – Integrated within the Grid Information System

  – May be integrated in the WMS

• New Grid Element on top of the G-DSE component: the Query Element

Page 11: Databases in the Grid

The Query Element

[Figure: the CE is paired with "code", the QE with "query".]

Page 12: Databases in the Grid

QE implementation

• Runs on any Linux/Unix flavor with GT >= 2.4.3

• Back ends: any DB vendor (MySQL, Oracle, PostgreSQL, etc.) plus flat files

• Two protocols: GRAM or WS

• API: C, C++, Python, Java, Perl

• If it works with Globus, it works with G-DSE

[Diagram labels: ora, psql, file; GRAM, SOAP; GDSE.]

Page 13: Databases in the Grid

QE Authorization

• Access control using GSI and VOMS

  – The certificate plus roles identify the user's permissions on the DB

Super user: create, modify, admin, grant and revoke users… ANYTHING!!!

Standard user: select + insert

Simple user: select

Page 14: Databases in the Grid

QE Authorization

Mapping of VOMS roles and groups to DB users:

Attribute: /vo/dbuser/ROLE=astrouser/CAPABILITY=select
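
The attribute string can be read as a group path followed by KEY=value fields. The following is a minimal sketch of how such a string might be checked against a SQL statement, assuming the user levels listed for QE authorization (simple user: select; standard user: select + insert). The function names and the `ALLOWED` table are hypothetical and are not part of G-DSE.

```python
# Hypothetical sketch: names and the privilege table are illustrative assumptions.

# Statement types allowed per capability, following the user levels in the
# slides (simple user: select; standard user: select + insert).
ALLOWED = {
    "select": {"SELECT"},
    "insert": {"SELECT", "INSERT"},
}


def parse_attribute(attr):
    """Split '/vo/dbuser/ROLE=x/CAPABILITY=y' into group, role, capability."""
    groups, fields = [], {}
    for part in attr.strip("/").split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.upper()] = value
        else:
            groups.append(part)
    return {
        "group": "/" + "/".join(groups),
        "role": fields.get("ROLE"),
        "capability": fields.get("CAPABILITY"),
    }


def is_allowed(attr, sql):
    """Return True if the first keyword of `sql` is permitted for `attr`."""
    capability = parse_attribute(attr)["capability"]
    words = sql.split()
    if not words:
        return False
    return words[0].upper() in ALLOWED.get(capability, set())
```

With the attribute from the slide, a `select` statement would pass this check and an `insert` would be refused. A real deployment would rely on the database's own grants rather than on keyword inspection.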

Page 15: Databases in the Grid

More than one statement

QE language

• UI/QE interactions through a STANDARD LANGUAGE

• RSL (SQL)

> globus-job-run g.dse.host/dbmanager-ODBC -queue PSQL1 "select a,b from table;"
--------------
| a   | b   |
--------------
| Uno | 001 |
| Due | 002 |
| Tre | 003 |
--------------
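
A client script could assemble the same command line and turn the tabular output into records. This is a rough sketch that uses only the arguments shown in the example above. The helper names are hypothetical, and the table layout is assumed to match the sample output exactly.

```python
import shlex


def build_query_command(contact, queue, sql):
    """Build the globus-job-run argument list used in the slide's example."""
    return ["globus-job-run", contact, "-queue", queue, sql]


def parse_table(text):
    """Parse '| a | b |' style rows into a list of dicts keyed by the header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


argv = build_query_command(
    "g.dse.host/dbmanager-ODBC", "PSQL1", "select a,b from table;"
)
# Print a shell-safe rendering of the command; nothing is executed here.
print(shlex.join(argv))
```

Passing the query as a single list element avoids shell quoting problems if the list is later handed to `subprocess.run`.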

Page 16: Databases in the Grid

QE language

> globus-job-submit g.dse.host/dbmanager-ODBC -queue PSQL1 "select a,b from table;"
https://g.dse.host/20001/23297/113699980234

> globus-job-status https://g.dse.host/20001/23297/113699980234
DONE

> globus-job-get-output https://g.dse.host/20001/23297/113699...
--------------
| a   | b   |
--------------
| Uno | 001 |
| Due | 002 |
| Tre | 003 |
--------------

Off line access
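
The offline pattern (submit, poll the status, then fetch the output) could be scripted as sketched below. It assumes `globus-job-status` prints a single state word such as DONE or FAILED. The function names, polling interval and retry limit are illustrative choices, not part of G-DSE.

```python
import subprocess
import time


def job_status(contact, run=subprocess.run):
    """Return the state printed by `globus-job-status` for a job contact URL."""
    result = run(
        ["globus-job-status", contact],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def wait_until_done(contact, status=job_status, interval=5.0, max_polls=60):
    """Poll until the job reports DONE; return False on FAILED or timeout."""
    for _ in range(max_polls):
        state = status(contact)
        if state == "DONE":
            return True
        if state == "FAILED":
            return False
        time.sleep(interval)
    return False
```

The `status` parameter lets a caller substitute a stub for testing without a Globus installation.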

Page 17: Databases in the Grid

The Information System

• The QE publishes its presence to the GRID

• Software computing machine load, memory space, etc.

• We use RDBMS MIB information:

  – More than 250 parameters… we are not using all of them!!!

    rdbmsSrvInfoFinishedTransactions  1.3.6.1.2.1.39.1.6.1.2
    rdbmsSrvInfoDiskReads             1.3.6.1.2.1.39.1.6.1.3
    rdbmsSrvInfoLogicalReads          1.3.6.1.2.1.39.1.6.1.4
    rdbmsSrvInfoDiskWrites            1.3.6.1.2.1.39.1.6.1.5

• Based on SNMP or direct access.
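
An information provider has to translate numeric OIDs returned over SNMP back into parameter names. Here is a small sketch of that lookup, using only the four OIDs listed on this slide. The `describe` helper is hypothetical, and treating any trailing sub-identifier as a row instance is an assumption.

```python
# Column OIDs and names exactly as listed on the slide.
SRV_INFO_COLUMNS = {
    "1.3.6.1.2.1.39.1.6.1.2": "rdbmsSrvInfoFinishedTransactions",
    "1.3.6.1.2.1.39.1.6.1.3": "rdbmsSrvInfoDiskReads",
    "1.3.6.1.2.1.39.1.6.1.4": "rdbmsSrvInfoLogicalReads",
    "1.3.6.1.2.1.39.1.6.1.5": "rdbmsSrvInfoDiskWrites",
}


def describe(oid):
    """Map a numeric OID, optionally with an instance suffix, to (name, instance)."""
    oid = oid.lstrip(".")
    for column, name in SRV_INFO_COLUMNS.items():
        if oid == column:
            return name, None
        # Require a '.' after the column so that '...1.2' never matches '...1.20'.
        if oid.startswith(column + "."):
            return name, oid[len(column) + 1:]
    return None, None
```

Unknown OIDs yield `(None, None)`, so a provider can skip parameters it does not publish.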

Page 18: Databases in the Grid

gLite implementation

• GRAM + site BDII + top BDII

• Based on information providers:

  – Static information

  – Dynamic information

[Diagram labels: ORACLE, POSTGRESQL, MYSQL, each with odbc and snmp; Dynamic providers and Static providers; snmpquery, ODBCquery, ldif.]

Page 19: Databases in the Grid

QE BDII

> ldapsearch -LLL -x -H g.dse.host -b "mds-vo-name=site,o=grid"

dn: GlueDSEUniqueID=g.dse.host:2119/dbmanager-ODBC,mds-vo-name=local,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueDSE
objectClass: GlueDSETop
objectClass: GlueKey
GlueDSEName: TESTDB
GlueDSEStateStatus: Production
GlueDSEInfoLRMSType: Postgresql
GlueDSEInfoLRMSVersion: 7.3
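
A client that discovers Query Elements through the BDII needs to read entries like this one. This is a minimal sketch of a parser for a single entry, with the sample text taken from the slide. It ignores LDIF line continuations and base64 values, and the helper names are hypothetical.

```python
# Sample entry copied from the slide; the parser is an illustrative sketch.
SAMPLE = """\
dn: GlueDSEUniqueID=g.dse.host:2119/dbmanager-ODBC,mds-vo-name=local,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueDSE
objectClass: GlueDSETop
objectClass: GlueKey
GlueDSEName: TESTDB
GlueDSEStateStatus: Production
GlueDSEInfoLRMSType: Postgresql
GlueDSEInfoLRMSVersion:7.3
"""


def parse_ldif_entry(text):
    """Parse one LDIF-style entry into {attribute: [values]}."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        name, value = line.split(":", 1)
        entry.setdefault(name.strip(), []).append(value.strip())
    return entry


def matches(entry, backend, status="Production"):
    """True if the entry advertises the given back end type and status."""
    return (
        backend in entry.get("GlueDSEInfoLRMSType", [])
        and status in entry.get("GlueDSEStateStatus", [])
    )


entry = parse_ldif_entry(SAMPLE)
```

Values are kept as lists because attributes such as `objectClass` repeat within one entry.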

Page 20: Databases in the Grid

QE and the WMS

• New job wrapper for dbmanager

[Diagram. RB; QueryWrapper; gatekeeper; QueryManager; QueryProcess; query plug-in; DB-specific query driver; RDBMS.]

Page 21: Databases in the Grid

An Example

Type = "Job";
JobType = "Normal";
Executable = "select A from table;";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-xml";
RetryCount = 1;

$ glite-job-submit -r gdse.oats.inaf.it:2119/dbmanager-odbc-test1 sqltest.jdl

Selected Virtual Organisation name (from proxy certificate extension): inaf
Connecting to host arquimedes.rediris.es, port 7772
Logging to host arquimedes.rediris.es, port 9002

================================ edg-job-submit Success ===========
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job - https://arquimedes.rediris.es:9000/75hD3nNHxbYRDAL3GmiIug

The edg_jobId has been saved in the following file: /home/madrid01/jlvpjobid
========================================================================
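
A JDL file of this shape could be generated from a query string, for example as sketched below. The attribute set mirrors the example JDL. The function name and default file names are hypothetical, and quoting rules for embedded double quotes are deliberately left out.

```python
def build_query_jdl(sql, stdout="query.out", stderr="query.err",
                    arguments="-xml", retry_count=1):
    """Return JDL text that carries `sql` in the Executable attribute."""
    if '"' in sql:
        # Escaping rules for embedded double quotes are not covered by the
        # slide, so this sketch simply refuses such queries.
        raise ValueError("double quotes in the query are not handled here")
    attributes = [
        ("Type", '"Job"'),
        ("JobType", '"Normal"'),
        ("Executable", f'"{sql}"'),
        ("StdOutput", f'"{stdout}"'),
        ("StdError", f'"{stderr}"'),
        ("OutputSandbox", f'{{"{stderr}","{stdout}"}}'),
        ("Arguments", f'"{arguments}"'),
        ("RetryCount", str(retry_count)),
    ]
    return "\n".join(f"{name} = {value};" for name, value in attributes) + "\n"


jdl = build_query_jdl("select A from table;")
```

The resulting text can be written to a file such as `sqltest.jdl` and submitted with the command shown in the example.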

Page 22: Databases in the Grid

Summing up

• G-DSE supports Data Source (DS) and DSE indexing, monitoring, management and recovery through a rich set of metadata bound to the standard GIS.

• DSs have their core engine in the G-DSE, which provides a framework for activity and task management.

• An RSL/JDL Transaction/Query permits a number of tasks to be specified, together with their parameters, inputs, outputs and control flow.

• The response to a request is generated by the G-DSE within a JobQueryManager session. The G-DSE analyses each incoming task and performs authentication and authorisation.

• The standard Grid Workload Manager constructs an optimised execution graph.

• The GIS monitors a status digest for each DS and DSE, produced by its internal monitor.

• The G-DSE has been designed to support dynamic configuration, sessions, transactions, recovery and concurrency.

Page 23: Databases in the Grid

End of Presentation

Thank you for your attention