Databases in the Grid
INFSO-RI-508833
Enabling Grids for E-sciencE
Databases in the Grid
A New Data Source Oriented CE for GRID
Taffoni Giuliano INAF - OATS
Overview
• What is a G-DSE?
• An overview of the G-DSE
• Some practice
People: Edgardo Amborsi Giuliano Taffoni Andrea Barisani Claudio Vuerli Antonia Ghiselli
The Database crisis
• I have a DB and I want to USE it from my GRID.
• I have a number of DBs and I want to USE all of them.
• Move the execution to the data and not data to the code.
• Fully compliant with gLite
Grid resource definition
• The Grid's limit: it can only execute binary code or shell scripts and store files.
• A DB in the Grid? Extend the existing Globus Resource Manager to provide transparent access to heterogeneous Data Sources (DS) and Data Source Engines (DSE).
Blueprint for a Query Element
• The Grid Resource Framework Layer, Information System and Data Model are extended so that a software virtual machine, such as a Data Source Engine, becomes a valid resource instance in the Grid computing model.
• A new Grid component, the G-DSE, is defined. It enables access to a Data Source Engine and a Data Source, and is fully integrated with the Grid Monitoring and Discovery System and the Resource Broker.
• A new Grid Element, the Query Element, can be built on top of the G-DSE component.
Blueprint for a Query Element
• Modify the Job Management component so that it can access new kinds of resources.
• Integrate the Information System with the “description” of the new resource.
• Use the Grid Security Infrastructure.
• No modification on the client or server side: if I can submit a job, I can also submit a query!
• No modification to the brokering/workflow systems: if I can direct a CE, I can also direct a QE.
Extending the Grid capabilities
• Provide a proper extension of the Grid to handle a new kind of resource.
• Security (GSI): nothing to extend, only to use!
• First theory (Grid ASM), then… application.
“A Formal Framework for Defining Grid Systems”, Zsolt N. Nemeth & Vaidy Sunderam, 2nd IEEE/ACM (CCGRID'02)
Globus G-DSE integration
[Diagram: Globus G-DSE integration]
GRAM side: gatekeeper; JobManager and QueryManager; JobProcess and QueryProcess; scheduler plug-in (PBS/LSF) and query plug-in; DB-specific query driver; RDBMS.
GIS side: MDS; GRIS; LDAP/LDIF; Grid providers (SNMP); RDBMS.
Globus4 Integration
[Diagram: Globus 4 integration]
Components shown: GRAM services; Delegation; RFT; GRAM Adapter; local DB control; query “plug-in”; QueryProcess; GridFTP; remote SE.
G-DSE Grid formalization
• New Grid component:
  – Integrated within the Grid Information System
  – May be integrated in the WMS
• New Grid Element on top of the G-DSE component: the Query Element
The Query Element
[Diagram: a CE takes code as input; a QE takes a query.]
QE implementation
• Runs on any Linux/Unix flavor: GT >= 2.4.3
• Backends: any DB vendor (MySQL, Oracle, PostgreSQL, etc.) plus flat files
• Two protocols: GRAM or WS
• API: C, C++, Python, Java, Perl
• If it works with Globus, it works with G-DSE
[Diagram: the G-DSE is reached through GRAM or SOAP and sits in front of Oracle (ora), PostgreSQL (psql) and flat-file backends.]
QE Authorization
• Access control using GSI and VOMS
  – The certificate plus roles identify the user's permissions on the DB
Super user: create, modify, admin, grant and revoke users… ANYTHING!!!
Standard user: select + insert
Simple user: select
QE Authorization
VOMS roles and groups are mapped to a DB user:
Attribute:/vo/dbuser/ROLE=astrouser/CAPABILITY=select
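A minimal Python sketch of how such an attribute string could drive a permission check. The slides do not show how G-DSE implements the mapping, so the function names `parse_voms_attribute` and `is_allowed`, the parsing rules, and the `ALLOWED` table (including the `insert` capability name) are assumptions made for illustration only.

```python
# Hypothetical mapping from a VOMS attribute to the SQL statements it permits.
# The attribute layout follows the slide; everything else is an assumption.
ALLOWED = {
    "select": {"SELECT"},            # "simple user" level
    "insert": {"SELECT", "INSERT"},  # assumed name for the "standard user" level
}


def parse_voms_attribute(attr):
    """Split '/vo/dbuser/ROLE=astrouser/CAPABILITY=select' into its parts."""
    groups, fields = [], {}
    for part in attr.strip("/").split("/"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.upper()] = value
        else:
            groups.append(part)
    return {
        "groups": groups,
        "role": fields.get("ROLE"),
        "capability": fields.get("CAPABILITY"),
    }


def is_allowed(attr, statement):
    """Return True if the first SQL keyword is permitted by the capability."""
    parsed = parse_voms_attribute(attr)
    words = statement.split()
    verb = words[0].upper() if words else ""
    return verb in ALLOWED.get(parsed["capability"], set())
```

With the attribute from the slide, `is_allowed("/vo/dbuser/ROLE=astrouser/CAPABILITY=select", "select a,b from table;")` accepts the query, while an `insert` statement is refused.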
QE language
More than one statement
• UI/QE interactions through a STANDARD LANGUAGE
• RSL(SQL)
> globus-job-run g.dse.host/dbmanager-ODBC -queue PSQL1 "select a,b from table;"
--------------
| a   | b   |
--------------
| Uno | 001 |
| Due | 002 |
| Tre | 003 |
--------------
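A short Python sketch of how a client script might consume this kind of output. It assumes the result arrives as text with one bordered row per line and the column names in the first row; `parse_table` is a hypothetical helper, not part of G-DSE or Globus.

```python
def parse_table(text):
    """Turn a '| a | b |' style text table into a list of row dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the dashed border lines and any blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        rows.append(cells)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]
```

Each returned dictionary is keyed by column name, so the first row of the example above would be available as `rows[0]["a"]` and `rows[0]["b"]`.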
QE language
Offline access:
> globus-job-submit g.dse.host/dbmanager-ODBC -queue PSQL1 "select a,b from table;"
https://g.dse.host/20001/23297/113699980234
> globus-job-status https://g.dse.host/20001/23297/113699980234
DONE
> globus-job-get-output https://g.dse.host/20001/23297/113699...
--------------
| a   | b   |
--------------
| Uno | 001 |
| Due | 002 |
| Tre | 003 |
--------------
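The submit, poll and fetch sequence can be sketched in Python as follows. This is an illustration under stated assumptions: `run_offline_query` and its parameters are hypothetical, the `run` callable stands in for whatever executes a command line and returns its standard output, and treating `FAILED` as a terminal status is an assumption about `globus-job-status` that the slides do not confirm.

```python
import time


def run_offline_query(run, contact, queue, query, poll_seconds=0.0, max_polls=10):
    """Submit a query, wait until its status is DONE, then fetch the output.

    `run` is any callable that takes an argv list and returns the command's
    standard output as a string (for example a thin subprocess wrapper).
    """
    # Only the command names and the -queue option shown on the slide are used.
    job_url = run(["globus-job-submit", contact, "-queue", queue, query]).strip()
    for _ in range(max_polls):
        status = run(["globus-job-status", job_url]).strip()
        if status == "DONE":
            return run(["globus-job-get-output", job_url])
        if status == "FAILED":  # assumed terminal state
            raise RuntimeError("query job failed: " + job_url)
        time.sleep(poll_seconds)
    raise TimeoutError("query job did not finish: " + job_url)
```

Passing the command runner in as an argument keeps the sketch testable without a Grid installation; a real client could supply a function built on `subprocess.run`.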
The Information System
• The QE publishes its presence to the Grid
• Software computing machine: load, memory space, etc.
• We use the RDBMS MIB information:
  – More than 250 parameters… we are not using all of them!!!
      rdbmsSrvInfoFinishedTransactions  1.3.6.1.2.1.39.1.6.1.2
      rdbmsSrvInfoDiskReads             1.3.6.1.2.1.39.1.6.1.3
      rdbmsSrvInfoLogicalReads          1.3.6.1.2.1.39.1.6.1.4
      rdbmsSrvInfoDiskWrites            1.3.6.1.2.1.39.1.6.1.5
• Based on SNMP or direct access.
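As a rough Python illustration of what a dynamic information provider might do with these counters: the OIDs are the ones listed on the slide, while `collect_counters`, `as_attribute_lines` and the injected `snmp_get` callable are hypothetical. A real SNMP query would normally also need an instance index appended to each column OID, which this sketch leaves out.

```python
# Column OIDs taken from the slide (RDBMS MIB, rdbmsSrvInfo table).
RDBMS_MIB_OIDS = {
    "rdbmsSrvInfoFinishedTransactions": "1.3.6.1.2.1.39.1.6.1.2",
    "rdbmsSrvInfoDiskReads": "1.3.6.1.2.1.39.1.6.1.3",
    "rdbmsSrvInfoLogicalReads": "1.3.6.1.2.1.39.1.6.1.4",
    "rdbmsSrvInfoDiskWrites": "1.3.6.1.2.1.39.1.6.1.5",
}


def collect_counters(snmp_get, oids=RDBMS_MIB_OIDS):
    """Query each OID through the supplied callable and return name -> int."""
    return {name: int(snmp_get(oid)) for name, oid in oids.items()}


def as_attribute_lines(counters):
    """Render counters as sorted 'name: value' lines for publication."""
    return [f"{name}: {value}" for name, value in sorted(counters.items())]
```

The `snmp_get` argument can be any function that maps an OID string to a value, whether it talks SNMP or reads the database directly, which mirrors the "SNMP or direct access" choice above.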
gLite implementation
• GRAM + site BDII + top BDII
• Based on information providers:
  – Static information
  – Dynamic information
[Diagram: Oracle, PostgreSQL and MySQL are each reached through ODBC and SNMP. Dynamic and static providers: snmpquery, ODBCquery, ldif.]
QE BDII
> ldapsearch -LLL -x -H g.dse.host -b "mds-vo-name=site,o=grid"
dn: GlueDSEUniqueID=g.dse.host:2119/dbmanager-ODBC,mds-vo-name=local,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueDSE
objectClass: GlueDSETop
objectClass: GlueKey
GlueDSEName: TESTDB
GlueDSEStateStatus: Production
GlueDSEInfoLRMSType: Postgresql
GlueDSEInfoLRMSVersion: 7.3
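A small Python sketch for reading an entry like this one on the client side. `parse_ldif_entry` is a hypothetical helper and deliberately simple: it assumes a single entry with plain `name: value` lines and ignores LDIF features such as base64 values and folded lines.

```python
def parse_ldif_entry(text):
    """Parse one LDIF entry into {attribute: [values]}."""
    entry = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so values such as host:port survive.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        entry.setdefault(name.strip(), []).append(value.strip())
    return entry
```

Multi-valued attributes such as `objectClass` come back as lists, so a client could check `"GlueDSE" in entry["objectClass"]` to tell a QE from an ordinary CE before reading `GlueDSEInfoLRMSType`.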
QE and the WMS
• New job wrapper for dbmanager
[Diagram: RB; QueryWrapper; gatekeeper; QueryManager; QueryProcess; query plug-in; DB-specific query driver; RDBMS.]
An Example
Type = "Job";
JobType = "Normal";
Executable = "select A from table;";
StdOutput = "hostname.out";
StdError = "hostname.err";
OutputSandbox = {"hostname.err","hostname.out"};
Arguments = "-xml";
RetryCount = 1;

$ glite-job-submit -r gdse.oats.inaf.it:2119/dbmanager-odbc-test1 sqltest.jdl
Selected Virtual Organisation name (from proxy certificate extension): inaf
Connecting to host arquimedes.rediris.es, port 7772
Logging to host arquimedes.rediris.es, port 9002
================================ edg-job-submit Success ===========
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job - https://arquimedes.rediris.es:9000/75hD3nNHxbYRDAL3GmiIug
The edg_jobId has been saved in the following file: /home/madrid01/jlvpjobid
========================================================================
Summing up
• The G-DSE supports Data Source (DS) and DSE indexing, monitoring, management and recovery through a rich set of metadata bound to the standard GIS.
• DSs have their core engine in the G-DSE, which provides a framework for activity and task management.
• An RSL/JDL transaction/query lets a number of tasks be specified, together with their parameters, inputs, outputs and control flow.
• The response to a request is generated by the G-DSE within a JobQueryManager session. The G-DSE analyses each incoming task and performs authentication and authorisation.
• The standard Grid Workload Manager constructs an optimised execution graph.
• The GIS monitors the status digest that each DS and DSE produces through its internal monitor.
• The G-DSE has been designed to support dynamic configuration, sessions, transactions, recovery and concurrency.
End of Presentation
Thank you for your attention