Introduction to OGSA-DAI, Neil Chue Hong, 15th February 2006, GGF16, Athens


Page 1

Introduction to OGSA-DAI

Neil Chue Hong

15th February 2006

GGF16, Athens

Page 2

Data Services: challenges

Scale: many sites, large collections, many uses
Longevity: research requirements outlive technical decisions
Diversity: no “one size fits all” solution will work
  Primary data, data products, metadata, administrative data, …
Many data resources
  Independently owned and managed
  Geographically distributed

And I haven’t even mentioned security yet!

Page 3

Use Cases for Data Services

Data Filtering: single source producing large amounts of data, distributed to many sites downstream
Data Discovery: many sources, many query entry points in a linked system
Data Translation: source to sink, conversion of data model / structure
Data Federation: many sources, linked to provide a view as a single source
Data Replication: full or partial copies to improve throughput
Data Integration (model aggregation): e.g. integration of time-variant data, streams, files
Data Integration (knowledge expansion): forming links between databases to increase knowledge

Page 4

Requirements on Data Services?

Common data model, e.g. RowSet
Common query language(s), e.g. XQuery, SQL
Standard access to:
  data resource schema information
  physical data resource information, for optimisation purposes
  data resource descriptive information, for discovery / integration
Single, seamless security model
Dynamic publication and discovery
Multiple, efficient delivery methods
Move computation towards data
Data aggregation functionality
Replication information

Page 5

OGSA-DAI In One Slide

An engineered, extensible framework for data access and integration.
Expose heterogeneous data resources to a grid through web services.
Interact with data resources:
  Queries and updates.
  Data transformation / compression.
  Data delivery.
Customise for your project using:
  Additional Activities
  Client Toolkit APIs
  Data Resource handlers
A base for higher-level services:
  federation, mining, visualisation, …
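As a rough illustration of the query / transformation / compression / delivery flow listed on this slide, the sketch below chains those steps in plain Python. It does not use the OGSA-DAI API: the function names (`sql_query`, `to_csv`, `gzip_compress`, `run_pipeline`), the in-memory SQLite database and the `protein` rows are all hypothetical stand-ins chosen for the example.

```python
import csv
import gzip
import io
import sqlite3


def sql_query(conn, statement):
    """Hypothetical 'query' step: run SQL and return (column names, rows)."""
    cur = conn.execute(statement)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def to_csv(columns, rows):
    """Hypothetical 'transformation' step: render the result as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def gzip_compress(text):
    """Hypothetical 'compression' step: gzip the text before delivery."""
    return gzip.compress(text.encode("utf-8"))


def run_pipeline(conn, statement):
    """Chain the steps so the output of one feeds the next."""
    columns, rows = sql_query(conn, statement)
    return gzip_compress(to_csv(columns, rows))


# Stand-in data resource: an in-memory SQLite database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO protein VALUES (?, ?)", [(1, "P1"), (2, "P2")])

payload = run_pipeline(conn, "SELECT id, name FROM protein ORDER BY id")

# "Delivery" here is simply handing the bytes to the caller, who unpacks them.
delivered = gzip.decompress(payload).decode("utf-8")
```

The point of the sketch is only the shape: each step takes the previous step's output, so steps can be swapped or added without touching the others.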

Page 6

OGSA-DAI Philosophy

We provide the basic, general functionality
  e.g. querying relational databases, delivery mechanisms, schema extractors
You add the specialist functionality
  e.g. map overlays
Several well-defined extension points
  client toolkit
  activity plugins
  data resource accessor model
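One way to picture the activity-plugin extension point is as a registry that the framework owns and that projects add entries to. The sketch below is illustrative only and is not the OGSA-DAI plugin mechanism: `ACTIVITIES`, `register_activity`, `perform`, and the `upperCase` and `mapOverlay` activities are hypothetical names with made-up behaviour.

```python
# Hypothetical registry mapping activity names to callables.
ACTIVITIES = {}


def register_activity(name):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        ACTIVITIES[name] = func
        return func
    return decorator


# "We provide": a basic, general activity shipped with the framework.
@register_activity("upperCase")
def upper_case(data):
    return [str(item).upper() for item in data]


# "You add": a specialist activity contributed by a project.
@register_activity("mapOverlay")
def map_overlay(data):
    # Made-up behaviour: tag each map tile name with an overlay marker.
    return [f"{item}+overlay" for item in data]


def perform(request, data):
    """Run the named activities in order, passing data along the chain."""
    for name in request:
        if name not in ACTIVITIES:
            raise KeyError(f"unknown activity: {name}")
        data = ACTIVITIES[name](data)
    return data


result = perform(["upperCase", "mapOverlay"], ["tile1", "tile2"])
```

Because the request names activities rather than calling them directly, a contributed activity is used in exactly the same way as a built-in one.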

Page 7

[Figure: OGSA-DAI architecture diagram. Labels: Application; Client Toolkit; OGSA-DAI service; Engine; Activities (SQLQuery, XPath, readFile, XSLT, GZip, GridFTP); Data Resources (JDBC, XMLDB, File); Databases (MySQL, DB2, SQLServer, XIndice, SWISSPROT).]

Page 8

[Figure: MultipleSQL GDS diagram. Labels: SQLQuery; OGSA-DAI service; Engine; MultipleSQL GDS; repeated SQL and JDBC links; MySQL.]
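The diagram labels on this slide suggest one SQL query being issued against several databases through a single service. A minimal sketch of that fan-out idea follows, assuming every database shares the same schema. It is not the MultipleSQL GDS implementation: `make_database`, `query_one`, `query_all`, the `sample` table and its rows are hypothetical, and in-memory SQLite stands in for the remote databases.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_database(rows):
    """Create a stand-in database; each one plays the role of a separate site."""
    # check_same_thread=False lets a worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sample (site TEXT, value INTEGER)")
    conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)
    return conn


def query_one(conn, statement):
    """Run the statement against a single database."""
    return conn.execute(statement).fetchall()


def query_all(connections, statement):
    """Send the same statement to every database and concatenate the rows."""
    with ThreadPoolExecutor(max_workers=len(connections)) as pool:
        # pool.map yields results in the same order as `connections`.
        per_db = pool.map(lambda c: query_one(c, statement), connections)
        return [row for rows in per_db for row in rows]


databases = [
    make_database([("a", 1), ("a", 5)]),
    make_database([("b", 7)]),
    make_database([("c", 2), ("c", 9)]),
]

merged = query_all(
    databases,
    "SELECT site, value FROM sample WHERE value > 3 ORDER BY value",
)
```

Note that each database orders only its own rows; any global ordering or aggregation would have to be applied to `merged` afterwards.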

Page 9

Distributed Query Processing

Higher-level services building on OGSA-DAI
Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
  Use exchange operators

[Figure: partitioned query plan. Operators: table_scan(protein); table_scan[termID=S92](proteinTerm); reduce; hash_join(proteinId); op_call(Blast); exchange. Partition labels: 1, 2 and 3,4.]
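To make the partitioning idea concrete, the sketch below uses an exchange step to route rows to partitions by join key, then runs an ordinary hash join on each partition. It borrows the operator names from the slide (table_scan, exchange, hash_join) but is not the DQP implementation: the functions, the sample rows and the sequential loop standing in for separate evaluators are all assumptions made for illustration.

```python
from collections import defaultdict


def table_scan(rows, predicate=None):
    """Return the rows that satisfy the optional predicate."""
    return [r for r in rows if predicate is None or predicate(r)]


def exchange(rows, key, n_partitions):
    """Hypothetical exchange operator: route each row by hash of its key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Classic hash join: build a table on `left`, probe it with `right`."""
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    return [{**l, **r} for r in right for l in table.get(r[key], [])]


def partitioned_join(left, right, key, n_partitions=2):
    """Join matching partitions independently, then gather the results."""
    left_parts = exchange(left, key, n_partitions)
    right_parts = exchange(right, key, n_partitions)
    joined = []
    for lp, rp in zip(left_parts, right_parts):
        # Each pair could run on a different evaluator; here it is sequential.
        joined.extend(hash_join(lp, rp, key))
    return joined


# Made-up rows for the two relations named in the figure.
protein = [
    {"proteinId": 1, "sequence": "MKV"},
    {"proteinId": 2, "sequence": "GAT"},
    {"proteinId": 3, "sequence": "TTC"},
]
protein_term = [
    {"proteinId": 1, "termID": "S92"},
    {"proteinId": 2, "termID": "S10"},
    {"proteinId": 3, "termID": "S92"},
    {"proteinId": 4, "termID": "S92"},
]

selected_terms = table_scan(protein_term, lambda r: r["termID"] == "S92")
result = sorted(
    partitioned_join(table_scan(protein), selected_terms, "proteinId"),
    key=lambda r: r["proteinId"],
)
```

Because both inputs are partitioned with the same hash on the same key, rows that can join always land in the same partition, which is what allows the partitions to be evaluated independently.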

Page 10

DQP architecture

[Figure: DQP architecture diagram. Labels: Co-ordinator; Evaluator (x3); OGSA-DAI (x4). Annotations: Query: SQL & OQL; OGSA-DAI activity; WS-I only; Using client toolkit; All interfaces that are supported by toolkit.]

Page 11

Map Retrieval: Integration

Using security and extensibility (overlay)

[Figure: map retrieval integration diagram. Labels: Portlet; OGC; ODS 1, Oracle, Census; ODS 2, GIS, Oracle; ODS 3, Application data; SO-OGC (x2); JDBC; SQL/XML; NGS Authentication.]

Page 12

Integrated service for Data & Metadata

[Figure: integrated data and metadata service diagram. Labels: Client; Metadata & Data Service; Naming Service; Metadata Manager; MD messages; Storage Manager; BD messages; Data Resource (x8).]

Page 13

MDS/GridFTP/GSI Integration

Can publish any OGSA-DAI resource property to a local MDS Index Service
  e.g. databaseSchema, activityTypes
  information is published on a per-resource basis, and can differ for each resource
Can transfer results via GridFTP rather than via SOAP
  still working on tuning options
Can use X.509 certificates to secure services
  but security is still coarse-grained by default

Page 14

Future plans: overview

A new version of the OGSA-DAI Engine
  better support for concurrency, sessions, monitoring and notification
Implementing new DAIS specifications
Key things that we will be addressing:
  Performance (particularly format representation and transport)
  Security model which can be applied across platforms
  Transactions provision
  More data integration facilities
Integration with other components
  registries (e.g. GRIMOIRES)
  workflow editors (e.g. Taverna)
Working with new projects
  e.g. CancerGrid, iSpider, GEODE

Page 15

Future plans: Performance

WebRowSet is not efficient
  aim to use ResultSet and CSV instead where possible
SOAP is not efficient
  aim to use SOAP w/Attachments, MTOM

ResultSet to RowSet conversion
  WebRowSet is larger
  CSV scales better for output
  Conversion and validation take the time
  work in progress, Jan06
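The size argument can be illustrated by serialising the same rows two ways and comparing byte counts. The sketch below is only indicative and does not reproduce the measurements behind this slide: the XML layout is a simplified element-per-column rowset and not the real WebRowSet schema, and `rows_to_csv`, `rows_to_xml` and the generated rows are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_csv(columns, rows):
    """Serialise rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def rows_to_xml(columns, rows):
    """Serialise rows as a simplified XML rowset (NOT the WebRowSet schema)."""
    root = ET.Element("rowset")
    for row in rows:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            # Every value carries its own opening and closing tag.
            ET.SubElement(row_el, name).text = str(value)
    return ET.tostring(root, encoding="utf-8")


# Made-up result set: 1000 rows of three columns.
columns = ["id", "name", "score"]
rows = [(i, f"protein{i}", i * 0.5) for i in range(1000)]

csv_bytes = rows_to_csv(columns, rows)
xml_bytes = rows_to_xml(columns, rows)

# How many times larger the XML form is than the CSV form for this data.
ratio = len(xml_bytes) / len(csv_bytes)
print(f"CSV: {len(csv_bytes)} bytes, XML: {len(xml_bytes)} bytes, ratio {ratio:.1f}x")
```

CSV names each column once in the header, while the XML form repeats the column name twice per value, so the gap grows with the number of rows and columns.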

Page 16

From contribution to core

One of a group of projects moving to GlobDev project (more later)
Hope to use this as a way of encouraging collaborations and contributions
Different levels of contributions
  Based on OGSA-DAI?
  Works with OGSA-DAI?
  Part of OGSA-DAI?

Page 17

Contributing to OGSA-DAI

Additional functionality:
  Provide activities which implement specific functionality
  Provide extra client functionality
  Provide different security mechanisms
  Provide higher-level components and applications

Page 18

Further information

The OGSA-DAI Project Site: http://www.ogsadai.org.uk
The DAIS-WG site: http://forge.gridforum.org/projects/dais-wg/
OGSA-DAI Users Mailing list: [email protected]
Formal support for OGSA-DAI releases: http://bugzilla.globus.org (OGSA-DAI)
OGSA-DAI training courses (live and online)