Grid Computing Meets the Database

Chris Smith, Platform Computing

Session # 36686




© Platform Computing Inc. 2003

The best thing about the Grid is that it is unstoppable.

The Economist, June 21, 2001


What is Grid computing?

Grid: Transparent, secure and coordinated sharing of computing resources across geographically dispersed sites


Benefits of Grid Computing

Grid technology is used to aggregate computing resources across the entire organization, regardless of location or business unit.

Provides far more computing capacity than any single site or business unit can offer

Delivers reliable, “always-on” computing infrastructure

Virtualizes IT infrastructure for end-users

Coordinates the usage of heterogeneous computing resources in order to accomplish business processing tasks


Example Use Cases

Batch Process Automation

Multi-Site Capacity Computing

Service Virtualization


Batch Process Automation


What is Platform JobScheduler?

Intelligent batch process automation

Grid-enabled enterprise batch process automation software

Provides a Graphical Design Studio & Management console to design and control the scheduling of Oracle jobs and compute jobs with various dependencies (Line-of-Business Processes) across a virtualized environment


Simplified Scheduling Environment for Oracle jobs and Compute jobs

Single Point of Control to Design & Monitor

Job events, file events, and time events

Central Repository for Storing/Sharing

Jobs

Business flows

Sub flows

Proxy dependencies

Consistent, Flexible & Extensible Automated Exception Handling

Re-running jobs, killing jobs, and triggering other jobs
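The automated exception handling above (re-running a failed job, then triggering other jobs if it still fails) can be sketched in a few lines. This is a minimal illustration, not JobScheduler's API: the `Job` class and its `max_reruns` and `on_failure` fields are hypothetical names, and killing jobs (for example, on a timeout) is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    name: str
    run: Callable[[], bool]                                # returns True on success
    max_reruns: int = 0                                    # re-runs allowed after a failure
    on_failure: List["Job"] = field(default_factory=list)  # jobs triggered if every attempt fails


def execute(job: Job, log: List[str]) -> bool:
    """Run a job, re-running it on failure and triggering follow-up jobs if it never succeeds."""
    for attempt in range(1 + job.max_reruns):
        if job.run():
            log.append(f"{job.name}: succeeded on attempt {attempt + 1}")
            return True
        log.append(f"{job.name}: failed attempt {attempt + 1}")
    for follow_up in job.on_failure:
        log.append(f"{job.name}: triggering {follow_up.name}")
        execute(follow_up, log)
    return False


# Example: a load job that fails once, then succeeds on the re-run.
outcomes = iter([False, True])
log: List[str] = []
execute(Job("load_trades", run=lambda: next(outcomes), max_reruns=1), log)
print(log)  # ['load_trades: failed attempt 1', 'load_trades: succeeded on attempt 2']
```

Keeping the follow-up jobs as ordinary `Job` objects means an exception handler can itself be re-run or trigger further handlers.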


More Efficient Use of Computing Resources for Oracle Jobs and Compute Jobs

Resource Virtualization

Ensures the reliability of mission critical business flows and always-on availability of resources

Provision additional databases for specific tasks at the times they are needed

Matching demand for resources with the supply of resources


JobScheduler Architecture

[Diagram: JobScheduler architecture. Components: Client (process designing and control); LoadXML, SaveXML, and Log; Jobflow Server (scheduling on time, job, file, and other events); Oracle Database; Grid Master & Grid Agents (grid-enabled application execution infrastructure).]


JobScheduler and Oracle scheduler integration

[Diagram: Platform JobScheduler client and server; an LSF master host and an LSF host in an LSF cluster; an Oracle client and Oracle instances B and C; orajobstart and the external load information managers elim.oracle.B and elim.oracle.C; numbered steps 1 to 4.]
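The elim.oracle.B and elim.oracle.C components are LSF external load information managers (ELIMs). An ELIM is a program that periodically writes a line to standard output giving the number of load indices followed by name/value pairs, which LSF can then use in scheduling decisions. The sketch below shows only that reporting loop; the index name `oraB_sessions` and the sampling function are hypothetical, and a real index would also have to be defined in the LSF cluster configuration.

```python
import time
from typing import Callable, Dict


def elim_report(indices: Dict[str, float]) -> str:
    """Format one ELIM report line: the index count, then name/value pairs."""
    parts = [str(len(indices))]
    for name, value in indices.items():
        parts.extend([name, f"{value:g}"])
    return " ".join(parts)


def sample_instance_b() -> Dict[str, float]:
    """Hypothetical sampler; a real ELIM would query Oracle instance B here."""
    return {"oraB_sessions": 42.0}


def run_elim(sample: Callable[[], Dict[str, float]], interval_s: float = 15.0) -> None:
    """Report fresh values forever; LSF reads the ELIM's standard output."""
    while True:
        print(elim_report(sample()), flush=True)
        time.sleep(interval_s)


print(elim_report(sample_instance_b()))  # 1 oraB_sessions 42
```

An installed ELIM would call `run_elim(sample_instance_b)` as its main loop; `flush=True` matters because LSF reads the lines as they are written.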


ETL using Platform JobScheduler

A common use of the Platform JobScheduler and Oracle scheduler integration is for ETL into a data warehouse.

Example: a brokerage firm wants to load the day’s trading data into its data warehouse for analysis (for example, of risk positions and trends)

The ETL flow is triggered by:

Time of day event

Arrival of market data in flat-file format

Completion of a stored procedure which collects location brokerage data

The data is cleansed and loaded into the database with SQL*Loader

Stored procedures are then invoked to perform analysis and initial reporting
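Read as a dependency, the three trigger conditions can be checked with a small predicate. This sketch assumes the flow should start only when all three conditions hold (the slide does not say whether any one event is enough); the cutoff time, the file name, and the `proc_done` flag are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, time
from pathlib import Path


@dataclass
class EtlTriggers:
    not_before: time   # time-of-day event (hypothetical cutoff)
    market_file: Path  # flat file of market data whose arrival is awaited
    proc_done: bool    # set when the brokerage-data stored procedure completes

    def ready(self, now: datetime) -> bool:
        """True once every trigger condition holds (assumed AND semantics)."""
        return (
            now.time() >= self.not_before
            and self.market_file.exists()
            and self.proc_done
        )


triggers = EtlTriggers(time(18, 0), Path("market_data.csv"), proc_done=True)
if triggers.ready(datetime.now()):
    print("start ETL flow: cleanse, load with SQL*Loader, run stored procedures")
```

If the intended semantics were "any event starts the flow", the `and` chain would become `or`; a scheduler like JobScheduler lets the flow designer choose.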


Multi-Site Capacity Computing


Increasing Computing Capacity with Platform MultiCluster

A parameter space study is done on tens of thousands of individual sets of parameters, resulting in tens of thousands of analysis jobs

The local cluster doesn’t have enough capacity, so Platform MultiCluster is used to forward analysis jobs to clusters at other sites of the organization

The DBMS_STREAMS_ADM.MAINTAIN_TABLESPACES procedure provided with Oracle Database 10g is used to replicate input data for the analysis at the remote site

Database-aware scheduling is used to decide intelligently which sites are suitable for receiving jobs
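A parameter space study grows quickly: thirty values for each of three parameters already gives 30 × 30 × 30 = 27,000 analysis jobs. The sketch below enumerates the parameter sets and builds one `bsub` submission per set. The parameter names, the `./analyze` executable and its flags, and the queue name are hypothetical; with MultiCluster the queue would be a send queue configured to forward jobs to remote clusters.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence


def parameter_sets(space: Dict[str, Sequence[int]]) -> Iterator[Dict[str, int]]:
    """Yield every combination of parameter values (the Cartesian product)."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def bsub_commands(space: Dict[str, Sequence[int]], queue: str,
                  executable: str = "./analyze") -> List[List[str]]:
    """One bsub argument vector per parameter set (executable and its flags are hypothetical)."""
    return [
        ["bsub", "-q", queue, executable, *(f"--{k}={v}" for k, v in p.items())]
        for p in parameter_sets(space)
    ]


space = {"temperature": range(30), "pressure": range(30), "ligand": range(30)}
commands = bsub_commands(space, queue="send_queue")
print(len(commands))  # 27000
print(commands[0])    # ['bsub', '-q', 'send_queue', './analyze', '--temperature=0', '--pressure=0', '--ligand=0']
```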


Platform MultiCluster Job Forwarding Model

[Diagram: compute servers at Site A and Site B; jobs pass from a send queue at Site A to a receive queue at Site B.]

You submit. We do:
• Job transfer
• Data staging
• Account mapping
• Accounting
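The division of labour in the forwarding model (the user submits once; MultiCluster moves the job, maps the account, and records usage) can be illustrated with a toy model. This is not how LSF implements forwarding: the `Site` class, the free-slot check, and the account map are assumptions, and data staging is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Site:
    name: str
    free_slots: int
    account_map: Dict[str, str] = field(default_factory=dict)           # submitting user -> local user
    receive_queue: List[Tuple[str, str]] = field(default_factory=list)  # (local user, job)
    accounting: Dict[str, int] = field(default_factory=dict)            # jobs accepted per local user


def forward(job: str, user: str, remote: Site) -> bool:
    """Move a job from the local send queue to the remote receive queue, if it has room."""
    if remote.free_slots <= 0:
        return False                                  # job stays in the send queue
    local_user = remote.account_map.get(user, user)   # account mapping
    remote.receive_queue.append((local_user, job))    # job transfer
    remote.free_slots -= 1
    remote.accounting[local_user] = remote.accounting.get(local_user, 0) + 1  # accounting
    return True


site_b = Site("B", free_slots=1, account_map={"csmith": "csmith_b"})
print(forward("analysis_1", "csmith", site_b))  # True
print(forward("analysis_2", "csmith", site_b))  # False
print(site_b.receive_queue)                     # [('csmith_b', 'analysis_1')]
```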


Enterprise Grid Architecture


Workload-driven data management

[Diagram: the master molecular database (MOL) with its tablespaces, and a Streams-maintained version of MOL with its own tablespaces at the site where the job runs.]

1. Job is forwarded
2. Pre-exec script runs
3. Pre-exec script connects to MOL and runs MAINTAIN_TABLESPACES
4. MOL metadata and tablespaces are transferred
5. Pre-exec script finishes
6. Job is run
7. Job uses the copy

Streams carries DML updates between MOL and the copy
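Steps 2 to 5 rely on LSF's pre-execution command: the job starts only if the pre-exec script exits with status 0. A minimal pre-exec sketch follows, assuming a hypothetical `replicate_from_master` helper that would connect to MOL and run `DBMS_STREAMS_ADM.MAINTAIN_TABLESPACES` (stubbed out here); how the script learns which datasets are already local is also left as an assumption.

```python
import sys
from typing import Callable, Set


def pre_exec(dataset: str, local_datasets: Set[str],
             replicate: Callable[[str], bool]) -> int:
    """Exit status for the pre-exec: 0 if the dataset is (or becomes) available locally."""
    if dataset in local_datasets:
        return 0                   # a Streams-maintained copy already exists here
    if replicate(dataset):         # set up the copy from the master database
        local_datasets.add(dataset)
        return 0
    return 1                       # non-zero: LSF does not start the job now


def replicate_from_master(dataset: str) -> bool:
    """Hypothetical stub: a real script would call MAINTAIN_TABLESPACES on the master."""
    print(f"replicating {dataset} (stub)", file=sys.stderr)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pre_exec.py MOL
    sys.exit(pre_exec(sys.argv[1], set(), replicate_from_master))
```

Passing the replication function in as a parameter keeps the exit-status logic testable without a database.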


Database-aware scheduling

[Diagram: Sites 1, 2, and 3 and a Data Management Service; MOL is held at Sites 1 and 3, MOL2 at Site 1.]

Data Management Service cache:
• Site 1 – MOL, MOL2
• Site 2 – (none)
• Site 3 – MOL

1. Poll for datasets
2. Update cache info
3. bsub -extsched MOL
4. The local site is overloaded; the database-aware scheduler plug-in decides to forward the job to Site 3, since it has the MOL database
5. Job forwarded to Site 3
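The decision in step 4 can be sketched as a lookup against the Data Management Service cache: keep the job local if possible, otherwise pick a site that holds the requested dataset and is not overloaded. The function below is an illustration only; the real plug-in receives the dataset name through `bsub -extsched`, and its interface to the LSF scheduler is not shown.

```python
from typing import Dict, Optional, Set


def choose_site(dataset: str, local: str, cache: Dict[str, Set[str]],
                overloaded: Set[str]) -> Optional[str]:
    """Return the site where a job needing `dataset` should run, or None to keep it pending."""
    usable = sorted(site for site, held in cache.items()
                    if dataset in held and site not in overloaded)
    if local in usable:
        return local          # no need to forward
    return usable[0] if usable else None


# Cache contents from the slide.
cache = {"Site 1": {"MOL", "MOL2"}, "Site 2": set(), "Site 3": {"MOL"}}
print(choose_site("MOL", "Site 1", cache, overloaded={"Site 1"}))  # Site 3
```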


Service Virtualization


Demo Lab Hardware: A Common Web Service/Application Environment

[Diagram: eight nodes behind a Cisco hardware load balancer, used for the web server and app server (Linux) and for Oracle RAC (Linux AS 2.1), with NAS/SAN storage; the nodes are connected by a public network, an interconnect network, and a storage network.]


Oracle RAC Provisioning Demo System

[Diagram: a Provisioner and Agent Managers on managed nodes (Node5, Node6, Node8) across three layers. The web layer nodes (Linux) run web server instances; the application layer nodes (Linux) run application and app server instances with an App Agent and Service Agents; the RAC layer (Linux AS 2.1) is a managed cluster, Node1 to Node4, running RAC instances under a RAC Agent.]


Proof of Concept Demos

Dynamic Provisioning within Database Layer

Dynamic Provisioning across Database and Application Layers


Provisioning Within DB Layer

[Diagram: web, app, and RAC layers; the RAC layer nodes run dbHR and dbFinance.]

- Show one RAC node running dbFinance, two RAC nodes running dbHR, and one idle RAC node
- Direct heavy data access at dbFinance and light data access at dbHR
- Without dynamic provisioning, the response time of dbFinance is very slow while other RAC nodes are idle
- With dynamic provisioning, the idle node is added to dbFinance, and one dbHR node is shut down and moved to dbFinance
- The response time of dbFinance improves
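The reallocation in this demo (use the idle node first, then move a node away from the lightly loaded database) can be sketched as below. The node names are hypothetical, and starting or stopping the actual RAC instances is reduced to moving names between lists.

```python
from typing import Dict, List


def provision(nodes: Dict[str, List[str]], idle: List[str], hot_db: str,
              extra: int, donor_db: str, keep_min: int = 1) -> None:
    """Give hot_db up to `extra` more nodes: idle nodes first, then nodes from donor_db."""
    while extra > 0 and idle:
        nodes[hot_db].append(idle.pop())             # start a hot_db instance on an idle node
        extra -= 1
    while extra > 0 and len(nodes[donor_db]) > keep_min:
        nodes[hot_db].append(nodes[donor_db].pop())  # stop the donor instance, start hot_db
        extra -= 1


nodes = {"dbFinance": ["rac1"], "dbHR": ["rac2", "rac3"]}
idle = ["rac4"]
provision(nodes, idle, hot_db="dbFinance", extra=2, donor_db="dbHR")
print(nodes)  # {'dbFinance': ['rac1', 'rac4', 'rac3'], 'dbHR': ['rac2']}
```

The `keep_min` guard ensures the donor database (dbHR here) always keeps at least one instance.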


Provisioning Across DB & App Layers

[Diagram: web, app, and RAC layers; the RAC layer nodes run dbHR and dbFinance.]

- Show one RAC node running dbFinance, one RAC node running dbHR, and two idle RAC nodes
- Many applications need to run on the App Layer
- Without dynamic provisioning, the response time of the App Layer is very slow, while some RAC nodes are idle
- With dynamic provisioning, some applications run on the two idle RAC nodes
- The response time of the App Layer improves
- When data access to dbFinance increases, more database instances are needed
- Applications on the RAC nodes are gracefully preempted, and two more dbFinance instances are started
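The two phases of this demo, lending idle RAC nodes to applications and then reclaiming them for dbFinance, can be sketched with a simple role table. This is an illustration under assumptions: the node names are hypothetical, `None` marks an idle node, and graceful preemption of the applications is reduced to a role change.

```python
from typing import Dict, List, Optional

Roles = Dict[str, Optional[str]]  # node -> database name, "app", or None (idle)


def lend_idle(roles: Roles, app_backlog: int) -> List[str]:
    """Lend idle RAC nodes to the app layer, up to the number of waiting applications."""
    lent = []
    for node in sorted(roles):
        if len(lent) == app_backlog:
            break
        if roles[node] is None:
            roles[node] = "app"
            lent.append(node)
    return lent


def reclaim_for_db(roles: Roles, db: str, needed: int) -> List[str]:
    """Start `needed` instances of db: use idle nodes first, then preempt lent nodes."""
    taken = []
    for wanted in (None, "app"):
        for node in sorted(roles):
            if len(taken) == needed:
                return taken
            if roles[node] == wanted:
                roles[node] = db  # graceful application shutdown would happen here
                taken.append(node)
    return taken


roles: Roles = {"rac1": "dbFinance", "rac2": "dbHR", "rac3": None, "rac4": None}
print(lend_idle(roles, app_backlog=5))        # ['rac3', 'rac4']
print(reclaim_for_db(roles, "dbFinance", 2))  # ['rac3', 'rac4']
```

Checking idle nodes before lent ones means applications are only preempted when the database cannot be served any other way.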


RAC Agent

Gathers Metrics:

numInstances – Number of instances of a given database.

instanceState – Operational state of an instance.

dbLoad – Various load metrics from a database

User Calls, Recursive Calls

Physical Reads, Physical Writes

Consistent Gets, DB Block Gets

Takes Actions:

startInstance – Start an instance on a candidate

stopInstance – Stop an instance on a candidate
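The dbLoad statistics listed above are cumulative counters in Oracle's V$SYSSTAT view, so a load figure has to come from the difference between two samples. The sketch below builds the query and turns two samples into per-second rates. How the RAC Agent actually combines these figures is not described in the slides, and running the query through a database driver is left out.

```python
from typing import Dict

# Statistic names from the slide, as they appear in V$SYSSTAT.
STATS = ("user calls", "recursive calls", "physical reads",
         "physical writes", "consistent gets", "db block gets")

QUERY = ("SELECT name, value FROM v$sysstat WHERE name IN ("
         + ", ".join(f"'{s}'" for s in STATS) + ")")


def rates(before: Dict[str, int], after: Dict[str, int],
          seconds: float) -> Dict[str, float]:
    """Per-second rates from two samples of the cumulative V$SYSSTAT counters."""
    return {s: (after[s] - before[s]) / seconds for s in STATS}


before = {s: 1000 for s in STATS}
after = {s: 1600 for s in STATS}
print(rates(before, after, seconds=60.0)["user calls"])  # 10.0
```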


Policy Functions

Discover State of System

What is the current state of the candidates?

Database High Load

If a candidate is free, start an instance of the loaded database.

Database Low Load

If a candidate was added, shut down the database instance on that candidate.
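The two load policies can be sketched as functions over a small state record. This is an illustration, not the product's policy language: the `Cluster` record is hypothetical, startInstance and stopInstance are reduced to list updates, and the host `pe01` already running HR is an assumption (the slides name only the candidates pe02 and pe03).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    free: List[str]                                            # candidates with no instance
    instances: Dict[str, List[str]]                            # database -> hosts running it
    added: Dict[str, List[str]] = field(default_factory=dict)  # candidates added per database


def on_high_load(c: Cluster, db: str) -> bool:
    """If a candidate is free, start an instance of the loaded database on it."""
    if not c.free:
        return False
    host = c.free.pop(0)
    c.instances[db].append(host)        # startInstance would be called here
    c.added.setdefault(db, []).append(host)
    return True


def on_low_load(c: Cluster, db: str) -> bool:
    """If a candidate was added, stop the last added instance and free the host."""
    if not c.added.get(db):
        return False
    host = c.added[db].pop()
    c.instances[db].remove(host)        # stopInstance would be called here
    c.free.append(host)
    return True


c = Cluster(free=["pe02", "pe03"], instances={"HR": ["pe01"]})
on_high_load(c, "HR")   # starts HR on pe02
on_high_load(c, "HR")   # continued high load: starts HR on pe03
on_low_load(c, "HR")    # stops the last added instance (pe03)
print(c.instances["HR"], c.free)  # ['pe01', 'pe02'] ['pe03']
```

Tracking added candidates separately ensures the low-load policy never stops an instance it did not start.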


Scenario 1: Results

Discovery

• Discover that pe02 and pe03 are free

High Load

• Detect high load on the HR database.
• Find a free candidate.
• Remove the candidate from the free host list.
• Start another instance of the HR database.
• Add the candidate to the list of HR instances.


Scenario 1: Results (continued)

Continued High Load

• Add the remaining candidate to the HR instances.

Low Load

• Detect low load on the HR database.
• Detect that candidate hosts are in use.
• Remove the last added candidate from the list of HR instances.
• Stop the HR instance on the candidate.
• Return the candidate to the list of free hosts.


Questions?