2010.03.16 pollock.edw2010.modern d ifor warehousing

Post on 18-Nov-2014


DESCRIPTION

Presentation describes a modern alternative to conventional hub-based ETL and Replication for Data Warehousing

TRANSCRIPT

The following is intended to outline our general

product direction. It is intended for information

purposes only, and may not be incorporated into any

contract. It is not a commitment to deliver any

material, code, or functionality, and should not be

relied upon in making purchasing decisions.

The development, release, and timing of any

features or functionality described for Oracle’s

products remains at the sole discretion of Oracle.

Modern Data Integration for Data Warehousing

Oracle Fusion Middleware

Agenda

• Data Warehouse Problem Space (Data Intg. Focus)

• Ancient Pre-History of Data Warehouse

• “The Good Old Days” of Data Warehouse

• Revival Period for Data Warehouse

• Data Integration for Modern Data Warehousing

• Old Generation: Hub & Spoke with Invasive Capture

• New Generation: Agent-based with Non-invasive Capture

• Drive Business Value with Data Integration

• Why Replace? Isn’t my Old _____ Good Enough?

• The Oracle Solution for Data Integration

• Oracle GoldenGate

• Oracle Data Integrator

• Oracle Data Quality

Data Warehousing

PROBLEM SPACE

Data Warehouse Ancient History

• 1985 – 1995 “Controlled Chaos”

• Fragmented Strategy for Marts vs. Warehouse

• No practical notion of “Enterprise Data Warehouse”

• Data Integration:

• Hand-coded Scripts (External to DB)

• Not Optimized

• Procedural Transformations (PL/SQL etc)

• Few Data Integration Tools

• No Formal Methodology, Metrics or Governance

Data Warehouse Good Old Days

• 1995 – 2005 “Formal Methods and Discipline”

• Strategy Choices for Marts vs. Warehouse

• Top-down (Inmon) vs. Bottom-up (Kimball)

• Formal notion of “Enterprise Data Warehouse”

• Data Integration:

• Tool-based Data Integration Solutions

• Optimized, Parallel Server-based Transforms

• Formal Methodology, Metrics and Governance

• Reduced Reliance on Hand-coded Scripts and Procedural Transformations (PL/SQL etc)

Data Warehouse Revival Period

• 2005 – 2015 “Specialized Warehouse Solutions”

• Technology-driven Choices for High-end DW’s

• Commodity H/W vs. Optimized Appliances

• Relational/Star vs. Columnar (vs. Cubes/OLAP)

• Database + BI vs. Distributed Analytic Apps (Hadoop etc)

• EDW as a “source of truth” vision morphs and expands to MDM as a distinct problem domain

• Data Integration is still stuck in the “Good Old Days”

Good Old Days           | Modern Alternative
Hub-based Runtime       | Agent-based Runtime
Centralized ETL Server  | Optimized E-LT (DW Appliance)
Mainly Batch            | Mainly Real Time / Trickle Feed

Data Warehousing with

MODERN DATA INTEGRATION

Modern Data Integration Approach

Heterogeneous, Real-time, Non-Invasive, High Performance E-LT

Traditional ETL + CDC

• Invasive Capture on OLTP systems using complex Adapters

• Transformations in ETL engine on expensive middle tier servers

• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time

• Continuous feeds from operational systems

• Non-invasive data capture

• Thin middle tier with transformations on the database platform (target)

• Mini-batches throughout the day or bulk processing nightly

[Diagram labels: Heterogeneous, Agent, Extract, Xform, Load, Bulk, Trickle, Staging, Lookup Data]

Good Old Days of ETL Batch Integration

• Good Tools, but:

• Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations

• Won’t scale w/new Generation of DW’s

[Diagram labels: Sources, ETL Engine(s), ETL Metadata, Meta, Stage, Lookup Data, Prod; steps Extract, Transform, Load, Lookups/Calcs, Transform, Load; Development, QA, System (etc) Environments. Callout: ETL engines require BIG H/W and heavy parallel tuning]

Modern Agent-based E-LT Processing

• Same Good Tools you Expect, plus:

• Reduce Data Center Costs, De-commission Servers

• Open Frameworks, Non-Proprietary SQL Skills

• Deploys Seamlessly Alone or within SOA Servers

• Scales Linearly with Modern DW Appliances

[Diagram labels: Sources, E-LT Agent, Meta, Stage, Lookup Data, Prod; Data Movement, Data Transformation; steps Extract, Transform, Load, Lookups/Calcs, Transform, Load; Development, QA, System (etc) Environments. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]
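The E-LT pattern on this slide (load first, then transform with set-based SQL inside the target database) can be sketched as follows. This is a minimal illustration under stated assumptions, not Oracle Data Integrator code: it uses Python's built-in sqlite3 module as a stand-in for the warehouse, and the table names (stage_orders, lookup_region, prod_sales) and the function run_elt are hypothetical.

```python
import sqlite3


def run_elt(rows, regions):
    """Load raw rows into staging, then transform inside the database.

    All table and column names here are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stage_orders (order_id INTEGER, region_code TEXT, amount_cents INTEGER);
        CREATE TABLE lookup_region (region_code TEXT PRIMARY KEY, region_name TEXT);
        CREATE TABLE prod_sales (region_name TEXT PRIMARY KEY, total_cents INTEGER);
    """)
    # Extract + Load: bulk-insert the untransformed rows into staging.
    conn.executemany("INSERT INTO stage_orders VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO lookup_region VALUES (?, ?)", regions)
    # Transform: one set-based INSERT ... SELECT performs the lookup join and
    # the aggregation inside the database engine, so no rows are pulled back
    # into this process to be transformed one at a time.
    conn.execute("""
        INSERT INTO prod_sales (region_name, total_cents)
        SELECT l.region_name, SUM(s.amount_cents)
        FROM stage_orders AS s
        JOIN lookup_region AS l ON l.region_code = s.region_code
        GROUP BY l.region_name
    """)
    conn.commit()
    result = conn.execute(
        "SELECT region_name, total_cents FROM prod_sales ORDER BY region_name"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    orders = [(1, "E", 1000), (2, "W", 2500), (3, "E", 500)]
    regions = [("E", "East"), ("W", "West")]
    print(run_elt(orders, regions))
```

The agent's role in this sketch is only to issue the statements; the join and aggregation run where the data already lives, which is the point the slide's callouts make about set-based transforms.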

Good Old Days of Real Time Replication

• Good Tools, but:

• Arcane capture process, sometimes invasive

• Okay for Data Integration Changed Data Capture, but:

• not used for Active-Active / ZDT Migrations

• not used for High Availability or Disaster Recovery

[Diagram labels: Sources, CDC Hub(s), ETL Engine(s), Transaction Apply, Mgmt Server, Stage, Lookup Data, Prod]

Agent-based Real Time Replication

• Same Good Tools you Expect, but:

• Not dependent on hardware for replication

• Capable of Heterogeneous, Active-Active Deployments

• Suitable for Zero Downtime Migrations

• Point-in-time Recovery

[Diagram labels: Sources, Capture Agent, Data Movement, Replicat Agent, Stage, Lookup Data, Prod]

Data Capture Architecture Options

• Next Generation Capabilities

• Non-invasive, heterogeneous, disk-based log access

• Suitable for CDC + High Availability & Active-Active

• Bi-directional and high performance

• Check-pointing and Simple Trail/Queue Management

[Diagram labels: capture of Updates, Inserts and Deletes via On-Disk Logs, Log Tables, or Triggers; sources Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata, Enscribe]
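To make the contrast between the capture options concrete, here is a small sketch of the trigger plus log-table option, the one the deck calls invasive because it adds objects and extra writes to the source system. It is an illustration only, using Python's built-in sqlite3 module; the table names customer and customer_changes and the function capture_with_triggers are hypothetical. Log-based capture, which reads the database's own on-disk transaction log instead, is not modeled here.

```python
import sqlite3


def capture_with_triggers():
    """Record inserts, updates and deletes in a log table via triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

        -- Hypothetical log table that the triggers write to.
        CREATE TABLE customer_changes (
            seq  INTEGER PRIMARY KEY AUTOINCREMENT,
            op   TEXT,
            id   INTEGER,
            name TEXT
        );

        CREATE TRIGGER customer_ins AFTER INSERT ON customer BEGIN
            INSERT INTO customer_changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
        END;
        CREATE TRIGGER customer_upd AFTER UPDATE ON customer BEGIN
            INSERT INTO customer_changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
        END;
        CREATE TRIGGER customer_del AFTER DELETE ON customer BEGIN
            INSERT INTO customer_changes (op, id, name) VALUES ('D', OLD.id, OLD.name);
        END;
    """)
    # Every application write below also causes a second write to the log table.
    conn.execute("INSERT INTO customer VALUES (1, 'Ann')")
    conn.execute("UPDATE customer SET name = 'Anne' WHERE id = 1")
    conn.execute("DELETE FROM customer WHERE id = 1")
    conn.commit()
    changes = conn.execute(
        "SELECT op, id, name FROM customer_changes ORDER BY seq"
    ).fetchall()
    conn.close()
    return changes


if __name__ == "__main__":
    for change in capture_with_triggers():
        print(change)
```

Each source transaction here does double the write work and the source schema carries three extra triggers and a table, which is the overhead the non-invasive, log-reading approach is meant to avoid.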

Good Old Days of Data Integration

• Monolithic & Expensive Environments

• Fragile, Hard to Manage

• Difficult to Tune or Optimize

[Diagram labels: Sources, ETL Engine(s), CDC Hub(s), Transaction Apply, Mgmt Server, ETL Metadata, Meta, Stage, Lookup Data, Prod; steps Extract, Transform, Load, Lookups/Calcs, Transform, Load; Development, QA, System (etc) Environments. Callout: ETL engines require BIG H/W and heavy parallel tuning]

Modern Data Integration Architecture

• Lightweight, Inexpensive Environments – Agents

• Resilient, Easy to Manage – Non-Invasive

• Easy to Optimize and Tune – uses DBMS power

[Diagram labels: Sources, Capture Agent, Replicat Agent, E-LT Agent, Meta, Stage, Lookup Data, Prod; Bulk Data Movement, Data Transformation; steps Extract, Transform, Load, Lookups/Calcs, Transform, Load; Development, QA, System (etc) Environments. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]

Data Integration Drives

BUSINESS VALUE

Business Drivers for Data Integration

Add Value to the Core Business Lines

1. Do More with Less

– Design metadata-driven integration

– Leverage skills & dictate patterns

2. Compete Globally 24X7

– Ensure continuous uptime

– Access data in real time

3. Use Data for Competitive Advantage

– Ensure the quality of your data

– Actively govern most valuable asset

4. Automate and Adapt Business Processes

– Expose data services for reuse

– Orchestrate processes using SOA

Project Drivers for Data Integration

Essential Ingredient for Information Agility

Strategic Value of Data Integration

• Consistency for major enterprise initiatives like BI, DW, & MDM

• Common technical foundation platform across data silos

• Central point for data governance, availability and controls

Key Data Integration Use Cases

• BI, DW, and OLTP Data Integration & Replication

• SOA, Enterprise Integration & Modernization

• Migrations and Master Data Management

Modern Data Integration Alternatives:

WHY REPLACE _______?

Why Replace _______?

• We often hear, “my company has already standardized on __________, why should I replace it?”

Answer:

• Save Money on Data Center Costs

• Accelerate Project Delivery / TTM

• Supply Real Time Intelligence to the Business

• Reduce Batch Windows on Data Warehouse

• Unify Data Integration with SOA Plans

Save Money on Hardware/Data Center

E-LT runs on Small Commodity Servers as an Agent Process

Typical: Separate ETL Server

• Proprietary ETL Engine, Poor Performance

• High Costs for Separate Standalone Server

E-LT: No New Servers

• Lower Cost: Leverage Compute Resources & Partition Workload efficiently

• Efficient: Exploits Database Optimizer

• Fast: Exploits Native Bulk Load & Other Database Interfaces

• Scalable: Scales as you add Processors to Source or Target

Benefits

• Optimal Performance & Scalability

• Better Hardware Leverage

• Easier to Manage & Lower Cost

[Diagram labels: Conventional ETL Architecture (Extract, Transform, Load); Next Generation Architecture (Extract, Load, E-LT Transform)]

Speed Project Delivery/Time to Market

E-LT uses Declarative SQL-style Design + Simple Runtime

• Development Productivity

– 40% Efficiency Gains

• Environment Setup (ex: BI Apps)

– 33-50% Less Complex

Number of Setup Steps 7

Number of Servers 1

Number of connections 3

Number of Setup Steps 10

Number of Servers 3

Number of connections 7

Supply Real Time Business Intelligence

Non-invasive Capture + E-LT Processing

[Diagram labels: Sources, Stage, Lookup Data, Prod; E-LT (Mini-Batch + Transforms); Application Real Time BI (using Data Copy); Analytic BI (Facts & Dims); Consistency Window]

Reduce Consistency Windows w/E-LT

Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL

• Main driver for batch window is data integrity & consistency; once lookup & calc functions begin, DW typically goes offline

[Diagram labels: ETL Batch Window (Extract, Transform, Load, Lookups/Calcs, Transform, Load) compared with E-LT Batch Window (Extract, Load, Transform); Uptime Gains; DW is Online; Sources, ETL Engine(s), ETL Metadata, E-LT Agent, Data Movement, Meta, Stage, Lookup Data, Prod. Callouts: ETL engines require BIG H/W and heavy parallel tuning; Set-based SQL transforms typically faster; SQL Load inside DB is always faster]

*What About “Pushdown Processing”?

• Pushdown Processing is what the ETL vendors do to compensate for bad performance – push the transformation processing to the Database

• Both Pushdown & E-LT have in common:

– use the power of your Data Warehouse for maximum performance

– can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity

– can scale to any multi-TB level using parallel processing

• Only E-LT can claim:

– performance optimized for your Database – whichever DB you use

– operates without any new IT Hardware costs

– 100% Java-based

– easily embedded within your existing or planned SOA infrastructure

– is not a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance

– can entirely eliminate needless network hops for remote data joins

– can operate with no additional energy drain in your Datacenter

Unify E-LT Agent with SOA Runtime

Best of Breed Data Integration as a Shared SOA Service

Unified Management + Monitoring

• Common Runtime – 100% Java

• Common Monitoring

Example Use Cases

• Bulk Data Transformation (any2any)

• XML/EDI Large File Handling

• SOA-driven Business Intelligence

• Load DW from SOA

• Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM)

• ERP Migration, Replication / Loading

• Query Offloading & Zero Downtime

E-LT Frameworks are optimal architectures for:

• Business Intelligence

• Performance Management

• Database & OLAP

• Embedded Applications

• Application Integration

• Middleware Servers

[Diagram labels: High Performance ETL & Replication; Any Data Source; Data Warehouse & OLAP]

Data Integration the:

ORACLE SOLUTION

Oracle Data Integration Solution

Best-in-class Heterogeneous Platform for Data Integration

Comprehensive Data Integration Solution

Oracle GoldenGate

• Log-based CDC

• Bi-directional Replication

• Real-time Data

• Data Verification

Oracle Data Integrator

• ELT/ETL

• Data Transformation

• Bulk Data Movement

• Data Lineage

Oracle Data Quality

• Data Profiling

• Data Parsing

• Data Cleansing

• Match and Merge

[Diagram labels: MDM Applications, SOA Platforms, Oracle Applications, Business Intelligence, Activity Monitoring, Custom Applications; SOA Abstraction Layer (Process Manager, Service Bus, Data Services, Data Federation); Storage, OLTP System, Flat Files, Data Warehouse/Data Mart, OLAP Cube, Web 2.0, Web and Event Services, SOA]

Key Data Integration Products

• Comprehensive Integration

• ELT/ETL for Bulk Data

• Service Bus

• Process Orchestration

• Human Workflow

• Data Grid

• Heterogeneous E-LT & ETL

• High-speed Transformations

• OLAP Data Loading

• Data Warehouse Loading

• Real Time Data Replication

• Changed Data Capture

• DBMS High Availability

• Disaster Tolerance

• Business Data / Metadata

• Statistical Analysis

• Time Series Reporting

• Integrated Data Quality

• Cleansing & Parsing

• De-duplication

• High Performance

• Integrated w/ODI

• Data Service Modeling

• Query Federation

• Data Redaction

• Service Data Objects

Oracle Data Integrator Enterprise Edition

Optimized E-LT for improved Performance, Productivity and Lower TCO

• E-LT Transformation vs. E-T-L

• Declarative Set-based design

• Change Data Capture

• Hot-pluggable Architecture

• Pluggable Knowledge Modules

[Diagram labels: Legacy Sources, Application Sources, OLTP DB Sources, Any Data Warehouse, Any Planning System]

Oracle GoldenGate Overview

Enterprise-wide Solution for Real Time Data Needs

• Standardize on Single Technology for Multiple Needs

• Deploy for Continuous Availability and Real-time Data Access for Reporting / BI

• Highly Flexible

• Fast Deployments

• Lower TCO & Improved ROI

[Diagram labels: Log Based, Real-Time Change Data Capture; Heterogeneous Source Systems; OGG; Disaster Recovery, Data Protection; Standby (Open & Active); Zero Downtime Migration and Upgrades; Operational Reporting; Reporting Database; Query Offloading; Data Distribution; Real-time BI; ODS, EDW, ETL]

How Oracle GoldenGate Works

Modular De-Coupled Architecture

Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs.

Trail: stages and queues data for routing.

Pump: distributes data for routing to target(s).

Route: data is compressed, encrypted for routing to target(s).

Delivery: applies data with transaction integrity, transforming the data as required.

[Diagram: Source Database(s) → Capture → Trail → Pump → LAN/WAN/Internet (TCP/IP) → Trail → Delivery → Target Database(s); Bi-directional]
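The stages described on this slide can be sketched as a toy pipeline. This is a conceptual illustration only and does not use or represent any GoldenGate API: the function names, the dictionary-based log format, and the in-memory queues standing in for trail files are all assumptions. The sketch applies operations in log order and leaves out compression, encryption, transformation, and transaction grouping.

```python
from collections import deque


def capture(log, trail, table_filter=None):
    """Stage operations that belong to committed transactions (optionally filtered)."""
    committed = {entry["txn"] for entry in log if entry["op"] == "COMMIT"}
    for entry in log:
        if entry["op"] == "COMMIT" or entry["txn"] not in committed:
            continue  # skip commit markers and uncommitted work
        if table_filter and entry["table"] not in table_filter:
            continue  # capture-side filtering
        trail.append(entry)


def pump(source_trail, target_trail):
    """Move queued operations from the source-side trail to the target-side trail."""
    while source_trail:
        target_trail.append(source_trail.popleft())


def deliver(trail, target):
    """Apply queued operations to the target tables in the order they were captured."""
    while trail:
        entry = trail.popleft()
        rows = target.setdefault(entry["table"], {})
        if entry["op"] == "DELETE":
            rows.pop(entry["key"], None)
        else:  # INSERT and UPDATE both set the current value
            rows[entry["key"]] = entry["value"]


def replicate(log, table_filter=None):
    """Run capture, pump and delivery end to end and return the target state."""
    source_trail, target_trail, target = deque(), deque(), {}
    capture(log, source_trail, table_filter)
    pump(source_trail, target_trail)
    deliver(target_trail, target)
    return target


if __name__ == "__main__":
    log = [
        {"txn": 1, "op": "INSERT", "table": "orders", "key": 10, "value": "new"},
        {"txn": 2, "op": "INSERT", "table": "orders", "key": 11, "value": "new"},
        {"txn": 1, "op": "UPDATE", "table": "orders", "key": 10, "value": "paid"},
        {"txn": 1, "op": "COMMIT"},
    ]
    print(replicate(log))  # transaction 2 never commits, so key 11 is not applied
```

Keeping each stage as a separate function with a queue between them mirrors the de-coupled design the slide names: capture can keep reading while delivery is slow or the network is down, because the trail holds the backlog.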

Govern Data Better with Data Quality

• Data Movement

– E-LT & ETL

– Data Transformation

– Change Data Capture

– Data Access

– Data Services

• Data Profiling

– Statistical Analysis

– Rule-based Validation

– Monitoring & Timeslice

– Fine-grained Auditing

• Data Cleansing

– Data Validation during ETL

– Data Standardization

– Address Matching & Dedup

– Error Hospital / Workflow

[Diagram labels: Data Movement, Data Quality and Profiling, Data Cleansing]

Data Integration

CONCLUSION

Modern Data Integration Approach

Heterogeneous, Real-time, Non-Invasive, High Performance E-LT

Traditional ETL + CDC

• Invasive Capture on OLTP systems using complex Adapters

• Transformations in ETL engine on expensive middle tier servers

• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time

• Continuous feeds from operational systems

• Non-invasive data capture

• Thin middle tier with transformations on the database platform (target)

• Mini-batches throughout the day or bulk processing nightly

[Diagram labels: Heterogeneous, Agent, Extract, Xform, Load, Bulk, Trickle, Staging, Lookup Data]

Questions

The preceding is intended to outline our general

product direction. It is intended for information

purposes only, and may not be incorporated into any

contract. It is not a commitment to deliver any

material, code, or functionality, and should not be

relied upon in making purchasing decisions.

The development, release, and timing of any

features or functionality described for Oracle’s

products remains at the sole discretion of Oracle.
