
Solution for Staging Area in Near Real-Time DWH –

Efficient in Refresh and Easy to Operate

Technical White Paper

Mathias Zarick, Karol Hajdu

Senior Consultants

March-2011

[email protected] . www.trivadis.com . Info-Tel. 0800 87 482 347 . 10.05.2011 . Page 2 / 19

While looking for a near real-time data warehouse (DWH) solution, efficiency and operational stability (reliability) are probably among your primary technical concerns. This applies both to the data transformation tasks and to the extraction and loading of the staging area. This article covers the challenges around the staging area. It presents a technical solution that addresses both concerns: the refresh process of the staging area is efficient and easy to operate. The solution is based on Oracle Data Guard and transportable tablespaces.

Contents

1. The Role of a Staging Area in a Data Warehouse
1.1 The Challenge called "short latency"
1.2 Different solutions having different advantages
2. Solution with Data Guard – the management perspective
2.1 Benefits for Data Warehouses
2.2 Which types of Data Warehouses will benefit most?
3. Solution with Oracle Data Guard – the technical insight
3.1 How It Works
3.2 The Key Advantages
3.3 Technical Prerequisites
4. Take a "Tour" of a Real-Life Example
4.1 Real-life example
4.2 Setup and Configuration
4.3 Operation
5. Solution extension: If data availability for operational reporting matters


1. The Role of a Staging Area in a Data Warehouse

In data warehouse architectures, there are some common good practices concerning the

staging area:

1. Create a staging area. After being extracted from source systems, the data is loaded

into the staging area. The staging area serves as the input for transformation

processes.

2. During the extraction and load into the staging area, only minimal data

transformations are done: the tables in the staging area have the same structure as

the corresponding tables in the source system. This makes the ETL architecture

much more transparent.

Based on the staging area’s content, the transformation and integration processes will produce:

- snapshots of data, serving as input for DWH’s versioning

- sets of change events (transactions) to be loaded into the DWH

1.1 The Challenge called “short latency”

In many enterprises, the Data Warehouse is the place where operative data originating from different systems comes together and is integrated with analytical or dispositive data. Step by step, many business users have discovered the value of integrated data. They use the data stored in the Data Warehouse to create reporting or analytical applications.

As the markets in many lines of business get more and more volatile, business users are no longer willing to wait several days or even hours for the latest figures. They require a shortened latency of the Data Warehouse: the need for the near real-time data warehouse was born.

Integration tasks consume both hardware resources and time. Hence, Data Warehouse architects face a new challenge: finding a trade-off between "get more speed (shorter data latency)" and "provide integrated and cleansed data". Some of them decided to introduce additional redundancy (by creating an Operational Data Store, which has short latency but less integration). Others decided to provide short latency only for very narrow and well-specified content: they speak of real-time data warehouse content rather than of a real-time data warehouse.

Regardless of which approach the Data Warehouse architect chooses, he or she needs a Staging Area with short latency. This is the subject covered by this white paper.

1.2 Different solutions having different advantages

There are several technical approaches to implementing data extraction and the loading of a staging area. The technical implementations differ basically in the following characteristics:

- data volumes that have to be transferred to refresh the staging area


- degree of completeness for changes to be captured

- performance impact on the source system (additional resource consumption)

- impact on data and service availability of source system

- total cost of ownership:

o licensing costs and development efforts

o operational complexity (efforts, reliability)

Table 1: Simple overview of refresh solutions for the DWH's staging area – full extraction, marker-based extraction, journal-based extraction, Oracle Streams and Oracle GoldenGate, rated by data volumes to transfer, degree of completeness, performance impact on the source, availability impact on the source, and operational complexity.

Depending on the database technology of the source system, some concepts can be excluded right from the start, because they have very specific prerequisites regarding supported technologies.

For more details about the refresh solutions for the Staging Area, please refer to the book "Data Warehousing mit Oracle – Business Intelligence in der Praxis" [3], Chapter 3.4.

If the source system is Oracle, there is yet another technical solution to extract the data from source systems and load it into the staging area. This approach uses Oracle Data Guard, flashback and transportable tablespaces.

This solution shares the advantages common to other replication techniques such as Oracle Streams or Oracle GoldenGate:

- small data volumes to be transferred

- low impact on source system’s performance & availability1

- all types of changes on both data & structures are captured and transferred

However, there is one important difference:

This new solution has significantly lower operational complexity than Oracle Streams or

Oracle GoldenGate!

Table 2: Rated by the same criteria as in Table 1, the new solution using Data Guard and transportable tablespaces has significantly lower operational complexity than Streams or GoldenGate.

This white paper explains the concepts and provides the most important implementation details, presented in the form of a real-life example.

1 This solution has even less impact on the source database than Oracle Streams or Oracle GoldenGate, because supplemental logging is not required here.


2. Solution with Data Guard – the management perspective

2.1 Benefits for Data Warehouses

Our experience shows that the solution described in this paper brings the following benefits for

Data Warehousing:

Benefit: Short latency of data stored in DWH or ODS (down to near real-time).
How this is achieved: The refresh process of the Staging Area and/or Operational Data Store (ODS) is very efficient. It consumes a small amount of HW resources and terminates in a short elapsed time.

Benefit: Shorter time-to-market for new ETL functionality.
How this is achieved: The solution enables the Staging Area to contain the full set of data (not only the changed records). This makes the ETL application more transparent, and introducing changes in ETL applications is less complex.

Benefit: Easy and stable operation.
How this is achieved: While refreshing the tables in the Staging Area, the operational complexity is delegated to standard and reliable Oracle products and features. These features are easy to operate.

2.2 Which types of Data Warehouses will benefit most?

Extraction and sourcing from dedicated online transaction applications that manage complex relationships between customers, suppliers, accounts or delivery components (applications like CRM or SCM2) can be very hard. The underlying database schemas of these applications are based on complex data models3. Companies using dedicated CRM or SCM applications often have to manage the life cycle of several million individual subjects (customers, suppliers, contracts, product components, stock keeping units, policies, etc.).

A Staging Area – or even an Operational Data Store (ODS) – benefits most from the solution described in this paper if:

- The source system holds a huge data volume with complex relationships but has a relatively small rate of data changes.

- There are reports with short-latency requirements: the Staging Area needs to capture the changes made in the online applications with a very short latency.

2 Supply Chain Management. 3 That is, a lot of relationships between the tables.


3. Solution with Oracle Data Guard – the technical insight

3.1 How It Works

The solution presented in this article is based on Oracle Data Guard. Data Guard maintains standby databases, which are copies of primary databases. Such a standby database can also be used for refreshing the staging area in a data warehouse.

The main idea is based on Data Guard's ability to open a physical standby database temporarily read-write and to rewind it back to the point in time when it was opened. This is achieved using Oracle's guaranteed restore points and flashback technology.

How can this be used for refreshing a staging area?

Let's explain it using Figure 1. On the data warehouse machine (host DWH), a physical standby of the database OLTP is configured. The primary database of OLTP runs on host OLTP.

This setup leads to the following situation: through the Data Guard redo transport and apply, any change made on the primary database is performed on the standby database as well.

Figure 1: On the DWH machine (standby site, database DWH with Staging Area and Core DWH), a physical standby database of OLTP (OLTP_SITE2) is configured with Data Guard. The primary database OLTP (OLTP_SITE1) on host OLTP ships redo – from the online redo logs via the standby redo logs to the archived redo logs – for the tablespace CRM with datafile crm01OLTP.dbf.


Reading from the Staging Area:

As soon as an ETL process inside the DWH database needs to read the content of the staging area, the following actions are taken:

- Recovery process on the standby is paused.

- Physical standby database is converted to a snapshot standby database. This opens

the standby database read write.

- Using the transportable tablespaces feature, the tablespace CRM of the snapshot

standby database is plugged into the database DWH:

o The tablespace CRM in the snapshot standby database is set to read only

mode.

o The metadata (definition of tables, indexes, etc.) of this tablespace is

transferred with data pump from the snapshot standby database to the

DWH database4.

- Datafile crm01OLTP.dbf is now part of both databases (snapshot standby database

OLTP and database DWH). In both databases the tablespace is in read only mode.

- The ETL process can read the data out of the staging area.
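Assuming the names used later in this paper (standby configuration OLTP_SITE2, database link OLTP_SNAP, tablespace CRM), the read sequence above can be condensed into the following sketch; Chapter 4 shows the commands in full detail:

```sql
-- 1. Convert the physical standby to a snapshot standby (in DGMGRL):
--      convert database 'OLTP_SITE2' to snapshot standby;
-- 2. On the snapshot standby, freeze the staging tablespace:
ALTER TABLESPACE crm READ ONLY;
-- 3. Plug the tablespace metadata into the DWH database with Data Pump
--    over a database link (shell prompt, one line):
--      impdp system@DWH network_link=OLTP_SNAP transport_tablespaces=CRM
--            transport_datafiles=<path to crm01OLTP.dbf>
-- 4. The ETL processes can now read the CRM tables in the staging area.
```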

Figure 2: On the DWH machine, datafile crm01OLTP.dbf is part of both databases; the tablespace CRM is accessed read-only both in the snapshot standby database OLTP and in the database DWH, while redo transport to the standby site continues.

4 For convenient handling of this transfer with Data Pump, a database link can be used.


Refreshing the Content in the Staging Area:

As soon as there is a need to read more current content – that is, to refresh the CRM part of the staging area – the following actions are taken:

- The plugged-in tablespace CRM is dropped from the DWH database.

- The snapshot standby database is converted back to a physical standby database.

- This resumes the recovery process of all its datafiles, including those of the

tablespace CRM.
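The refresh steps above map onto just two statements, sketched here with the same hypothetical names (OLTP_SITE2, CRM):

```sql
-- On the DWH database, unplug the staging tablespace:
DROP TABLESPACE crm INCLUDING CONTENTS;
-- In DGMGRL, rewind the snapshot standby and resume redo apply:
--   convert database 'OLTP_SITE2' to physical standby;
-- Recovery of all datafiles, including those of tablespace CRM, resumes.
```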

Figure 3: The tablespace CRM is dropped from the DWH database; the standby database is converted back to a physical standby and recovery of its datafiles, including crm01OLTP.dbf, resumes.


3.2 The Key Advantages

This solution has the following key advantages:

- Staging area contains the full set of data.

- No additional workload on the host OLTP.

- Datafiles with full set of data are neither transferred nor copied.

o Volume of data transferred between the OLTP and DWH is determined merely

by the volume of data changes (size of archived redo logs).

- The elapsed time of the refresh process of the staging area – which is essentially the refresh of the standby database – does not include the time needed to copy the archived redo logs from host OLTP to host DWH:

o The standby site is able to receive logs from the primary database both in physical standby mode and in snapshot standby mode. In snapshot standby mode, the logs are queued and not applied.

o Since log transport to the standby site runs all the time, when the recovery process resumes, the outstanding archived redo log files are already registered and available for the recovery5 of the physical standby database.

- The elapsed time of the refresh process of the staging area does not depend on the tablespace size but only on the volume of data changes since the last refresh.

- Once configured, both the operation of physical standby databases and the operation

of transportable tablespaces are easy to handle and maintain.

- Neither remote queries nor distributed joins are used.

- On the DWH database the access methods to the data residing in the transported

tablespace(s) can be adjusted as follows:

o estimation of additional statistics like histograms

o manipulation of statistics

o creation of additional data structures like indexes or materialized views

Considering the overhead produced on the source system and the workload produced on the DWH machine, the solution presented in this article is exceptionally efficient:

- only the redo logs, and no additional structures, are used

- it works at the level of changes on data blocks, not at the level of SQL statements

5 Transported redo logs are applied in physical standby mode only.


If the refresh of the staging area is the only purpose of the standby database on the DWH machine, the elapsed time of the refresh process can be minimized by narrowing the scope of the recovery process on the standby database to only those tablespaces of the OLTP database that need to be read by the ETL application.

Usually, the ETL processes in a DWH require different index types than an OLTP application. If the indexes of an OLTP schema reside in a separate tablespace, excluding them can speed up the recovery process.

Irrelevant tablespaces can easily be excluded by offlining and deleting their datafiles on the standby database6.
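As a sketch, offlining such a datafile on the standby could look as follows; the file number 7 is purely illustrative and the actual numbers have to be looked up first:

```sql
-- On the mounted standby database, with redo apply stopped:
SELECT file#, name FROM v$datafile;      -- identify the datafiles of the index tablespace
ALTER DATABASE DATAFILE 7 OFFLINE DROP;  -- take it out of the recovery scope
-- Afterwards the file can be deleted at OS level; redo apply skips offline datafiles.
```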

The standby database on the DWH machine can be configured to serve two purposes at the same time: refreshing the staging area and disaster protection of the OLTP database. When considering this approach, be aware of the following impacts:

- A standby database with offline datafiles cannot be used for disaster protection.

- If MaxAvailability or MaxProtection is used, the availability or the workload of the DWH machine can impact the availability or the performance of the OLTP database.

3.3 Technical Prerequisites

Some technical prerequisites have to be fulfilled in order to use the described solution. They can be grouped into the following categories:

- Identical database character set

- Self-contained tablespace sets

- Required Oracle database releases

- Required Oracle licenses

3.3.1 Identical Database Character Set

In order to use transportable tablespaces, the database OLTP and the database DWH must have an identical database character set and an identical national character set.
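Both settings can be compared with a simple query against the data dictionary on each of the two databases:

```sql
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');
```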

3.3.2 Self-contained Tablespace Sets

In order to transport a set of tablespaces, it needs to be self-contained. This means you cannot transport a set of tablespaces containing objects that depend on objects outside the set (such as materialized views or table partitions) unless you transfer all those dependent objects together in one set7.
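Self-containment can be verified before the transport with the DBMS_TTS package; the second argument TRUE includes referential integrity constraints in the check:

```sql
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK('CRM', TRUE);
-- The following view must be empty, otherwise the set is not self-contained:
SELECT * FROM transport_set_violations;
```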

6 Tablespaces that are needed for opening the database, like SYSTEM, SYSAUX and UNDO, cannot be excluded. 7 Segmentless objects like sequences, views and PL/SQL packages are not transferred with transportable tablespaces. Normally you don't need to transfer them into the Staging Area anyway.


3.3.3 Required Oracle Database Release

The OLTP database needs to be operated with Oracle Database release 10g or higher. Oracle 11g is recommended, as the snapshot standby database feature is available as of this release.

With Oracle 10g it is necessary to emulate this functionality manually by creating a guaranteed restore point on the standby database before opening it read-write. The following limitations also have to be considered when running Oracle 10g:

- There is no out-of-the-box handling with Data Guard for this functionality. You will need to develop a piece of code – but this is quite straightforward.

- The redo transport between primary and standby is stopped during the period when the standby is open read-write8.

In order to use transportable tablespaces in this context, the DWH database needs to be at the same or a higher release than the OLTP database.
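For 10g, the manual emulation could be sketched roughly as follows; this assumes that flashback database is enabled on the standby, and it is meant as an outline of the generally known recipe, not as a tested script:

```sql
-- On the 10g physical standby, before opening it read-write:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
CREATE RESTORE POINT before_open GUARANTEE FLASHBACK DATABASE;
ALTER DATABASE ACTIVATE STANDBY DATABASE;
ALTER DATABASE OPEN;
-- ... read-write usage (set tablespace read only, plug into DWH, ...) ...
-- To return to the standby role: mount the database again, then
FLASHBACK DATABASE TO RESTORE POINT before_open;
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
-- and restart managed recovery.
```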

3.3.4 Required Oracle Licenses

This solution requires an Oracle Enterprise Edition license both for the OLTP host and for the DWH host. All required features – Data Guard, transportable tablespaces and the snapshot standby database – are included in the Enterprise Edition license.

No additional option is required for this solution: neither the Active Data Guard9 option nor the Partitioning option.

8 As mentioned before, with a snapshot standby database as of 11g the log transport stays active all the time. 9 Active Data Guard is an extra licensable option as of 11g which includes real-time query and fast incremental backup. Neither of these features is required by the described solution.


4. Take a "Tour" of a Real-Life Example

To demonstrate our approach on a representative sample, we use an excerpt from the database schema of the CRM application Siebel. We chose Siebel to improve the readability of this example: Siebel is a widely used CRM application owned by Oracle, so there is a good chance that ETL developers are familiar with its data model.

It is important to understand that the described solution works with any other system or application as well, including non-standard, in-house developed software10.

We took the Siebel tables S_CONTACT, S_ORG_EXT and S_ASSET as representatives of a set of approximately 15 Siebel tables with complex relationships and high cardinality.

4.1 Real-life example

Consider the following common Data Warehouse situation:

Transformation processes have to read the content of the Siebel tables and transform it into a new entity, let's call it "Customer Subscription" (refer to Figure 3).

Figure 3: The transformation process reads the Siebel tables and transforms the data into the new entity "Customer Subscription"

The Data Warehouse has to store not only the latest status of the "Customer Subscriptions" but also all historical values. The ETL has to compare the new snapshot of "Customer Subscriptions" with the latest one and – in case of changes – create new versions which keep track of the fragmented history. This concept is known as versioning – refer to Figure 4.

10 As long as the data is stored in an Oracle RDBMS.


Figure 4: In the database DWH, the step "Derive Customer Subscription" reads the Siebel tables S_CONTACT (0.5 mio rows), S_ORG_EXT (0.5 mio rows) and S_ASSET (2.5 mio rows) from the tablespace CRM in the Staging Area and computes the delta (new, updated and deleted rows) against the latest snapshot in C_CUST_SUBSCRIPTION. The step "Historize Subscription" then compares the delta with the highest version from the history and – in case of changes – creates new versions and closes old ones, keeping track of the fragmented history.

Consider the following design decisions of a Data Warehouse architect:

- Due to the many inner joins and filters inside the query, the Staging Area needs to hold the full set of data.

- Transferring millions of rows from the source system to the Staging Area every night is not an option.

- In the source system, no reliable row markers or journals exist or can be introduced.

- The architect therefore decided to use the solution described in this white paper.

Because of the high cardinality of the data set (several millions of rows), good scalability of the underlying database11 is assumed.

In the next sections we will present the most important steps to build and operate this solution.

4.2 Setup and Configuration

Oracle Data Guard was set up as described in Chapter 3. On both the OLTP database and the DWH database we used Oracle Database 11.2.0.2.0. We created a Data Guard Broker configuration, left the protection mode at Maximum Performance (the default) and set the log transport to asynchronous.

11 including the physical data model of CORE DWH


To enforce logging on database OLTP, we issued the following statement:

ALTER DATABASE FORCE LOGGING;

This causes every12 attempt at an unrecoverable NOLOGGING operation to be logged anyway.

4.2.1 Create Role with Common Name in Both Databases

In both the database DWH and the database OLTP, the role dwh_sa_crm_role was created:

CREATE ROLE dwh_sa_crm_role;

In the OLTP database, grant the SELECT privilege on the Siebel tables to this role:

GRANT SELECT ON s_contact TO dwh_sa_crm_role;
GRANT SELECT ON s_org_ext TO dwh_sa_crm_role;
GRANT SELECT ON s_asset TO dwh_sa_crm_role;

You will also need to create the owner of the transported tables in the DWH database:

CREATE USER crm IDENTIFIED BY thisIsASecretPassword;

Neither a CREATE SESSION nor a CREATE TABLE privilege is necessary for this user.

4.3 Operation

Let's take a look at the operation of this solution.

From the point of view of the CRM data in the staging area, there are two main operational states:

- A snapshot of the latest CRM data is available in the staging area (Status A)

- A refresh of the CRM data in the staging area is in progress (Status B)

Figure 5: Two main operational states for CRM data in the Staging Area – while OLTP CRM users change the operative data around the clock (24/7), the CRM data on the DWH side is either available for read (Status A) or being refreshed (Status B).

Most of the time, the CRM data in the staging area is available for read (Status A). From time to time you will need to refresh the data in the staging area; during this period, the data is not available (Status B).

12 As with any other change of database instance parameters: an impact analysis is required before making this change.


Transitions between these two operational states are usually triggered by one of the following two events:

- ETL processes need more current CRM data (A to B)

o This event triggers the start of the refresh process.

o The goal of the refresh process is to bring the snapshot to a given (defined) point in time.

- ETL processes need to read the CRM data again (B to A)

o As soon as the CRM data in the physical standby is current enough, this event triggers the immediate termination of the refresh process and the transition to Status A.

In the next paragraphs we will describe:

- the actions related to the termination of the refresh process and

- the actions related to the start of the refresh process

4.3.1 Termination of Refresh Process

As long as the refresh process is in progress, the CRM data in the staging area is not available: the datafiles of the tablespace CRM13 on host DWH are exclusively "assigned" to the physical standby database of OLTP, called OLTP_SITE2, for recovery.

In order to terminate the refresh process, the following sequence of actions is taken.

First, the physical standby is converted to a snapshot standby database:

DGMGRL> connect sys@OLTP_SITE2

Password:

Connected.

DGMGRL> convert database 'OLTP_SITE2' to snapshot standby

Converting database "OLTP_SITE2" to a Snapshot Standby database, please wait...

Database "OLTP_SITE2" converted successfully

Second, the tablespace is set to read-only and plugged into the DWH database with Data Pump via a database link from DWH to the snapshot standby database:

SQL> alter tablespace crm read only;

# impdp system@DWH logfile=imp_crm.log network_link=OLTP_SNAP transport_tablespaces=CRM transport_datafiles=d:\oradata\oltp\crm01oltp.dbf14

13 Of course this concept can be extended to transfer multiple tablespaces. 14 If running 11.2.0.2, due to Oracle Bug 10185688 it is required that either XDB is loaded into the source database or the related patch is applied.


In order to transport the metadata you may also use other alternatives:

- export the metadata with Data Pump to a dump file and import from that dump file instead of using a database link

- export/import with the classic exp/imp utilities15

- initiate Data Pump directly from PL/SQL, see [2] for details

When transferring the metadata you can also decide whether to include or exclude certain tables, and whether to import indexes, object privileges, table triggers and table constraints.

As the last step, a deterministic function is created in the DWH database. The return value of this function reflects the timestamp of the CRM data. We used the following PL/SQL code:

declare
  sql_text varchar2(1000);
  v_timestamp varchar2(20);
begin
  select to_char(timestamp,'DD.MM.YYYY HH24:MI:SS') into v_timestamp
    from (select timestamp from gv$recovery_progress@OLTP
           where item = 'Last Applied Redo'
           order by start_time desc)
   where rownum < 2;
  dbms_output.put_line('timestamp is ' || v_timestamp);
  sql_text := 'create or replace function crm.SA_CRM_SNAPSHOT_TIMESTAMP return date deterministic is
    ts date;
  begin
    select to_date ('''|| v_timestamp ||''', ''DD.MM.YYYY HH24:MI:SS'') into ts from dual;
    return ts;
  end;';
  execute immediate sql_text;
  execute immediate 'GRANT EXECUTE ON crm.SA_CRM_SNAPSHOT_TIMESTAMP to DWH_SA_CRM_ROLE';
end;
/

Listing 1: In the DWH database, this creates a function which returns the timestamp of the data in the CRM tablespace

This function is used by the ETL processes during the "versioning" operation (Figure 4): it builds the value for the VALID_FROM attribute of new versions and for the VALID_TO attribute of versions to be closed.

15 Deprecated with 11g, but it worked in our case.
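As a hypothetical illustration of the versioning step from Figure 4 (table C_CUST_SUBSCRIPTION with VALID_FROM/VALID_TO as above; the DELTA set and all column names are illustrative assumptions, not part of the original example):

```sql
-- Close the currently open versions of changed or deleted subscriptions ...
UPDATE c_cust_subscription v
   SET v.valid_to = crm.sa_crm_snapshot_timestamp()
 WHERE v.valid_to IS NULL
   AND v.cust_subscription_id IN (SELECT d.cust_subscription_id FROM delta d);

-- ... and open new versions stamped with the same snapshot timestamp:
INSERT INTO c_cust_subscription (cust_subscription_id, valid_from, valid_to /*, ... */)
SELECT d.cust_subscription_id, crm.sa_crm_snapshot_timestamp(), NULL /*, ... */
  FROM delta d
 WHERE d.change_type IN ('NEW', 'UPDATED');
```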


4.3.2 Start of Refresh Process

In order to start the refresh process, the following sequence of actions has to be taken.

First, the tablespace CRM has to be dropped from the DWH database. After this, any queries on the data will fail, as it is no longer available; dependent views, synonyms, stored PL/SQL procedures etc. become invalid.16

SQL> drop tablespace crm including contents;

Then the snapshot standby is converted back to a physical standby:

DGMGRL> convert database 'OLTP_SITE2' to physical standby

If you need to check whether the physical standby database with the CRM tablespace in OLTP_SITE2 is current enough to be used for the next integration load cycle, you can easily query the Data Guard Broker:

DGMGRL> show database 'OLTP_SITE2';

Database - OLTP_SITE2

Role: PHYSICAL STANDBY

Intended State: APPLY-ON

Transport Lag: 0 seconds

Apply Lag: 11 minutes 23 seconds

Real Time Query: OFF

Instance(s):

oltp

Database Status:

SUCCESS
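As an alternative to the Broker, the lag can also be queried with SQL directly on the standby instance:

```sql
SELECT name, value
  FROM v$dataguard_stats
 WHERE name IN ('transport lag', 'apply lag');
```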

As the refresh is a parallel media recovery, the process is very efficient: media recovery works block-change oriented and is much faster and less resource-consuming than the mechanisms of GoldenGate and Streams, where SQL is extracted and processed row by row.

The presented real-life example clearly demonstrates the high efficiency and the easy, stable operation of this solution.

5. Solution extension: If data availability for operational reporting matters

There is yet another challenge for today's DWH architects: where to place the Operational Reporting?

- The OLTP database is becoming a less and less suitable place, due to the heavy workload caused by the complex query logic inside the Operational Reports.

- Many Operational Reports query not only the data residing in the OLTP system, but also additional analytical attributes which are typically stored in a Core DWH.

With the solution presented in this white paper, the DWH architect can also consider using the data residing in the Staging Area for Operational Reporting17. As this data resides in the data warehouse database, it can be joined with the analytical attributes in the Core DWH without performance impact (no distributed queries).

16 They become valid again automatically when they are used after the tablespace reappears in the next cycle.

However, one point has to be taken into consideration: during the refresh, the data in the Staging Area is not available (refer to Figure 5). This unavailability needs to be eliminated. The snapshot functionality of the operating system and/or the storage system can be used to overcome this.

The concept: after the standby database is converted to a snapshot standby and the tablespaces are set to read-only, snapshots of the datafiles are created. These snapshots – instead of the standby database's datafiles – are then plugged into the DWH database.

This results in two advantages:

- The data in the Staging Area is available almost all18 of the time.

- The recovery of the tablespaces can continue, because the standby database can be converted from snapshot standby back to physical standby right after the snapshots of the datafiles are taken. This achieves an even shorter latency for the refresh cycles.

Note: the snapshots do not copy the data; the data is presented a second time, and later changes are tracked for both sets of data, origin and snapshot. This is known as the copy-on-write (COW) mechanism.

Examples of OS-side snapshotting: ZFS on Solaris offers copy-on-write snapshots. Snapshotting is also possible with the Veritas file system, with LVM snapshots on Linux, and with Microsoft Volume Shadow Copy on Windows. SAN and NAS systems also offer snapshotting features that work with the COW mechanism.
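With ZFS, for example, taking such a snapshot and exposing it could look like this; the pool and filesystem names are purely hypothetical:

```shell
# Snapshot the filesystem that holds the read-only CRM datafile
zfs snapshot dwhpool/oradata@crm_refresh
# Expose a writable clone; its copy of the datafile is then plugged into the DWH database
zfs clone dwhpool/oradata@crm_refresh dwhpool/oradata_stage
```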

By using the know-how of Trivadis, we believe it is possible to reduce the operating costs and the complexity of your data warehouse: proper design is what matters!

Contact

Karol Hajdu [email protected]

Mathias Zarick [email protected]

Trivadis Delphi GmbH

Millennium Tower

Handelskai 94-96

A-1200 Vienna

Tel.: +43 1 332 35 31 00

www.trivadis.com

Please contact us if you need more information or help with your setup.

17 At least for that part of reporting where the integrity level of the data in the Staging Area is sufficient. 18 A short downtime will still occur during the tablespace drop and re-plug-in.


Literature and Links

[1] Oracle Data Guard Concepts and Administration, http://download.oracle.com/docs/cd/E11882_01/server.112/e17022/toc.htm

[2] Oracle Database PL/SQL Packages and Types Reference, Chapter 46, http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_datpmp.htm

[3] Jordan et al.: Data Warehousing mit Oracle – Business Intelligence in der Praxis. Chapter 3.4. Hanser, 2011.