Modeling and Querying Multidimensional Data Sources in Siebel Analytics

Kazi A. Zaman, Donovan A. Schneider

Post on 19-Dec-2015



© 2005 Siebel Systems, Inc. Confidential. 2

Structure of Talk

Challenges of federating relational and multidimensional data sources

Overview of Multidimensional data sources

Overview of Siebel Analytics Architecture

Our approach to solving the problem

Issues with multi-vendor support

Conclusions and Future Work


Why federating multidimensional sources is important

Enterprises have a multitude of data sources

Not always consolidated in a single data warehouse

Cubes (OLAP systems) are best suited for certain applications: e.g. budgeting

Many important business questions require information from both relational and multidimensional systems:

Budgets vs. actuals

Real-time reporting: HR system data integrated with sales pipeline data


Multidimensional Data Sources

Highly aggregated view of data, primarily used for analysis

Provides a dimensional view of data

Prominent examples: Microsoft Analysis Services, Hyperion, SAP/BW

Cubes: Storage mechanism not necessarily MOLAP

Query Language: Vendor specific interfaces, MDX

Access Mechanisms: Vendor specific Interfaces (e.g. BAPI), ODBO, XMLA


Key Differences from Relational Systems

Rich metadata exposed: Dimensions, hierarchies, levels, measures

Specialized language constructs for manipulating this metadata: Ancestors(), Descendants()

Query results are multidimensional datasets, not rowsets

Ability to specify complex multi-pass calculations

Special functionality for time series calculations


Siebel Analytics Server

Analytics Server is a federated system

Supports rich data sources: relational (DB2, Oracle, SQL Server, Teradata), OLAP (Analysis Services, SAP/BW), XML

Supports rich schemas (OLTP, DW)

Executes queries specified against a logical business model containing data warehousing constructs

Analytics Server translates logical queries into queries against one or more back-end data sources

Design goal: push as much processing as possible to the back-end data sources

Carries out post processing on joined query results

Does not have its own storage layer


Query Processing Overview

[Diagram: query processing components, the metadata repository, and callouts describing what each stage produces]

Components: Navigation, Optimizer/Compiler (Rewrite Rules), Code Generator

Repository (metadata) layers:

Presentation Layer

Business Model & Mapping Layer: Dimensions, Hierarchies, Measures, Alternative Sources, Partitioning, Aggregation Rules, Time Series

Physical Layer: Security, Connections, DB Features, Schema

Callouts:

Generated access plan and initial SQL

Optimized SQL based on target databases and DB Features tables; also performs optimization to improve efficiency

Generate physical SQL for external sources and an internal plan for operations that must be executed in the server, including parallelization, sorting, etc.


Requirements for federating multidimensional sources

Model multidimensional data sources in physical layer of metadata

Mark fragments of a federated query plan for execution at a multidimensional source based on source capabilities

Generate MDX from the relational query plan fragment (SQL to MDX translation)

Ability to convert multidimensional result set into two dimensional rowset


Challenges

SQL has a relational model, MDX multidimensional

We convert the multidimensional model to relational

Lose full power of multidimensional model

SQL : open world : Country = “USA”

MDX closed world : Geography.[USA]

If no such member, query will fail.


Metadata Modeling: Cubetables

Cube with 2 hierarchies and 2 measures

Time: Year -> Quarter -> Month

Geography: State -> City

Measures: profit, sales

Cube Table T

(Year, Quarter, Month, State, City, Profit, Sales)

Hierarchy, level, and aggregation rule information is preserved
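The cube table idea above can be sketched in a few lines of Python. This is only an illustration, not the Siebel Analytics repository format: the class names `Hierarchy`, `Measure`, and `CubeTable` are hypothetical, and the `SUM` aggregation rule for Profit and Sales is an assumption, since the slide does not state the rule for this cube.

```python
from dataclasses import dataclass


@dataclass
class Hierarchy:
    name: str
    levels: list  # level names, ordered from coarsest to finest


@dataclass
class Measure:
    name: str
    agg_rule: str  # aggregation rule kept alongside the flat column


@dataclass
class CubeTable:
    name: str
    hierarchies: list
    measures: list

    def columns(self):
        """Flat, table-like column list: every level, then every measure."""
        cols = [lvl for h in self.hierarchies for lvl in h.levels]
        cols += [m.name for m in self.measures]
        return cols

    def level_info(self, column):
        """Return (hierarchy name, depth) for a level column, else None."""
        for h in self.hierarchies:
            if column in h.levels:
                return h.name, h.levels.index(column)
        return None


# The cube from the slide; the SUM rules are assumed for illustration.
T = CubeTable(
    name="T",
    hierarchies=[
        Hierarchy("Time", ["Year", "Quarter", "Month"]),
        Hierarchy("Geography", ["State", "City"]),
    ],
    measures=[Measure("Profit", "SUM"), Measure("Sales", "SUM")],
)
```

The flat view (`T.columns()`) is what a relational query sees, while `level_info` keeps the hierarchy and level information that an MDX generator would need later.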


Metadata screenshot


Rowset Creation from Multidimensional Result Sets

MDX result sets consist of dimensional members on axes and measures in delimited cells.

SELECT
{[Measures].[Sales]} on COLUMNS,
{Crossjoin({[Year].Members},
{[Products].[Soda].Members})} on ROWS
FROM [Sales]

Generate only 2 dimensional queries

Measures on COLUMNS, dimensions on ROWS

            Sales
1997  Coke  100
1998  Coke  200
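Because only two-axis queries are generated, turning the result into a rowset amounts to appending each row's cell values to its member tuple. The sketch below assumes the result has already been parsed into two parallel lists; `flatten_result` and its input shape are hypothetical and do not correspond to any XMLA or ODBO API.

```python
def flatten_result(row_tuples, cells):
    """Build a rowset from a two-axis MDX result.

    row_tuples: one tuple of member captions per position on the ROWS axis.
    cells:      one list of measure values per ROWS position, in COLUMNS order.
    """
    if len(row_tuples) != len(cells):
        raise ValueError("each ROWS position needs exactly one list of cells")
    return [tuple(members) + tuple(values)
            for members, values in zip(row_tuples, cells)]


# The result shown on the slide: Year and Product on rows, Sales on columns.
rows = flatten_result(
    [("1997", "Coke"), ("1998", "Coke")],
    [[100], [200]],
)
```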


Transforming the Intermediate Rowset

Intermediate rowset may need further transformation

Number of columns in rowset may differ from number of requested columns

Ordering of columns in rowset may differ from requested order

Protocols for intermediate rowset transformation

A simple example protocol maps intermediate column indexes to columns in the final rowset

(1, 2, 3): select year, product, sum(sales) from T group by year, product

(3, 2, 1): select sum(sales), product, year from T group by year, product

Different protocols for different data sources / MDX generation algorithms
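The simple index-mapping protocol can be sketched as follows. The function name `apply_protocol` is hypothetical; the 1-based indexes follow the (1, 2, 3) and (3, 2, 1) examples above, and the intermediate rowset is assumed to arrive as (year, product, sales).

```python
def apply_protocol(protocol, intermediate_rows):
    """Reorder (or drop) intermediate columns to produce the final rowset.

    protocol: 1-based indexes into each intermediate row, one entry per
              column of the final rowset, in final column order.
    """
    return [tuple(row[i - 1] for i in protocol) for row in intermediate_rows]


# Intermediate rowset as produced from the MDX result: (year, product, sales).
intermediate = [("1997", "Coke", 100), ("1998", "Coke", 200)]

same_order = apply_protocol((1, 2, 3), intermediate)  # year, product, sum(sales)
reversed_order = apply_protocol((3, 2, 1), intermediate)  # sum(sales), product, year
```

A protocol with fewer entries than intermediate columns also covers the case where the rowset carries more columns than were requested.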


MDX Code Generation

Effectively SQL to MDX translation along with rowset creation protocol data

Makes use of cubetable specific metadata – hierarchies and levels

Different code generation strategies for different SQL templates

Support as wide a set of SQL templates as possible

Generate efficient MDX, because multidimensional data sources lack mature optimizers


MDX Generation Examples

SELECT c1, c2…, aggr(m1), aggr(m2)

FROM Table

WHERE <conditions>

GROUP BY c1, c2….

HAVING <conditions>

Goal: translate the entire SQL template to efficient MDX

Metadata information: cube table T (Store Country, Store State, Year, Quarter, Unit Sales)

Aggregation Rule: SUM


Multiple dimensions plus measure with matching aggregate rule

SQL

Select "Store Country", Year, SUM("Unit Sales")
From T
Group By "Store Country", Year

MDX

Select

{[Unit Sales]} on columns,

{ nonemptycrossjoin([Store Country].members, [Year].members)} on rows

From [Sales]
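For this simplest template (GROUP BY columns plus a measure whose aggregation rule matches the cube's), the translation reduces to string assembly. The sketch below is a hypothetical illustration, not the product's code generator: `generate_mdx` and its parameters are assumed names, and it handles only this one template.

```python
def generate_mdx(levels, measure, cube):
    """Emit MDX for: SELECT levels..., AGG(measure) FROM T GROUP BY levels...

    Assumes the SQL aggregate matches the cube's aggregation rule, so the
    stored measure can be requested directly.
    """
    if not levels:
        raise ValueError("at least one GROUP BY level is required")
    sets = [f"[{level}].members" for level in levels]
    # A single level needs no crossjoin; several levels are crossjoined,
    # dropping empty combinations as in the slide.
    rows = sets[0] if len(sets) == 1 else f"nonemptycrossjoin({', '.join(sets)})"
    return (
        "select\n"
        f"{{[{measure}]}} on columns,\n"
        f"{{{rows}}} on rows\n"
        f"from [{cube}]"
    )


mdx = generate_mdx(["Store Country", "Year"], "Unit Sales", "Sales")
```

Printing `mdx` gives the same query as above, apart from letter case and spacing.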


Measure with non-matching aggregate rule

Select "Store Country", Year, AVG("Unit Sales")
From T
Group By "Store Country", Year

with

set [A] as '{[Store Country].members}'

set [B] as '{[Year].members}'

set [C] as 'nonemptycrossjoin({[A]},{[B]})'

member [measures].[MS1] as 'AVG(nonemptycrossjoin(Descendants(Store.currentmember,[Store State]), Descendants(Time.currentmember,[Quarter]) ),[Unit Sales])'

select

{[MS1]} on columns,

{[C]} on rows

from [Sales]


Matching aggregate rule, predicate refers to GROUP BY columns

Select "Store Country", Year, SUM("Unit Sales")
From T
Where "Store Country" In ('USA', 'India') AND Year = '1997'
Group By "Store Country", Year

with

set [A] as '{filter([Store Country].members, Store.currentmember.name = "USA" OR Store.currentmember.name = "India")}'

set [B] as '{filter([Year].members, time.currentmember.name = "1997") }'

set [C] as 'nonemptycrossjoin({[A]},{[B]})'

select

{[Unit Sales]} on columns,

{[C]} on rows

from [Sales]
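The named sets [A] and [B] above follow one pattern: an equality or IN predicate on a GROUP BY column becomes a filter over that level's members. A minimal sketch, assuming a hypothetical helper `filter_set` and assuming the values contain no double quotes (no escaping is attempted):

```python
def filter_set(level, dimension, values):
    """Translate `level IN (values)` into an MDX set expression.

    level:     level whose members are listed, e.g. "Store Country"
    dimension: dimension used for currentmember, e.g. "Store"
    values:    captions from the SQL predicate; empty means no predicate
    """
    member_set = f"[{level}].members"
    if not values:
        return f"{{{member_set}}}"
    tests = " OR ".join(
        f'{dimension}.currentmember.name = "{value}"' for value in values
    )
    return f"{{filter({member_set}, {tests})}}"


set_a = filter_set("Store Country", "Store", ["USA", "India"])
set_b = filter_set("Year", "time", ["1997"])
```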


Multiple levels of a dimension plus measure with matching aggregate rule, predicates refer to both levels

Select "Store Country", "Store State", SUM("Unit Sales")
From T
Where "Store Country" = 'USA' AND "Store State" In ('CA', 'OR')
Group By "Store Country", "Store State"

with

member [measures].[CountryAnc] as 'ancestor(Store.Currentmember,[Store Country]).name'

set [A] as 'filter({[Store Country].members}, Store.currentmember.name = "USA")'

set [B] as 'Filter(Generate({[A]}, Descendants([Store].currentmember, [Store].[Store State])), [Store].currentmember.name = "CA" OR [Store].currentmember.name = "OR")'


select

{[Measures].[CountryAnc], [Measures].[Unit Sales]} on columns,

{[B]} on rows

From

[Sales]


Multiple levels of a dimension plus measure with matching aggregate rule, predicate refers to columns not in project list

Select "Store Country", "Store State", SUM("Unit Sales")
From T
Where Year = '1997'
Group By "Store Country", "Store State"


Slicer used:

with

member [measures].[CountryAnc] as 'ancestor(Store.Currentmember,[Store Country]).name'

set [A] as '{[Store State].members}'

select

{[Measures].[CountryAnc],[Unit Sales]} on columns,

{ [A]} on rows

From

[Sales]

Where ([1997])

Alternative without a slicer (predicate applied inside a calculated measure):

with

member [measures].[CountryAnc] as 'ancestor(Store.Currentmember,[Store Country]).name'

member [Measures].[YearAnc] as 'ancestor([Time].Currentmember,[Time].[Year]).name'

set [A] as '{[Store State].members}'

set [B] as '{[Time].[Year].members} '

member [measures].[MS1] as 'SUM(filter(nonemptycrossjoin(Descendants(Store.currentmember,[Store State]), {[B]} ), [Time].currentmember.name="1997"),[Unit Sales])'

select

{[Measures].[CountryAnc],

[Measures].[MS1]} on columns,

{[A]} on rows

From

[Sales]


Dimension plus measure with matching aggregate rule with HAVING clause

Select "Store Country", SUM("Unit Sales")
From T
Group By "Store Country"
Having SUM("Unit Sales") > 10000

select

{[Unit Sales]} on columns,

Filter({ [Store Country].members}, 10000 < [Unit Sales])

on rows

from

[Sales]
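The HAVING translation can be sketched the same way: a comparison on an aggregated measure becomes a Filter over the row set. `having_filter` is a hypothetical helper; it writes the measure on the left of the comparison, which is equivalent to the `10000 < [Unit Sales]` form used above, and the set of accepted operators is an assumption.

```python
def having_filter(level, measure, op, threshold):
    """Translate `HAVING AGG(measure) <op> threshold` into an MDX row set."""
    allowed = {">", "<", ">=", "<=", "=", "<>"}
    if op not in allowed:
        raise ValueError(f"unsupported comparison operator: {op}")
    return f"Filter({{[{level}].members}}, [{measure}] {op} {threshold})"


rows_axis = having_filter("Store Country", "Unit Sales", ">", 10000)
```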


Multiple Vendor Support

MDX and XMLA support varies widely from vendor to vendor

Caption names vs Unique Names

Classes of hierarchies supported

Treatment of Properties

Using ancestor within a calculated member

Metadata returned <structure>

Cardinality of levels


Captions vs Member Names

Caption: USA; Member Name: PG2003012

MDX queries use Member Name not caption

Incoming SQL uses Caption not member name

Member Name is 7-bit ASCII

Need to convert between captions & member names

Solution: cache mappings between member names and captions on demand

Affects the class of predicates that can be pushed (no more >, <)
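An on-demand cache of caption-to-member-name mappings might look like the sketch below. Everything here is illustrative: `CaptionCache` and `fetch_members` are hypothetical names, the per-level loading granularity is an assumption, and `PG2003013` is an invented member name used only to populate the example.

```python
class CaptionCache:
    """Caches caption -> member name mappings, loaded per level on first use."""

    def __init__(self, fetch_members):
        # fetch_members(level) returns (caption, member_name) pairs; in a
        # real system this would be a metadata request to the data source.
        self._fetch = fetch_members
        self._by_level = {}

    def member_name(self, level, caption):
        """Return the member name for a caption, or None if there is none."""
        if level not in self._by_level:
            self._by_level[level] = dict(self._fetch(level))
        return self._by_level[level].get(caption)


calls = []


def fake_fetch(level):
    """Stand-in for the data source; records how often it is queried."""
    calls.append(level)
    return [("USA", "PG2003012"), ("India", "PG2003013")]


cache = CaptionCache(fake_fetch)
first = cache.member_name("Country", "USA")
second = cache.member_name("Country", "India")  # served from the cache
```

A dictionary lookup like this supports equality and IN predicates on captions; it gives no ordering between member names, which is consistent with range predicates no longer being pushed.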


Conclusions and Future Work

Ability to handle multidimensional and relational data in a single framework

Generate efficient MDX queries for best performance

Varying vendor support requires differing MDX code generation and intermediate rowset processing strategies

Future work: support for a larger number of vendors, a wider class of SQL, and parent-child hierarchies
