OLAP Functions Support in Informix

Posted on 16-Feb-2017


OLAP Functions Support in Informix 12.1

Bingjie Miao, IBM


Agenda

• What is OLAP?
• OLAP functions in Informix
  – the OVER clause
  – supported OLAP functions
• Questions?

What is OLAP?

• On-Line Analytical Processing
• Commonly used in Business Intelligence (BI) tools
  – ranking products, salesmen, items, etc.
  – exposing trends in sales from historic data
  – testing business scenarios (forecast)
  – sales breakdown or aggregates on multiple dimensions (Time, Region, Demographics, etc.)

OLAP Functions in Informix

• Supports a subset of commonly used OLAP functions
• Enables more efficient processing of queries from BI tools such as Cognos

Example query with group by

select customer_num, count(*)
from orders
where customer_num <= 110
group by customer_num;

customer_num  (count(*))
101           1
104           4
106           2
110           2

4 row(s) retrieved.

Example query with an OLAP function

select customer_num, ship_date, ship_charge,
       count(*) over (partition by customer_num)
from orders
where customer_num <= 110;

customer_num  ship_date   ship_charge  (count(*))
101           05/26/2008  $15.30       1
104           05/23/2008  $10.80       4
104           07/03/2008  $5.00        4
104           06/01/2008  $10.00       4
104           07/10/2008  $12.20       4
106           05/30/2008  $19.20       2
106           07/03/2008  $12.30       2
110           07/06/2008  $13.80       2
110           07/16/2008  $6.30        2

9 row(s) retrieved.
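The difference between the two result sets can be modeled in a few lines of plain Python. This is a sketch, not how Informix evaluates queries, and the function names are hypothetical. It uses the nine orders rows shown above: group by collapses each customer into one row, while count(*) over (partition by customer_num) keeps every row and appends its partition size.

```python
from collections import Counter

# (customer_num, ship_date, ship_charge) rows from the output above
ORDERS = [
    (101, "05/26/2008", 15.30),
    (104, "05/23/2008", 10.80),
    (104, "07/03/2008", 5.00),
    (104, "06/01/2008", 10.00),
    (104, "07/10/2008", 12.20),
    (106, "05/30/2008", 19.20),
    (106, "07/03/2008", 12.30),
    (110, "07/06/2008", 13.80),
    (110, "07/16/2008", 6.30),
]


def group_by_count(rows):
    """group by customer_num: one (customer_num, count) row per group."""
    counts = Counter(row[0] for row in rows)
    return sorted(counts.items())


def count_over_partition(rows):
    """count(*) over (partition by customer_num): every input row is kept,
    with the size of its partition appended as an extra column."""
    counts = Counter(row[0] for row in rows)
    return [row + (counts[row[0]],) for row in rows]


if __name__ == "__main__":
    print(group_by_count(ORDERS))              # 4 rows
    for row in count_over_partition(ORDERS):   # 9 rows
        print(row)
```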

Where do OLAP functions fit?

OLAP functions are evaluated after the rest of the query block and before the final order by:

• Joins, group by, having, aggregation
• OLAP functions
• Final order by

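OLAP functions as predicates

• OLAP functions are evaluated after joins, filtering, group by and having, so a predicate on an OLAP function result must be applied in an outer query block
• Use a derived table query block to compute the OLAP function first, then filter on its result

select *
from (select customer_num, ship_date, ship_charge,
             count(*) over (partition by customer_num) as cnt
      from orders
      where customer_num <= 110)
where cnt >= 3;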
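The two query blocks can be sketched in plain Python to show why the filter has to come second. The inner block produces cnt and the outer block can only then test it. This is a hypothetical model, not Informix internals, and it uses the customer_num values from the nine orders rows shown earlier.

```python
from collections import Counter

# customer_num values of the nine orders rows with customer_num <= 110
CUSTOMERS = [101, 104, 104, 104, 104, 106, 106, 110, 110]


def inner_block(keys):
    """Derived table: compute count(*) over (partition by key) as cnt."""
    counts = Counter(keys)
    return [(key, counts[key]) for key in keys]


def outer_block(keys, min_cnt):
    """Outer query: filter on cnt, which exists only after the inner
    block has computed the OLAP function."""
    return [(key, cnt) for key, cnt in inner_block(keys) if cnt >= min_cnt]


if __name__ == "__main__":
    print(outer_block(CUSTOMERS, 3))  # only customer 104's four rows remain
```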

OLAP function example

• Running 3-month average sales for a particular product during a particular period (the frame covers the previous, current and next month)

select product_name,
       avg(sales) over (partition by region
                        order by year, month
                        rows between 1 preceding and 1 following)
from total_sales
where product_id = 105 and year between 2001 and 2010;

The OVER() Clause

olap_func(arg) over (partition by clause order by clause window frame clause)

• Defines the “domain” of the OLAP function calculation
  – partition by: divides the rows into partitions
  – order by: ordering within each partition
  – window frame: sliding window within each partition
  – all clauses are optional

Partition By, Order By and Window Frame, illustrated with

sum(x) over (partition by a, b order by c, d rows between 2 preceding and 2 following)

• partition by a, b splits the rows into four partitions: (a=1, b=1), (a=2, b=2), (a=1, b=2), (a=2, b=1)
• order by c, d sorts the rows within each partition; for partition a=1, b=2 the order is (c=1,d=1), (c=1,d=2), (c=1,d=3), (c=2,d=2), (c=2,d=4), (c=3,d=1), (c=4,d=1), (c=4,d=2)
• rows between 2 preceding and 2 following slides a window of up to five rows (the current row plus two on each side) over that ordered partition

Partition By

• Divides the result set of the query into partitions for computing an OLAP function
• If no partition by clause is specified, the entire result set is a single partition

max(salary) over (partition by dept_id)
sum(sales) over (partition by region)
avg(price) over ()

Order By

• Ordering within each partition
• Required for some OLAP functions
  – ranking functions, and any window frame clause
• Supports ASC/DESC and NULLS FIRST/NULLS LAST

rank() over (partition by dept order by salary desc)
dense_rank() over (order by total_sales nulls last)

Window Frame

• Defines a sliding window within a partition
• The OLAP function value is computed from the rows in the sliding window
• An order by clause is required

Physical vs. Logical Window Frame

• Physical window frame
  – ROWS keyword
  – offset counted by position
  – fixed window size
  – order by one or more column expressions
• Logical window frame
  – RANGE keyword
  – offset counted by value
  – window size may vary
  – order by a single column (numeric, date or datetime type)

Window Frame Examples

avg(price) over (order by year, day rows between 6 preceding and current row)
count(*) over (order by ship_date range between 2 preceding and 2 following)

• The current row can lie outside the window

avg(sales) over (order by month range between 3 preceding and 1 preceding)
sum(sales) over (order by month rows between 2 following and 5 following)
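The difference between a physical (ROWS) and a logical (RANGE) frame can be modeled by listing the row indices each frame selects. This is a minimal sketch under stated assumptions: one partition, a single ascending numeric sort key for RANGE, and signed offsets (negative for preceding, positive for following). The helper names and the monthly data, which has a gap at month 4, are hypothetical.

```python
def rows_frame(n, i, start, end):
    """Indices in a physical (ROWS) frame for row i of an n-row partition.
    start/end are signed offsets: -2 = '2 preceding', 0 = 'current row',
    5 = '5 following'. The frame is clipped at the partition edges."""
    lo = max(0, i + start)
    hi = min(n - 1, i + end)
    return list(range(lo, hi + 1))  # empty when lo > hi


def range_frame(keys, i, start, end):
    """Indices in a logical (RANGE) frame: rows whose single ascending
    sort key lies within [keys[i] + start, keys[i] + end]."""
    low, high = keys[i] + start, keys[i] + end
    return [j for j, key in enumerate(keys) if low <= key <= high]


if __name__ == "__main__":
    months = [1, 2, 3, 5, 6]        # hypothetical data: month 4 is missing
    sales = [10, 20, 30, 50, 60]
    i = 3                           # the row for month 5
    # rows between 2 preceding and current row: always 3 rows here
    print(sum(sales[j] for j in rows_frame(len(months), i, -2, 0)))  # 100
    # range between 2 preceding and current row: months 3..5, only 2 rows
    print(sum(sales[j] for j in range_frame(months, i, -2, 0)))      # 80
```

The last two frames above exclude the current row: for month 6, range between 3 preceding and 1 preceding selects months 3 to 5, and for the first row, rows between 2 following and 5 following selects positions 2 to 4.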

Order By – Special Semantics

• “Cumulative” semantics in the absence of a window frame clause
  – applies to OLAP functions that allow a window frame clause
  – equivalent to “ROWS between unbounded preceding and current row”

select sales, sum(sales) over (order by quarter)
from sales
where year = 2012;

sales  (sum)
120    120
135    255
127    382
153    535
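The cumulative default described above amounts to a running total. This plain-Python sketch reproduces the 2012 example under the slide's ROWS-based description, assuming the input is already sorted by quarter. The helper name is hypothetical.

```python
from itertools import accumulate

# quarterly sales for 2012 from the example above, already ordered by quarter
QUARTERLY_SALES = [120, 135, 127, 153]


def cumulative_sum(values):
    """sum(x) over (order by ...) with no frame clause, modeled as
    ROWS between unbounded preceding and current row: row i sums rows 0..i."""
    return [sum(values[: i + 1]) for i in range(len(values))]


if __name__ == "__main__":
    print(cumulative_sum(QUARTERLY_SALES))  # [120, 255, 382, 535]
    # itertools.accumulate yields the same running total in a single pass
    print(list(accumulate(QUARTERLY_SALES)))
```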

Supported OLAP Functions

• Ranking functions
  – RANK, DENSE_RANK (DENSERANK)
  – PERCENT_RANK, CUME_DIST, NTILE
  – LEAD, LAG
• Numbering functions
  – ROW_NUMBER (ROWNUMBER)
• Aggregate functions
  – SUM, COUNT, AVG, MIN, MAX
  – STDEV, VARIANCE, RANGE
  – FIRST_VALUE, LAST_VALUE
  – RATIO_TO_REPORT (RATIOTOREPORT)

Ranking Functions

• Partition by clause is optional
• Order by clause is required
• Window frame clause is NOT allowed
• Duplicate values are handled differently by rank() and dense_rank()
  – both give the same rank to all duplicates
  – after a group of duplicates, rank() skips the ranks the duplicates occupied, whereas dense_rank() uses the next consecutive rank

RANK vs DENSE_RANK

select emp_num, sales,
       rank() over (order by sales) as rank,
       dense_rank() over (order by sales) as dense_rank
from sales;

emp_num  sales  rank  dense_rank
101      2,000  1     1
102      2,400  2     2
103      2,400  2     2
104      2,500  4     3
105      2,500  4     3
106      2,650  6     4
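The two tie-handling rules can be stated as counts. rank() is one plus the number of strictly smaller values, and dense_rank() is one plus the number of distinct smaller values. This plain-Python sketch assumes a single partition, ascending order and no NULLs, and reproduces the table above.

```python
# sales of emp_num 101..106 from the example above
SALES = [2000, 2400, 2400, 2500, 2500, 2650]


def rank(values):
    """rank() over (order by v): 1 + number of rows with a smaller value,
    so ties share a rank and the following ranks are skipped."""
    return [1 + sum(1 for w in values if w < v) for v in values]


def dense_rank(values):
    """dense_rank() over (order by v): 1 + number of distinct smaller values,
    so ties share a rank and no ranks are skipped."""
    return [1 + len({w for w in values if w < v}) for v in values]


if __name__ == "__main__":
    print(rank(SALES))        # [1, 2, 2, 4, 4, 6]
    print(dense_rank(SALES))  # [1, 2, 2, 3, 3, 4]
```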

PERCENT_RANK and CUME_DIST

• Calculate ranking information as a percentile
• Return a value between 0 and 1

select emp_num, sales,
       percent_rank() over (order by sales) as per_rank,
       cume_dist() over (order by sales) as cume_dist
from sales;

emp_num  sales  per_rank  cume_dist
101      2,000  0         0.166666667
102      2,400  0.2       0.500000000
103      2,400  0.2       0.500000000
104      2,500  0.6       0.833333333
105      2,500  0.6       0.833333333
106      2,650  1.0       1.000000000
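The numbers in the table follow the usual SQL definitions. percent_rank is (rank - 1) / (n - 1), and cume_dist is the fraction of rows whose value is less than or equal to the current one. The plain-Python sketch below is a hedged model that reproduces the table. It assumes one partition, ascending order and no NULLs, and does not claim to show how Informix computes these values.

```python
# sales of emp_num 101..106 from the example above
SALES = [2000, 2400, 2400, 2500, 2500, 2650]


def percent_rank(values):
    """(rank - 1) / (n - 1); rank - 1 equals the count of strictly smaller
    values. A single-row partition gets 0."""
    n = len(values)
    if n == 1:
        return [0.0]
    return [sum(1 for w in values if w < v) / (n - 1) for v in values]


def cume_dist(values):
    """Fraction of rows whose value is <= the current value."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


if __name__ == "__main__":
    print(percent_rank(SALES))  # [0.0, 0.2, 0.2, 0.6, 0.6, 1.0]
    print([round(c, 9) for c in cume_dist(SALES)])
```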

NTILE

• Divides the ordered rows of each partition into N tiles, where N is given by the argument expression
• The number of tiles must be an exact numeric value with scale zero (an integer)

NTILE Example

select name, salary,
       ntile(5) over (partition by dept order by salary)
from employee;

name   salary  (ntile)
John   35,000  1
Jack   38,400  1
Julie  41,200  2
Manny  45,600  2
Nancy  47,300  3
Pat    49,500  4
Ray    51,300  5
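The tile sizes in the example match the common NTILE rule. Tiles differ by at most one row, and the leftover rows go to the lowest-numbered tiles, so seven rows in five tiles give sizes 2, 2, 1, 1, 1. This plain-Python sketch assumes that rule and a single partition (the seven employees appear to share one dept). Raising ValueError for a non-positive tile count is this sketch's own choice, not Informix's error behavior.

```python
def ntile(n_rows, n_tiles):
    """Tile numbers for n_rows ordered rows split into n_tiles tiles.
    Tiles differ in size by at most one row; the larger tiles come first."""
    if n_tiles < 1:
        raise ValueError("number of tiles must be a positive integer")
    base, extra = divmod(n_rows, n_tiles)
    tiles = []
    for tile in range(1, n_tiles + 1):
        size = base + (1 if tile <= extra else 0)
        tiles.extend([tile] * size)
    return tiles


if __name__ == "__main__":
    names = ["John", "Jack", "Julie", "Manny", "Nancy", "Pat", "Ray"]
    for name, tile in zip(names, ntile(len(names), 5)):
        print(name, tile)
```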

LEAD and LAG

LEAD(expr, offset, default)
LAG(expr, offset, default)

• Return the value of the expression in the row that is offset rows after (LEAD) or before (LAG) the current row
• offset is optional; it defaults to 1
• default is optional; it defaults to NULL
  – default is returned when the offset goes beyond the current partition boundary
• NULL handling
  – RESPECT NULLS (default)
  – IGNORE NULLS

LEAD/LAG Example

select name, salary,
       lag(salary) over (partition by dept order by salary),
       lead(salary, 1, 0) over (partition by dept order by salary)
from employee;

name   salary  (lag)   (lead)
John   35,000          38,400
Jack   38,400  35,000  41,200
Julie  41,200  38,400  45,600
Manny  45,600  41,200  47,300
Nancy  47,300  45,600  49,500
Pat    49,500  47,300  51,300
Ray    51,300  49,500  0

LEAD/LAG NULL handling

select price,
       lag(price ignore nulls, 1) over (order by day),
       lead(price, 1) ignore nulls over (order by day)
from stock_price;

price  (lag)  (lead)
18.25         18.37
18.37  18.25  19.03
       18.37  19.03
       18.37  19.03
19.03  18.37  18.59
18.59  19.03  18.21
18.21  18.59

(Blank cells are NULLs.)
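Both LEAD/LAG examples can be reproduced with one plain-Python helper that walks offset rows forward or backward, optionally skipping NULLs, and falls back to default at the partition boundary. This is a sketch over a single ordered partition with None standing in for NULL. The names lead, lag and _shift are hypothetical, and the stock prices, including the two NULL rows, are read off the output above.

```python
def _shift(values, offset, default, ignore_nulls, step):
    """Shared LEAD/LAG logic over one ordered partition; None stands for NULL.
    step is +1 for LEAD and -1 for LAG."""
    out = []
    for i in range(len(values)):
        j, found = i, 0
        while found < offset:
            j += step
            if not 0 <= j < len(values):
                break  # ran past the partition boundary
            if not (ignore_nulls and values[j] is None):
                found += 1
        out.append(values[j] if found == offset else default)
    return out


def lead(values, offset=1, default=None, ignore_nulls=False):
    return _shift(values, offset, default, ignore_nulls, +1)


def lag(values, offset=1, default=None, ignore_nulls=False):
    return _shift(values, offset, default, ignore_nulls, -1)


if __name__ == "__main__":
    salaries = [35000, 38400, 41200, 45600, 47300, 49500, 51300]
    print(lag(salaries))          # first row has no predecessor: None
    print(lead(salaries, 1, 0))   # last row gets the default 0
    prices = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]
    print(lag(prices, ignore_nulls=True))
    print(lead(prices, ignore_nulls=True))
```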

Numbering Functions

• Partition by clause and order by clause are optional
• Window frame clause is NOT allowed
• Provides sequential row numbers for the result set
  – the numbers stay consecutive even when the order by values contain duplicates

ROW_NUMBER Example

select row_number() over (order by sales), emp_num, sales
from sales;

(row_number)  emp_num  sales
1             101      2,000
2             102      2,400
3             103      2,400
4             104      2,500
5             105      2,500
6             106      2,650
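Unlike rank(), row_number() keeps counting through ties. In this plain-Python sketch (hypothetical helper, one partition, no NULLs), Python's stable sort numbers tied rows in their input order. SQL generally does not guarantee which of two tied rows is numbered first, so the database may order ties differently.

```python
# sales of emp_num 101..106 from the example above
SALES = [2000, 2400, 2400, 2500, 2500, 2650]


def row_number(values):
    """row_number() over (order by v): consecutive 1..n in sort order.
    sorted() is stable, so ties keep their input order in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    numbers = [0] * len(values)
    for number, i in enumerate(order, start=1):
        numbers[i] = number
    return numbers


if __name__ == "__main__":
    print(row_number(SALES))         # [1, 2, 3, 4, 5, 6]
    print(row_number([30, 10, 20]))  # [3, 1, 2]
```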

Aggregate Functions

• Partition by, order by and window frame clauses are all optional
  – a window frame clause requires an order by clause
• All aggregate functions that Informix already supported
  – SUM, COUNT, MIN, MAX, AVG, STDEV, RANGE, VARIANCE
• New aggregate functions
  – FIRST_VALUE/LAST_VALUE
  – RATIO_TO_REPORT

Aggregate Function Example

select price,
       avg(price) over (order by day rows between 1 preceding and 1 following)
from stock_price;

price  (avg)
18.25  18.31
18.37  18.31
       18.37
       19.03
19.03  18.81
18.59  18.61
18.21  18.40
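The averages above follow from a three-row sliding frame in which NULL prices are skipped, as SQL AVG does. This plain-Python sketch reproduces them. The helper name is hypothetical, and the price list, including its two NULLs, is read off the output above.

```python
# stock_price rows ordered by day; None marks the NULL prices in the example
PRICES = [18.25, 18.37, None, None, 19.03, 18.59, 18.21]


def moving_avg(values, preceding=1, following=1):
    """avg(x) over (order by day rows between p preceding and f following).
    Like SQL AVG, NULLs inside the frame are ignored; an all-NULL frame
    yields NULL (None)."""
    n = len(values)
    out = []
    for i in range(n):
        frame = values[max(0, i - preceding): min(n, i + following + 1)]
        present = [v for v in frame if v is not None]
        out.append(sum(present) / len(present) if present else None)
    return out


if __name__ == "__main__":
    print([None if a is None else round(a, 2) for a in moving_avg(PRICES)])
```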

DISTINCT handling

• DISTINCT is supported, but it cannot be combined with an order by clause or a window frame clause

select emp_id, manager_id,
       count(distinct manager_id) over (partition by department)
from employee;

emp_id  manager_id  (count)
101     103         3
102     103         3
103     100         3
104     110         3
105     110         3
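A windowed count(distinct ...) counts the distinct non-NULL values in each row's partition and repeats that count on every row of the partition. This is a plain-Python sketch with a hypothetical helper. The slide does not show the department column, so the example assumes all five employees are in one hypothetical department "D1".

```python
from collections import defaultdict


def count_distinct_over(partition_keys, values):
    """count(distinct v) over (partition by p): number of distinct non-NULL
    values in each row's partition, repeated on every row."""
    distinct = defaultdict(set)
    for key, value in zip(partition_keys, values):
        if value is not None:
            distinct[key].add(value)
    return [len(distinct[key]) for key in partition_keys]


if __name__ == "__main__":
    # manager_id for emp_id 101..105; "D1" is a hypothetical department
    managers = [103, 103, 100, 110, 110]
    print(count_distinct_over(["D1"] * 5, managers))  # [3, 3, 3, 3, 3]
```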

FIRST_VALUE and LAST_VALUE

• Return the first/last value of the current partition
• NULL handling
  – RESPECT NULLS (default)
  – IGNORE NULLS

FIRST_VALUE/LAST_VALUE Example

select price,
       price - first_value(price) over (partition by year order by day) as diff_price
from stock_price;

price  diff_price
18.25  0
18.37  0.12
19.03  0.78
18.59  0.34
18.21  -0.04
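The diff_price column subtracts the first price of each year's partition from every row in that partition. This plain-Python sketch (hypothetical helper) assumes the rows are already sorted by day within each year and contain no NULL prices. The year value 2013 is hypothetical, because the slide does not show the year column.

```python
def diff_from_first(rows):
    """price - first_value(price) over (partition by year order by day).
    rows: (year, price) pairs sorted by day within each year, no NULLs.
    Results are rounded to cents for display."""
    first = {}
    out = []
    for year, price in rows:
        first.setdefault(year, price)  # first row seen in this partition
        out.append(round(price - first[year], 2))
    return out


if __name__ == "__main__":
    # hypothetical year 2013: the slide does not show the year column
    rows = [(2013, p) for p in [18.25, 18.37, 19.03, 18.59, 18.21]]
    print(diff_from_first(rows))  # [0.0, 0.12, 0.78, 0.34, -0.04]
```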

RATIO_TO_REPORT

• Computes the ratio of the current value to the sum of all values in the current partition or window frame

select emp_num, sales,
       ratio_to_report(sales) over (partition by year order by sales)
from sales;

RATIO_TO_REPORT Example

select year, sales, ratio_to_report(sales) over (partition by year)
from sales;

year  sales  (ratio_to_report)
1998  2400   0.2308
1998  2550   0.2452
1998  2650   0.2548
1998  2800   0.2692
1999  2450   0.2311
1999  2575   0.2429
1999  2725   0.2571
1999  2850   0.2689
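With no order by, ratio_to_report divides each value by its whole partition's total. For example, 1998 sums to 10400, so 2400 / 10400 ≈ 0.2308. This plain-Python sketch reproduces the table. The helper name is hypothetical, and returning None for a zero total is an assumption of the sketch, not documented Informix behavior.

```python
from collections import defaultdict


def ratio_to_report(partition_keys, values):
    """ratio_to_report(v) over (partition by p): v / sum(v) over the partition.
    NULL values (None) are skipped in the sum and give None; a zero total
    also gives None here, which is an assumption of this sketch."""
    totals = defaultdict(int)
    for key, value in zip(partition_keys, values):
        if value is not None:
            totals[key] += value
    return [
        None if value is None or totals[key] == 0 else value / totals[key]
        for key, value in zip(partition_keys, values)
    ]


if __name__ == "__main__":
    years = [1998] * 4 + [1999] * 4
    sales = [2400, 2550, 2650, 2800, 2450, 2575, 2725, 2850]
    print([round(r, 4) for r in ratio_to_report(years, sales)])
```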

Nested OLAP Functions

• An OLAP function can be nested inside another OLAP function

select emp_id, salary,
       salary - first_value(salary) over (order by rank() over (order by salary)) as diff_salary
from employee;

select sum(ntile(10) over (order by salary)) over (partition by department)
from employee;

OLAP functions and IWA

• Queries containing OLAP functions can be accelerated by the Informix Warehouse Accelerator (IWA)
• IWA processes most of the query block
  – scan, join, group by, having, aggregation
• The Informix server then computes the OLAP functions from the query result returned by IWA

For more information

• Links to the OLAP function topics in the Informix 12.1 documentation

http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.sqls.doc%2Fids_sqs_2583.htm

http://pic.dhe.ibm.com/infocenter/informix/v121/index.jsp?topic=%2Fcom.ibm.acc.doc%2Fids_acc_queries1.htm

Questions?

Bingjie Miao, bingjie@us.ibm.com
