pivot tables - universidade nova de...

Post on 28-Dec-2019

Pivot Tables

The Pivot relational operator (available in some SQL platforms/servers) allows us to write cross-tabulation queries from tuples in tabular layout. It takes data in separate rows, aggregates it, and converts it into columns.

Pivot Tables – Motivation (1)

Consider the table customers:

Cust_id  Cust_name  State_code  Times_purchased
1        John       CT          1
2        Mary       NY          10
3        Alfredo    NJ          2
4        Ana        NY          4
...      ...        ...         ...

Suppose we want to know, for each state, how many customers bought something once, twice, three times, and so on. A regular SQL query for this would be:

    select state_code, times_purchased, count(*) cnt
    from customers
    group by state_code, times_purchased;

Pivot Tables – Motivation (2)

The result of this query would be:

State_code  Times_purchased  cnt
CT          0                90
CT          1                165
CT          2                179
...         ...              ...
NY          1                33048

This is the information we need, but it is a little hard to read. A crosstab with the times purchased arranged vertically and the states arranged horizontally would be preferable:

Times_purchased  CT   NY     NJ  ...
0                90   0      35  ...
1                165  33048  20  ...
2                179  219    37  ...
3                ...

Pivot Tables: another example

The following tuples:

order_id  customer_ref  product_id
50001     SMITH         10
50002     SMITH         20
50003     ANDERSON      30
50004     ANDERSON      40
50005     JONES         10
50006     JONES         20
50007     SMITH         20
50008     SMITH         10
50009     SMITH         20

Can be shown as:

customer_ref  10  20  30
ANDERSON      0   0   1
JONES         1   1   0
SMITH         2   3   0
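The cross-tabulation above can be reproduced on platforms without a PIVOT operator by writing one conditional aggregate per output column. The following is a minimal sketch using Python's built-in sqlite3 module. The table name orders is an assumption (the slides do not name the table), and SUM(CASE ...) stands in for the PIVOT clause rather than being the operator itself.

```python
import sqlite3

# Assumption: the tuples above live in a table called "orders".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_ref TEXT, product_id INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (50001, "SMITH", 10),
        (50002, "SMITH", 20),
        (50003, "ANDERSON", 30),
        (50004, "ANDERSON", 40),
        (50005, "JONES", 10),
        (50006, "JONES", 20),
        (50007, "SMITH", 20),
        (50008, "SMITH", 10),
        (50009, "SMITH", 20),
    ],
)

# One SUM(CASE ...) per pivoted value: rows are grouped by customer_ref and
# each product_id value (10, 20, 30) becomes its own counting column.
crosstab = conn.execute(
    """
    SELECT customer_ref,
           SUM(CASE WHEN product_id = 10 THEN 1 ELSE 0 END) AS "10",
           SUM(CASE WHEN product_id = 20 THEN 1 ELSE 0 END) AS "20",
           SUM(CASE WHEN product_id = 30 THEN 1 ELSE 0 END) AS "30"
    FROM orders
    GROUP BY customer_ref
    ORDER BY customer_ref
    """
).fetchall()

for row in crosstab:
    print(row)
```

As in the crosstab above, product 40 has no column because it is not in the list of pivoted values.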

PIVOT clause – syntax (1)

    SELECT *
    FROM ( SELECT column1, …, columnj
           FROM tables
           WHERE conditions )
    PIVOT ( aggregate_function(columnj)
            FOR columnj IN ( expr1, expr2, ... expr_n ) | subquery )
    ORDER BY expression [ ASC | DESC ];

PIVOT clause – syntax (2)

Where:

aggregate_function can be a function such as SUM, COUNT, MIN, MAX or AVG.

IN ( expr1, expr2, ... expr_n ) is a list of values for columnj to pivot into headings in the cross-tabulation query. Each distinct value will be shown as a separate column.

subquery can be used instead of a list of values.

PIVOT clause – Application (1)

    select *
    from ( select times_purchased times, state_code
           from customers t )
    pivot ( count(state_code)
            for state_code in ('NY','CT','NJ','FL','MO') )
    order by times;

times  NY     CT   NJ  FL  MO
0      16601  90   35  0   0
1      33048  165  20  0   0
2      33151  179  37  0   0
3      32978  173  0   0   0
4      33109  173  0   1   0

Searching with PIVOT clause (1)

EMP table

EMPNO  ENAME  JOB        MGR   HIREDATE   SAL   COMM  DEPTNO
7839   KING   PRESIDENT        17-NOV-81  5000        10
7698   BLAKE  MANAGER    7839  01-MAY-81  2850        30
7782   CLARK  MANAGER    7839  09-JUN-81  2450        10
7566   JONES  MANAGER    7839  02-APR-81  2975        20
...    ...    ...        ...   ...        ...   ...   ...

Question: For each job, display the salary totals in a separate column for each department.

Searching with PIVOT clause (2)

    WITH pivot_data AS (SELECT deptno, job, sal from EMP)
    select *
    from pivot_data
    PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40) );

JOB        10    20      30    40
CLERK      1430  2090    1045
SALESMAN                 6160
PRESIDENT  5500
MANAGER    2695  3272.5  3135
ANALYST          6600

The list of values for deptno was hard-coded in this example (10, 20, 30, 40).

Searching with PIVOT clause (2)

Alternatively, an inline view may be used to obtain the same result:

    select *
    from (SELECT deptno, job, sal from EMP)
    PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40));

JOB        10    20      30    40
CLERK      1430  2090    1045
SALESMAN                 6160
PRESIDENT  5500
MANAGER    2695  3272.5  3135
ANALYST          6600

Searching with PIVOT clause (3)

Groupings are affected if a pivot query is performed on a larger set of columns, because every column that is neither aggregated nor pivoted takes part in the grouping. Ex:

    SELECT *
    from EMP
    PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40));

Here, deptno is still the pivot column, but the remaining columns include a superkey of EMP, so each row forms its own group and the pivot is effectively useless (results in the next slide).

Searching with PIVOT clause (4)

EMPNO  ENAME   JOB        MGR   HIREDATE  COMM  10    20      30    40
7654   MARTIN  SALESMAN   7698  28/09/81  1400                1375
7698   BLAKE   MANAGER    7839  01/05/81                      3135
7934   MILLER  CLERK      7782  23/01/82        1430
7521   WARD    SALESMAN   7782  22/02/81  500                 1375
7566   JONES   MANAGER    7698  02/04/81              3272.5
7844   TURNER  SALESMAN   7839  08/09/81  0                   1650
7900   JAMES   CLERK      7698  03/12/81                      1045
7839   KING    PRESIDENT        19/04/87        5500
7876   ADAMS   CLERK      7788  23/05/87              1210
7902   FORD    ANALYST    7566  03/12/81              3300
...    ...     ...        ...   ...       ...   ...   ...     ...   ...
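The grouping effect shown in this result can be illustrated with a small sketch. It uses Python's built-in sqlite3 module and emulates the pivot with SUM(CASE ...), since SQLite has no PIVOT operator. The seven rows are a reduced sample in the shape of EMP, and the helper pivot_by is a hypothetical name introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, "
    "deptno INTEGER, sal INTEGER)"
)
# Reduced sample of EMP-like rows (assumption: enough to show the effect).
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
    [
        (7934, "MILLER", "CLERK", 10, 1430),
        (7839, "KING", "PRESIDENT", 10, 5500),
        (7876, "ADAMS", "CLERK", 20, 1210),
        (7902, "FORD", "ANALYST", 20, 3300),
        (7698, "BLAKE", "MANAGER", 30, 3135),
        (7900, "JAMES", "CLERK", 30, 1045),
        (7844, "TURNER", "SALESMAN", 30, 1650),
    ],
)


def pivot_by(group_cols):
    """Emulate PIVOT (SUM(sal) FOR deptno IN (10, 20, 30)) grouped by group_cols."""
    sql = f"""
        SELECT {group_cols},
               SUM(CASE WHEN deptno = 10 THEN sal END) AS d10,
               SUM(CASE WHEN deptno = 20 THEN sal END) AS d20,
               SUM(CASE WHEN deptno = 30 THEN sal END) AS d30
        FROM emp
        GROUP BY {group_cols}
    """
    return conn.execute(sql).fetchall()


by_job = pivot_by("job")                # only job is left over: one row per job
by_key = pivot_by("empno, ename, job")  # a key is left over: one row per employee

print(len(by_job), "rows when grouping by job")
print(len(by_key), "rows when the grouping columns include the key")
```

When the key is part of the grouping, each output row has a value in exactly one department column, which is the pattern visible in the table above.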

Searching with PIVOT clause (5)

Question: For ANALYST, CLERK and SALESMAN, display the salary totals in a separate column for each department.

    WITH pivot_data AS (SELECT deptno, job, sal from EMP)
    select *
    from pivot_data
    PIVOT ( SUM(sal) for deptno in (10, 20, 30, 40))
    where job in ('ANALYST', 'CLERK', 'SALESMAN');

JOB       10    20    30    40
CLERK     1430  2090  1045
SALESMAN              6160
ANALYST         6600

Searching with PIVOT clause (6)

Aliases can be used:

    WITH pivot_data AS (SELECT deptno, job, sal from EMP)
    select *
    from pivot_data
    PIVOT ( SUM(sal) as salaries
            for deptno in (10 as Dep10, 20 as Dep20, 30 as Dep30, 40 AS Dep40))
    where job in ('ANALYST', 'CLERK', 'SALESMAN');

JOB       Dep10_salaries  Dep20_salaries  Dep30_salaries  Dep40_salaries
CLERK     1430            2090            1045
SALESMAN                                  6160
ANALYST                   6600

Searching with PIVOT clause (7)

Pivoting with multiple aggregates (each pivoted value yields one column per aggregate):

    WITH pivot_data AS (SELECT deptno, job, sal from EMP)
    select *
    from pivot_data
    PIVOT ( SUM(sal) as sum, count(sal) as cnt
            for deptno in (10 as D10, 20 as D20, 30 as D30));

JOB        D10_sum  D10_cnt  ...  D30_sum  D30_cnt
CLERK      1430     1        ...  1045     1
SALESMAN            0        ...  6160     4
PRESIDENT  5500     1        ...           0
MANAGER    2695     1        ...  3135     1
ANALYST             0        ...           0

Searching with PIVOT clause (8)

Or, pivoting on a combination of columns:

    WITH pivot_data AS (SELECT deptno, job, sal from EMP)
    select *
    from pivot_data
    PIVOT ( SUM(sal) as sum, count(sal) as cnt
            for (deptno, job) in ((30, 'SALESMAN') as d30_sls,
                                  (30, 'MANAGER') as d30_mgr,
                                  (30, 'CLERK') AS d30_clk));

D30_SLS_SUM  D30_SLS_CNT  D30_MGR_SUM  D30_MGR_CNT  ...
6160         4            3135         1            ...

PIVOTing an Unknown Domain of Values (1)

By default, the pivot syntax does not support a dynamic list of values in the pivot_in_clause. Using a subquery instead of a hard-coded list of values in the pivot_in_clause will generate an error:

    SELECT *
    FROM emp
    PIVOT (SUM(sal) AS salaries
           FOR deptno IN (SELECT deptno FROM dept));

PIVOTing an Unknown Domain of Values (2)

A possible workaround to solve this problem (with Oracle):

    select *
    from (SELECT deptno, job, sal from EMP)
    PIVOT XML ( SUM(sal) for deptno in (any));

JOB      DEPTNO_XML
ANALYST  <PivotSet><item><column name = "DEPTNO">20</column><column name = "SUM(SAL)">6600</column></item></PivotSet>
MANAGER  ....
...      ...

It implies extra work to read the information from the XML format!

PIVOTing an Unknown Domain of Values (3)

Another workaround to solve the problem (with Oracle SQL*Plus):

    column namelist new_value nlist noprint;

    /* first obtain a string with the list of distinct values of deptno */
    select wm_concat(''''||deptno||'''') namelist
    from (select distinct deptno from emp)
    connect by nocycle deptno = prior deptno
    group by level;

    WITH pivot_data AS (SELECT deptno, job, sal from EMP)
    select *
    from pivot_data
    PIVOT ( SUM(sal) for deptno in (&nlist));

Here &nlist is a substitution variable containing the string '10','20','30' (results in the next slide).

PIVOTing an Unknown Domain of Values (4)

JOB        10    20      30
CLERK      1430  2090    1045
SALESMAN                 6160
PRESIDENT  5500
MANAGER    2695  3272.5  3135
ANALYST          6600
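The same two-step idea (first query the distinct values, then build the pivot statement from them) can also be carried out in a host language. The following is a minimal sketch, not the SQL*Plus technique itself: it uses Python's built-in sqlite3 module, emulates PIVOT with SUM(CASE ...), and runs on a reduced, assumed sample of EMP-like rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, deptno INTEGER, sal INTEGER)")
# Reduced sample of EMP-like rows (assumption, for illustration only).
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        ("MILLER", "CLERK", 10, 1430),
        ("KING", "PRESIDENT", 10, 5500),
        ("ADAMS", "CLERK", 20, 1210),
        ("FORD", "ANALYST", 20, 3300),
        ("BLAKE", "MANAGER", 30, 3135),
        ("JAMES", "CLERK", 30, 1045),
        ("TURNER", "SALESMAN", 30, 1650),
    ],
)

# Step 1: discover the domain of the pivot column at run time.
depts = [row[0] for row in conn.execute("SELECT DISTINCT deptno FROM emp ORDER BY deptno")]

# Step 2: generate one aggregate column per discovered value.
# int() guards the generated SQL text, since the values are spliced in as literals.
pivot_cols = ", ".join(
    f'SUM(CASE WHEN deptno = {int(d)} THEN sal END) AS "{int(d)}"' for d in depts
)
sql = f"SELECT job, {pivot_cols} FROM emp GROUP BY job ORDER BY job"

cursor = conn.execute(sql)
header = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(header)
for row in rows:
    print(row)
```

Adding a row for a new department changes the generated column list on the next run, with no edit to the query text.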

UnPIVOT – turning pivot tables into rows (1)

    SELECT ...
    FROM ...
    UNPIVOT [INCLUDE|EXCLUDE NULLS]
      ( unpivot_clause
        unpivot_for_clause
        unpivot_in_clause )
    WHERE ...

unpivot_clause: specifies a name for a column to represent the unpivoted measure values.

unpivot_for_clause: specifies the name for the column that will result from our unpivot query; its data describes which pivoted column each measure value came from.

unpivot_in_clause: contains the list of pivoted columns (not values) to be unpivoted.

UnPIVOT – turning pivot tables into rows (2)

    CREATE VIEW pivoted_data as
    SELECT *
    FROM (SELECT deptno, job, sal FROM emp)
    PIVOT (SUM(sal)
           FOR deptno IN (10 AS d10_sal, 20 as d20_sal, 30 aS d30_sal, 40 AS d40_sal));

    select * from pivoted_data;

JOB        D10_sal  D20_sal  D30_sal  D40_sal
CLERK      1430     2090     1045
SALESMAN                     6160
PRESIDENT  5500
MANAGER    2695     3272.5   3135
ANALYST             6600

UnPIVOT – turning pivot tables into rows (3)

    SELECT *
    FROM pivoted_data
    UNPIVOT ( Deptsal FOR saldesc IN (d10_sal, d20_sal, d30_sal, d40_sal) );

JOB        SALDESC  DEPTSAL
CLERK      D10_SAL  1430
CLERK      D20_SAL  2090
CLERK      D30_SAL  1045
SALESMAN   D30_SAL  6160
PRESIDENT  D10_SAL  5500
MANAGER    D10_SAL  2695
MANAGER    D20_SAL  3272.5
MANAGER    D30_SAL  3135
ANALYST    D20_SAL  6600
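Where UNPIVOT is not available, the same rows can be produced with one SELECT per pivoted column combined by UNION ALL. The sketch below uses Python's built-in sqlite3 module and assumes pivoted_data is stored as a plain table holding the values shown above. Filtering out NULLs mirrors the absence of empty cells in the result above, which corresponds to EXCLUDE NULLS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pivoted_data (job TEXT, d10_sal NUMERIC, d20_sal NUMERIC, "
    "d30_sal NUMERIC, d40_sal NUMERIC)"
)
# Values taken from the pivoted result above; None marks an empty cell.
conn.executemany(
    "INSERT INTO pivoted_data VALUES (?, ?, ?, ?, ?)",
    [
        ("CLERK", 1430, 2090, 1045, None),
        ("SALESMAN", None, None, 6160, None),
        ("PRESIDENT", 5500, None, None, None),
        ("MANAGER", 2695, 3272.5, 3135, None),
        ("ANALYST", None, 6600, None, None),
    ],
)

# One branch per pivoted column: the column name becomes the SALDESC value
# and the cell becomes the DEPTSAL measure; empty cells are skipped.
columns = ["d10_sal", "d20_sal", "d30_sal", "d40_sal"]
branches = [
    f"SELECT job, '{col.upper()}' AS saldesc, {col} AS deptsal "
    f"FROM pivoted_data WHERE {col} IS NOT NULL"
    for col in columns
]
sql = " UNION ALL ".join(branches) + " ORDER BY job, saldesc"

unpivoted = conn.execute(sql).fetchall()
for row in unpivoted:
    print(row)
```

Dropping the IS NOT NULL filter would instead keep a row for every job and column pair, similar in spirit to INCLUDE NULLS.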

UnPIVOT – other uses (1)

Since columns in the unpivot_in_clause must all be of the same datatype, this would cause an error:

    SELECT empno, job, unpivot_col_name, unpivot_col_value
    FROM emp
    UNPIVOT (unpivot_col_value
             FOR unpivot_col_name IN (ename, deptno, hiredate));

UnPIVOT – other uses (2)

A workaround (in Oracle) consists of converting the columns to a common datatype:

    WITH emp_data AS (
      SELECT empno, job, ename,
             TO_CHAR(deptno) as deptno,
             TO_CHAR(hiredate) as hiredate
      FROM emp)
    SELECT empno, job, unpivot_col_name, unpivot_col_value
    FROM emp_data
    UNPIVOT (unpivot_col_value
             FOR unpivot_col_name IN (ename, deptno, hiredate));

(results in the next slide)

UnPIVOT – other uses (3)

EMPNO  JOB       UNPIVOT_COL_NAME  UNPIVOT_COL_VALUE
7369   CLERK     ENAME             SMITH
7369   CLERK     DEPTNO            20
7369   CLERK     HIREDATE          17/12/1980
7499   SALESMAN  ENAME             ALLEN
7499   SALESMAN  DEPTNO            30
7499   SALESMAN  HIREDATE          20/02/1981
...    ...       ...               ...