on-line analytical processing

60
ON-LINE ANALYTICAL PROCESSING - Analyzing Data Resources ADITI PAUL MCS/08/20 REGISTRATION NO – 003834 OF 2008 POST GRADUATE DEPARTMENT OF COMPUTER SCIENCE ST.XAVIERS COLLEGE (AUTONOMOUS)

Upload: parinita-verma

Post on 28-Mar-2015

190 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: On-line Analytical Processing

ON-LINE ANALYTICAL PROCESSING

-

Analyzing Data Resources

ADITI PAULMCS/08/20

REGISTRATION NO – 003834 OF 2008 POST GRADUATE DEPARTMENT OF COMPUTER SCIENCE

ST.XAVIERS COLLEGE (AUTONOMOUS)

Page 2: On-line Analytical Processing

WHAT IS OLAP ?

Basic idea:

Quickly answer multi-dimensional analytical queries.

Convert data into information that decision makers need

It is a continuous , iterative, and preferably interactive process.

Page 3: On-line Analytical Processing

WHO USES OLAP ?

It is used in an organization to carry out the different ORGANIZATIONAL FUNCTIONS in :

• Finance departments • Sales analysis and forecasting • Marketing departments

»Cardinal Goal

“ Provide managers with the information they need to make effective decisions ”

Page 4: On-line Analytical Processing

Understanding Online Analytical Processing - OLAP

3 part description

Part 1 – Online

Part 2 – Analytical Part 3 – Processing

Page 5: On-line Analytical Processing

PART 1 – ONLINE

Page 6: On-line Analytical Processing

FLASH BACK

Page 7: On-line Analytical Processing

Data Stored in a Database

TYPE 1 Operational Data

• Data that “works”.• Frequent Updates and Queries.• Normalized for efficient search and

updates.• Fragmented & local relevance.• Point Queries .

Page 8: On-line Analytical Processing

Examples of Operational Data

• Account Details of a Customer in a Bank

• Student Details in a College/School Database

• Employee Records• Etc.

Page 9: On-line Analytical Processing

Example Queries on Operational Data

• What is the salary of Mr.Chatterjee ? ( point query)

• What is the address and phone number of the person in charge of the hardware department ?

• How many students have received an “distinction” credential in the latest exam?

Page 10: On-line Analytical Processing

Operational Data pertain to what we call“ONLINE TRANSACTION PROCESSING”As the name suggests these sorts of

data are used for day to day ‘operations’ like data entry /retrieval .

For example : An ATM is a commercial online transaction system.

Page 11: On-line Analytical Processing

Types of Data in a Database

Type 2 Historical Data

Data that “tells”. Very Infrequent updates. Integrated data set with global relevance. Analytical queries that require huge

amounts of aggregation. Performance issues mainly in query

response time.

Page 12: On-line Analytical Processing

Examples of Historical Data

• Last set of 10 transactions on a particular bank account of a customer

• Record of sales of a product in the last 15 years in a company’s database

• The profits incurred by a company stored month wise in a whole fiscal year.

Page 13: On-line Analytical Processing

Example Queries on Historical Data

• How is the student marks percentage scene changing over the years in college?

• Is there a correlation between the geographical location of a company unit and excellent employee appraisals?

• How is the employee attrition changing over the years across the company?

Page 14: On-line Analytical Processing

Historical Data pertains to the phenomenon that is

“Online Analytical Processing” where queries thus do not just depend on

seeing one part of a tuple .

For example to find out the employee attrition, we have to find out some aggregate employee attrition and then map it against time. Thus these queries require “analyzing” certain facts and then producing a correct output .

Page 15: On-line Analytical Processing

The necessity that these queries be ONLINE means that the queries need to be responded to in an

“ONLINE INTERACTIVE RESPONSE TIME”

as the waiting time of users is of the order of a few seconds.

Page 16: On-line Analytical Processing

The differences Between OLAP and OLTP thus are

Page 17: On-line Analytical Processing

PART 2 - ANALYTICAL

Page 18: On-line Analytical Processing

Analysis of the DataIn order to “Analyze” this Historical Data , it

needs to be stored in a certain formatted and organized manner.

This is accomplished by a Data Warehouse.• Data warehouse is an infrastructure to

manage historical data from various sources.• It is designed to support OLAP Queries

involving gratuitous use of aggregation.• Subject Oriented , Integrated ,Time-Variant

and Non Volatile collection of data in support of management’s decision making process.

Page 19: On-line Analytical Processing

WAREHOUSING SCHEMATIC DATA DIAGRAM

Page 20: On-line Analytical Processing

Dimensions of Data Warehouse Modeling

Measures –

• Key performance indicator that we want to evaluate.• Typically numerical , including volume, sales and

cost.• A Rule of Thumb : if a number makes(business)

sense when aggregated, then it is a measure.• Affects what should be stored in Data Warehouse.

• Example : Aggregate daily volume to month , quarter and year

Page 21: On-line Analytical Processing

Dimensions –

• Categories of data analysis• Typical dimensions include product, time, region.• A Rule of Thumb : when a report is requested

“by” something, that something is usually a dimension.

• Example :In sales report , view sales by month,by region,so the two dimensions needed are time and region.

Page 22: On-line Analytical Processing

Dimensions and measures are physically represented by a

STAR SCHEMA.

Page 23: On-line Analytical Processing

• The Data Model Which is adhered to while handling Historical Data to populate a Data Warehouse is a

“MULTIDIMENSIONAL DATA MODEL.”• One way to look at a

multidimensional data model is to view it as a CUBE.

Page 24: On-line Analytical Processing

CUBE  • It is a data structure that allows fast

analysis of data. • It can also be defined as the

capability of manipulating and analyzing data from multiple perspectives.

Page 25: On-line Analytical Processing

The response time of the

multidimensiona-l query still depends on

how many cells have to be

added on the fly

BASIC STRUCTURE OF A CUBE

Page 26: On-line Analytical Processing
Page 27: On-line Analytical Processing

• n-D base cube is called a BASE CUBOID. The top most 0-D cuboid, which holds the highest-level of summarization, is called the APEX CUBOID. The lattice of cuboids forms a data CUBE.

Page 28: On-line Analytical Processing

PART 3 - PROCESSING

Page 29: On-line Analytical Processing

PROCESSING DATA TO INFORMATION

Now that we have the Required Data in the Requisite form , how do we get the Desired output to a Query which requires analyzing of the data?

This is Accomplished by • OLAP Operations• OLAP Functions • SQL Extensions for OLAP.

Page 30: On-line Analytical Processing

OLAP OPERATIONS

• Dimension Tables • Market (Market_ID, City , Region)• Product

(Product_ID,Name,Category,Price)• Time(Time_ID,Week,Month,Quarter)

• Fact table• Sales(Market_ID,

Product_ID,Time_ID,Amount)

Page 31: On-line Analytical Processing

OLAP OPERATIONS

1. Aggregation – doing the ‘total’ of a measure over one or more dimensions.

MARKET ID CITY REGION

M1 KOLKATA EAST

M2 KOLKATA EAST

M3 DELHI NORTH

PRODUCT ID

NAME CATEGORY

PRICE

P1 FROOTI DRINKS 20

P2 LIMCA DRINKS 30

TIME ID WEEK MONTH QUARTER

T1 1 JAN 1

T2 1 JUNE 2

Page 32: On-line Analytical Processing

MARKET_ID

PRODUCT_ID

TIME_ID AMOUNT

M1 P1 T1 250

M1 P1 T2 350

M2 P2 T1 400

M2 P2 T2 200

M3 P1 T1 100

M3 P2 T1 50

SELECT Market_ID ,Product_ID ,SUM(AMOUNT)FROM SalesGROUP BY Market_ID , Product_ID;

MARKET ID PRODCT ID SUM(AMT)

M1 P1 600

M2 P2 600

M3 P1 100

QUERY :Find the Total Sales (over time) of each product in each market

Page 33: On-line Analytical Processing

OLAP OPERATIONS

2. ROLL UPSpecific grouping on one dimension

where we go from lower level of aggregation to a higher.

Example :

“ROLL UP sales on MARKET from CITY to REGION”

Page 34: On-line Analytical Processing

Firsty, the TOTAL SALE of a PARTICULAR Product in a city at a given time is done.Then,we use the CITY and Product ID of a city belonging to a REGION to project sales in that region

Select S.Product_Id,M.City,SUM(S.Amount)INTO City_SalesFROM Sales S,Market MWHEREM.Market_ID = S.Market_IDGROUP BY S.Product_ID,M.City

SELECT T.Product_ID,M.Region,SUM(T.Amount)

FROM City_Sales T,Market M

Where T.City=M.City

GROUP BY T.Product_ID,M.Region

Page 35: On-line Analytical Processing

OLAP OPERATIONS

3.DRILL DOWN• Finer –grained view on aggregated

data,i.e. going from higher to lower aggregation

• Converse of Roll-up• E.g disaggregate county sales by

region/city.

Page 36: On-line Analytical Processing

OLAP OPERATIONS

4.PIVOTINGSelect A different dimension(orientation) for

analysis

Page 37: On-line Analytical Processing

OLAP OPERATIONS

5. SLICE and DICE

Slicing : Selection on one or more dimensionsExample : “Choosing sales only in week 12” Slicing the data cube in the Time Dimension

SELECT S.*FROM Sales S,Time TWHERE T.Time_ID = S.Time_ID AND T.WEEK=’Week 12’

Page 38: On-line Analytical Processing

OLAP OPERATIONS

Dicing: A range selection in a hypercube. Partition or group on one or more dimensions.

Example :“ Total sales for each product in each quarter “ Dicing sales in the time dimension :

SELECT S.Product_ID,T.Quarter,SUM(S.Amount)FROM Sales S,Time TWHERE T.Time_ID=S.Time_IDGROUP BY T.Quarter,S.Product_ID

Page 39: On-line Analytical Processing

SQL EXTENSIONS FOR OLAP

ID FNAME LNAME MARKS SEM

1 ANAL ACHARYA 300 1

2 ROMIT BEED 325 1

3 JAYATI GHOSH.D 390 2

4 KAUSHIK GOSWAMI

350 2

5 SILADITYA

MUKHERJEE

275 3

6 SONALI SEN 310 3

1.ROLL UP

SELECT SEM,SUM(MARKS),RANK() OVER (ORDER BY SUM (MARKS) DESC)AS rank FROMTEACHERSGROUP BY ROLL UP(SEM) ORDER BY SEM

Page 40: On-line Analytical Processing

ROLL UP thus provides subtotals of aggregate rows.

SEM SUM(MARKS) RANK

2 740 1

1 625 2

3 585 3

Page 41: On-line Analytical Processing

SQL EXTENSIONS

2.CUBESELECT SEM,SUM(MARKS)FROM TEACHERS GROUP BY CUBE(SEM)

ID FNAME LNAME MARKS SEM

1 ANAL ACHARYA

300 1

2 ROMIT BEED 325 1

3 JAYATI GHOSH.D

390 2

4 KAUSHIK GOSWAMI

350 2

5 SILADITYA

MUKHERJEE

275 3

6 SONALI SEN 310 3

Page 42: On-line Analytical Processing

The CUBE operator provides subtotals of aggregate values

in the result set

SEM SUM(MARKS)

1 625

2 740

3 585

1950

Page 43: On-line Analytical Processing

SQL EXTENSIONS

3. GROUPING SETS lets us compute groups on several different sets of grouping columns in the same query.

This Query returns subtotal rows for each year, but not for the individual quarters.

Page 44: On-line Analytical Processing

SQL EXTENSIONS

Select YEAR as YEAR , QUARTER as QUARTER,COUNT(*) as ORDERS from SALESGROUP BY GROUPING SETS(YEAR,QUARTER),(YEAR)) ORDER BY YEAR & QUARTER

Page 45: On-line Analytical Processing

OLAP FUNCTIONS

1. RANK FUNCTION – Lets us compile a list ofvalues from your data set inranked order.

Example : The SQL query that follows finds the male and female employees from Kolkata

and ranks them in descending order according to salary.

Page 46: On-line Analytical Processing

SELECT emp_lname, salary, sex,RANK () OVER (ORDER BY salary DESC) "Rank"FROM employeeWHERE city IN (’KOL’)

Page 47: On-line Analytical Processing

OLAP FUNCTIONS

2.REPORTING FUNCTION : Reporting functions lets us compare non-aggregate values to aggregate values.

Example : The following query returns a result set that shows a list of the products that

sold higher than the average number of sales. The result set is partitioned by

year.

Page 48: On-line Analytical Processing

SELECT *FROM (SELECT year(order_date) AS Year, prod_id,SUM( quantity ) AS Q,AVG (SUM(quantity))OVER (PARTITION BY Year) AS Average FROM sales_order JOIN sales_order_itemsGROUP BY year(order_date), prod_idORDER BY Year)AS derived_tableWHERE Q > Average

• For the year 2000, the average number of orders was 1787. Four products• (700, 601, 600, and 400) sold higher than that amount. In 2001, the average• number of orders was 1048 and three products exceeded that amount.

Page 49: On-line Analytical Processing

OLAP FUNCTIONS

WINDOW FUNCTIONSWindow functions lets us analyze

ourdata by computing aggregate values over windows surrounding each row. The result set returns a summary value representing a set of rows.

Page 50: On-line Analytical Processing

The query returns a result set that partitions the data by department and then provides a cumulative summary of employees’ salaries starting with the employee who has been at the company the longest. The result set includes only those employees who reside in West Bengal, BBSR, Maharashtra, or Arunachal. The column Sum Salary provides the cumulative total of employees’ salaries. SELECT dept_id, emp_lname, start_date, salary,SUM(salary) OVER (PARTITION BY dept_idORDER BY start_dateRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"FROM employeeWHERE state IN (’WB’, ’BBSR’, ’MH’, ’AR’) AND dept_id IN (’100’,’200’)ORDER BY dept_id, start_date;

Page 51: On-line Analytical Processing
Page 52: On-line Analytical Processing

On Line Analytical Processing

Thus Online Analytical Processing as a whole can be understood to be a method which takes in raw data , processes it through various functions and operations and produces Information as a Response to Multidimensional Queries in Real Time

Page 53: On-line Analytical Processing

SERVER ARCHITECTURES

MOLAP : Multidimensional OLAPThe database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis.+ : very fast query response time because data is mostly pre-calculated-: practical limit on the size because the time taken to calculate the database and the space required to hold these pre-calculated values

Page 54: On-line Analytical Processing

SERVER ARCHICTECTURES

ROLAP – Relational OLAPThe database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.+: more scalable solution -: performance of the queries will be largely governed by the complexity of the SQL and the number and size of the tables being joined in the query

Page 55: On-line Analytical Processing

SERVER ARCHITECTURES

HOLAP – HYBRID OLAP

A hybrid of ROLAP and MOLAPcan be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels of the database as ROLAP

Page 56: On-line Analytical Processing

SERVER ARCHITECTURES

DOLAP –DESKTOP OLAP

The previous terms are used to refer to server based OLAP technologiesDOLAP (Desktop OLAP)DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops .

Page 57: On-line Analytical Processing

COMMERCIAL OLAP SYSTEMS

IBM DB2 DATAWAREHOUSING•ENTERPRIZE EDITION•STANDARD EDITION•BASE EDITION

ORACLE 9i ENTERPRIZE EDITION

MICROSOFT SQL SERVER 2005 BUSINESS INTELLIGENCE WORKBENCH PLATFORM

Page 58: On-line Analytical Processing

OLAP Challenges and Future Scope

• Analytical Complexity• Business questions can be rarely answered by a single

query• Complex queries are hard to understand,write and

execute efficiently• Need for good business analysts

• Data Cubes can be HUGE• But also can be sparse• Can compute in advance,compute on demand , or

some combination.• OLAP forms the underlying structure of DDAS –Distributed

Data Analysis and Dissemination System.• From On line Analytical Processing to Online Analytical

Mining • ( OLAP to OLAM)

Page 59: On-line Analytical Processing

BIBLIOGRAPHY

• Data Warehousing , Data Mining and OLAP – Alex Berson,Stephen J.Smith

• Data Warehousing And OLAp - Hector Garcia-Molina• Stanford University• A Hitchhiker’s guide to OLAP – Paul Burton and Howard ong.• Data mining data warehousing – Dr.Hani Saleeb• DATA WAREHOUSE

AND OLAP TECHNOLOGY Prof. Anita Wasilewska • Data Mining:

Concepts and Techniques Jiawei Han, Micheline Kamber, and Jian Pei• University of Illinois at Urbana-Champaign &• Simon Fraser University• Wikipedia.• Data Warehousing, Filtering, and Mining-Temple University• Data Mining- Professor Maytal Saar-Tsechansky

Page 60: On-line Analytical Processing

THANK YOU !