business analytics and data visualizationmit.wu.ac.th/mit/images/editor/files/ch 02(3).pdf · oltp...

Post on 18-Apr-2020


BUSINESS ANALYTICS AND DATA VISUALIZATION

ITM-761 Business Intelligence

ดร. สลิล บุญพราหมณ์ (Dr. Salin Boonbrahm)

"Doing good is difficult and slow to show results, but it must be done; otherwise evil, which is easy to do, will take its place and pile up quickly before anyone realizes it. Each person must therefore resolve and strive to the utmost to build up and accumulate goodness."

Royal address of His Majesty the King to graduates of the Royal Police Cadet Academy, New Building, Suan Amphon, 14 August 2525 B.E. (1982)

Overview: Business Analytics (BA)

• Business analytics is the use of analytical methods, either manual or automated, to derive relationships from data.

• Business analytics (BA) includes the access, reporting, and analysis of data, supported by software, to drive business performance and decision making.

The Business Analytics (BA) Field: An Overview

• MicroStrategy's classification of BA tools: the five styles of BI
  1. Enterprise reporting
  2. Cube analysis
  3. Ad hoc querying and analysis
  4. Statistical analysis and data mining
  5. Report delivery and alerting

• SAP's classification of strategic enterprise management: three levels of support
  1. Operational
  2. Managerial
  3. Strategic

• Executive information and support systems
  • Executive information systems (EIS): provide rapid access to timely and relevant information, aiding in monitoring an organization's performance
  • Executive support systems (ESS): also provide analysis support, communications, office automation, and intelligence support

Online Analytical Processing (OLAP)

• Online analytical processing (OLAP) is a decision support tool that allows users to analyze different dimensions of multidimensional data.

• Designed for executives looking to make sense of their information, OLAP structures data hierarchically to reflect the real dimensionality of the enterprise as understood by the users.

• Users can pivot, filter, drill down, and drill up data and generate a number of views with simple mouse manipulations.

• The OLAP structure created from the operational data is called an OLAP cube.

• The cube holds data more like a 3D spreadsheet than a relational database, allowing different views of the data to be displayed quickly.
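The "3D spreadsheet" idea can be illustrated with a minimal Python sketch. The rows, dimension names, and the `view` helper are hypothetical and chosen only for illustration; real OLAP servers use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical operational rows: (product, region, quarter, sales).
rows = [
    ("laptop", "north", "Q1", 120),
    ("laptop", "south", "Q1", 80),
    ("laptop", "north", "Q2", 150),
    ("phone", "north", "Q1", 200),
    ("phone", "south", "Q2", 90),
    ("phone", "south", "Q2", 10),  # falls into the same cell as the row above
]

# The cube: one cell per (product, region, quarter) coordinate.
cube = defaultdict(int)
for product, region, quarter, sales in rows:
    cube[(product, region, quarter)] += sales


def view(cube, axis_a, axis_b):
    """Collapse the cube onto two axes (0=product, 1=region, 2=quarter)."""
    table = defaultdict(int)
    for coords, value in cube.items():
        table[(coords[axis_a], coords[axis_b])] += value
    return dict(table)


# Two different 2D views of the same cube.
by_product_region = view(cube, 0, 1)
by_region_quarter = view(cube, 1, 2)
print(by_product_region[("laptop", "north")])
print(by_region_quarter[("south", "Q2")])
```

Each call to `view` plays the role of turning the cube to look at a different face, without touching the operational rows again.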

• In multidimensional OLAP (MOLAP) databases, cubes are created and stored physically, whereas in relational OLAP (ROLAP) databases, cubes are created virtually, based on a star or snowflake schema.
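As a rough illustration of a cube that is "created virtually" from a star schema, the sketch below uses Python's standard `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_store`) and the data are assumptions made for this example, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'laptop', 'computers'), (2, 'phone', 'mobile');
INSERT INTO dim_store   VALUES (1, 'Bangkok', 'central'), (2, 'Phuket', 'south');
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (2, 2, 30.0);
""")

# A "virtual cube" view: category x region totals computed at query time
# by joining the central fact table to its dimension tables.
virtual_cube = conn.execute("""
    SELECT p.category, s.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id = f.store_id
    GROUP BY p.category, s.region
    ORDER BY p.category, s.region
""").fetchall()
print(virtual_cube)
```

Nothing is stored in cube form here; the multidimensional result exists only as the answer to the query, which is the ROLAP side of the contrast above.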

• Star and snowflake schemas

The OLAP Report

• The OLAP Report, one of the most internationally authoritative sources of information on OLAP products and applications, defines OLAP in five keywords: Fast Analysis of Shared Multidimensional Information, or FASMI for short.

• Fast
  • The system is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds.

• Analysis
  • The system can cope with any business logic and statistical analysis that is relevant for the application and the user, while keeping it easy enough for the target user.

• Shared
  • The system implements all the security requirements for confidentiality and, if multiple write access is needed, concurrent update locking at an appropriate level.
  • Not all applications need users to write data back, but for the growing number that do, the system should be able to handle multiple updates in a timely, secure manner.

• Multidimensional
  • The system must provide a multidimensional conceptual view of the data, including full support for hierarchies and multiple hierarchies.

• Information
  • The capacity of various products is measured in terms of how much input data they can handle, not how many gigabytes they take to store it.

OLAP versus OLTP

• OLTP concentrates on processing repetitive transactions in large quantities and conducting simple manipulations.

• OLAP involves examining many data items in complex relationships.

• OLAP may analyze relationships and look for patterns, trends, and exceptions.

• OLAP is a direct decision support method.

• OLTP (online transaction processing)
  • Major task of traditional relational DBMSs
  • Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

• OLAP (online analytical processing)
  • Major task of data warehouse systems
  • Data analysis and decision making

• Distinct features (OLTP vs. OLAP):
  • User and system orientation: customer vs. market
  • Data contents: current, detailed vs. historical, consolidated
  • Database design: ER + application vs. star + subject
  • View: current, local vs. evolutionary, integrated
  • Access patterns: update vs. read-only but complex queries

OLTP vs. OLAP

                    OLTP                                  OLAP
users               clerk, IT professional                knowledge worker
function            day-to-day operations                 decision support
DB design           application-oriented                  subject-oriented
data                current, up-to-date;                  historical;
                    detailed, flat relational;            summarized, multidimensional;
                    isolated                              integrated, consolidated
usage               repetitive                            ad hoc
access              read/write;                           lots of scans
                    index/hash on primary key
unit of work        short, simple transaction             complex query
# records accessed  tens                                  millions
# users             thousands                             hundreds
DB size             100 MB-GB                             100 GB-TB
metric              transaction throughput                query throughput, response
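The "unit of work" and "access" rows of the comparison can be made concrete with a small sketch using Python's standard `sqlite3` module. The `accounts` table and its contents are hypothetical; the point is only the contrast between a short keyed write and a read-only summarizing scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, branch TEXT, balance REAL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "north", 500.0), (2, "north", 300.0), (3, "south", 900.0)],
)

# OLTP-style unit of work: a short read/write transaction on one row,
# located through the primary key.
with conn:
    conn.execute(
        "UPDATE accounts SET balance = balance - 100 WHERE account_id = ?", (1,)
    )

# OLAP-style unit of work: a read-only query that scans and summarizes many rows.
summary = conn.execute(
    "SELECT branch, COUNT(*), SUM(balance) FROM accounts "
    "GROUP BY branch ORDER BY branch"
).fetchall()
print(summary)
```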

Codd's Rules for OLAP Systems

• In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools.

• Multi-dimensional conceptual view
  • Supports EIS (executive information system) slice-and-dice operations and is usually required in financial modeling.

• Transparency
  • The OLAP system is part of an open system that supports heterogeneous data sources. Furthermore, the end user should not need to be concerned about the details of data access or conversions.

• Accessibility
  • Presents the user with a single logical schema of the data. OLAP engines act as middleware, sitting between heterogeneous data sources and an OLAP front end.

• Consistent reporting performance
  • Performance should not degrade as the number of dimensions in the model increases.

• Client-server architecture
  • Requires open, modular systems. Not only should the product be client/server, but the server component of an OLAP product should allow various clients to be attached with minimum effort and programming for integration.

• Generic dimensionality
  • Not limited to 3D and not biased toward any particular dimension. A function applied to one dimension should also be applicable to another.

• Dynamic sparse matrix handling (null values)
  • Related both to the idea of nulls in relational databases and to the notion of compressing large files, a sparse matrix is one in which not every cell contains data. OLAP systems should accommodate varying storage and data-handling options.

• Multi-user support
  • Supports multiple concurrent users, including their individual views or slices of a common database.
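One possible storage option for a sparse matrix, sketched here in Python under stated assumptions, is to keep only the populated cells in a dictionary keyed by coordinates. The dimension sizes and cell values are hypothetical, and real OLAP products use their own compression schemes.

```python
from math import prod

# Hypothetical dimension sizes: 100 products x 50 stores x 365 days.
shape = (100, 50, 365)

# Sparse storage: only populated cells are kept, keyed by coordinates.
sparse_cube = {
    (0, 0, 0): 12.5,
    (3, 7, 41): 99.0,
    (99, 49, 364): 4.0,
}


def get_cell(cube, coords, shape):
    """Return the stored value, or None for an empty (null) cell."""
    if any(not 0 <= c < n for c, n in zip(coords, shape)):
        raise IndexError(f"coordinates {coords} outside cube {shape}")
    return cube.get(coords)


total_cells = prod(shape)                  # cells a dense array would allocate
density = len(sparse_cube) / total_cells   # fraction of cells that hold data
print(total_cells, density)
```

A dense array would reserve space for every one of the `total_cells` positions, whereas the dictionary grows only with the number of cells that actually contain data.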

• Unrestricted cross-dimensional operations
  • All dimensions are created equal, so all forms of calculation must be allowed across all dimensions, not just the measures dimension.

• Intuitive data manipulation (slicing and dicing, pivoting, drill-down, consolidation (drill-up), etc.)
  • Users shouldn't have to use menus or perform complex multiple-step operations when an intuitive drag-and-drop action will do.

• Flexible reporting
  • Users should be able to print just what they need, and any changes to the underlying model should be automatically reflected in reports.

• Unlimited dimensions and aggregation levels
  • Supports at least 15, and preferably 20, dimensions.

• There are proposals to redefine or extend the rules, for example to also include:
  • Comprehensive database management tools
  • Ability to drill down to detail (source record) level
  • Incremental database refresh
  • SQL interface to the existing enterprise environment

OLAP operations

• Roll-up
  • Takes the current aggregation level of fact values and performs a further aggregation on one or more of the dimensions.
  • Equivalent to doing a GROUP BY on this dimension, using the attribute hierarchy.
  • Decreases the number of dimensions shown: removes row headers.

• Drill-down
  • Opposite of roll-up.
  • Summarizes data at a lower level of a dimension hierarchy, thereby showing data at a more detailed level within a dimension.
  • Increases the number of dimensions shown: adds new headers.
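Roll-up and drill-down can be sketched as the same grouping function applied at different levels of each attribute hierarchy. The facts, the city-to-region mapping, and the helper names below are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical facts at the most detailed level: (city, month, sales).
facts = [
    ("Bangkok", "2020-01", 10),
    ("Bangkok", "2020-02", 20),
    ("Phuket", "2020-01", 5),
    ("Phuket", "2020-04", 15),
]

# Attribute hierarchies: city -> region, month -> quarter.
REGION_OF = {"Bangkok": "central", "Phuket": "south"}


def quarter_of(month):
    """Map 'YYYY-MM' to 'YYYY-Qn'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def aggregate(facts, place_level, time_level):
    """GROUP BY the chosen level of each hierarchy and sum the measure."""
    totals = defaultdict(int)
    for city, month, sales in facts:
        place = REGION_OF[city] if place_level == "region" else city
        period = quarter_of(month) if time_level == "quarter" else month
        totals[(place, period)] += sales
    return dict(totals)


detailed = aggregate(facts, "city", "month")        # drill-down: lower level
rolled_up = aggregate(facts, "region", "quarter")   # roll-up: higher level
print(rolled_up)
```

Moving from `detailed` to `rolled_up` is a roll-up along both hierarchies; going the other way is a drill-down.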

• Slice
  • Performs a selection on one dimension of the given cube, resulting in a sub-cube.
  • Reduces the dimensionality of the cube.
  • Sets one or more dimensions to specific values and keeps a subset of dimensions for the selected values.

• Dice
  • Defines a sub-cube by performing a selection on one or more dimensions.
  • Refers to a range-select condition on one dimension, or to a select condition on more than one dimension.
  • Reduces the number of member values of one or more dimensions.
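The difference between slice and dice can be sketched on a small in-memory cube. The cube contents and the helper names `slice_cube` and `dice_cube` are hypothetical, chosen for illustration only.

```python
# Hypothetical cube cells keyed by (product, region, year).
cube = {
    ("laptop", "north", 2019): 100,
    ("laptop", "south", 2019): 40,
    ("phone", "north", 2019): 60,
    ("phone", "north", 2020): 80,
    ("tablet", "south", 2020): 20,
}
DIMS = ("product", "region", "year")


def slice_cube(cube, dim, value):
    """Fix one dimension to a value and drop it from the coordinates."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Keep cells whose members fall in the allowed set of each named dimension."""
    def keep(key):
        return all(key[DIMS.index(d)] in members for d, members in allowed.items())
    return {k: v for k, v in cube.items() if keep(k)}


# Slice: the result has one dimension fewer (product x region only).
year_2019 = slice_cube(cube, "year", 2019)

# Dice: same three dimensions, but fewer member values on two of them.
north_devices = dice_cube(cube, region={"north"}, product={"laptop", "phone"})
print(year_2019)
print(north_devices)
```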

Categories of OLAP Tools

• OLAP tools are categorized according to the architecture used to store and process multi-dimensional data.

• There are four main categories:
  • Multi-dimensional OLAP (MOLAP)
  • Relational OLAP (ROLAP)
  • Hybrid OLAP (HOLAP)
  • Desktop OLAP (DOLAP)

1) Multi-dimensional OLAP (MOLAP)

• MOLAP tools use specialized data structures and multi-dimensional database management systems (MDDBMSs) to organize, navigate, and analyze data.

• Data is typically aggregated and stored according to predicted usage to enhance query performance.

• This allows users to view different aspects of data aggregates, such as sales by time period, geography, or product. The storage is not in a relational database.

• They use array technology and efficient storage techniques that minimize disk space requirements through sparse data management.

• They provide excellent performance when data is used as designed and the focus is on data for a specific decision-support application.

• Traditionally, they require tight coupling with the application layer and presentation layer.

MOLAP Available tools

• Hyperion
• Executive Viewer
• CFO Vision
• BI/Analyze
• PowerPlay
• Business Objects
• Genita
• Holos
• MS OLAP Services
• Pilot
• ProCube

Typical Architecture for MOLAP Tools

• MOLAP utilizes a proprietary multidimensional database to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be viewed multidimensionally.

• Data from various operational systems is loaded into a multidimensional database through a series of batch routines.

• Once this atomic data has been loaded into the multidimensional database, the general approach is to perform a series of calculations in batch to aggregate along the dimensions and fill the multidimensional array structures.

• Indices are then created, and hashing algorithms are used to improve query access time.
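The batch aggregation step described above can be sketched as precomputing totals for every combination of dimensions, so that a query becomes a hash lookup. This is a simplified illustration with hypothetical data and names, not how any particular MOLAP product is implemented.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "region", "quarter")

# Hypothetical atomic rows loaded from operational systems.
atomic = [
    {"product": "laptop", "region": "north", "quarter": "Q1", "sales": 100},
    {"product": "laptop", "region": "south", "quarter": "Q1", "sales": 40},
    {"product": "phone", "region": "north", "quarter": "Q2", "sales": 60},
]


def build_aggregates(rows):
    """Batch step: precompute totals for every subset of the dimensions."""
    aggregates = {}
    for size in range(len(DIMS) + 1):
        for group in combinations(DIMS, size):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row["sales"]
            aggregates[group] = dict(totals)
    return aggregates


aggregates = build_aggregates(atomic)

# Query time: a dictionary (hash) lookup instead of a scan over atomic rows.
print(aggregates[("product",)][("laptop",)])
print(aggregates[()][()])  # grand total
```

With three dimensions there are already 2^3 = 8 groupings to maintain, which hints at why pre-aggregation gives fast answers but grows quickly as dimensions are added.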

• MOLAP is a two-tier, client/server architecture. The multidimensional database serves as both the database layer and the application logic layer. In the database layer, it is responsible for all data storage, access, and retrieval processes.

• In the application logic layer, it is responsible for the execution of all OLAP requests. The presentation layer integrates with the application logic layer and provides an interface through which the users view and request OLAP analyses.

• The client/server architecture allows multiple users to access the same multidimensional database.

• MOLAP advantages
  • Excellent performance, since pre-aggregation provides quicker response time
  • Availability of extensive libraries of complex functions for OLAP analyses
  • Optimal for slice-and-dice operations
  • Performs better than ROLAP when data is dense

• MOLAP disadvantages
  • Sparsity: usually more than 90% of cells are empty
  • Scalability: limited in the amount of data it can handle, since all calculations are performed when the cube is built; therefore it is not commonly used above 20-50 GB
  • Difficult to change a dimension without re-aggregation
  • Data must be copied and moved into data stores
  • Originated from query tools, thereby lacking the architecture
  • Requires additional investment, since cube technology is often proprietary and does not already exist in organizations
  • Lacks the security and administration features that RDBMSs can bring

2) Relational OLAP (ROLAP)

• Fastest-growing style of OLAP technology, due to requirements to analyze ever-increasing amounts of data and the realization that users cannot store all the data they require in MOLAP databases.

• The traditional OLAP slice-and-dice functionality is equivalent to adding a WHERE clause to the SQL statement. The design may be structured in the form of a star schema or its variations.
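The equivalence between slice and dice and a WHERE clause can be sketched with Python's standard `sqlite3` module. For brevity the example uses a single flat `sales` table rather than a full star schema; the table, columns, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("laptop", "north", 2019, 100.0),
        ("laptop", "south", 2019, 40.0),
        ("phone", "north", 2019, 60.0),
        ("phone", "north", 2020, 80.0),
        ("tablet", "south", 2020, 20.0),
    ],
)

# Slice: the WHERE clause fixes one dimension (year) to a single value.
slice_2019 = conn.execute(
    "SELECT product, region, SUM(amount) FROM sales "
    "WHERE year = ? GROUP BY product, region ORDER BY product, region",
    (2019,),
).fetchall()

# Dice: the WHERE clause restricts several dimensions at once.
dice = conn.execute(
    "SELECT product, region, year, SUM(amount) FROM sales "
    "WHERE region = ? AND product IN (?, ?) "
    "GROUP BY product, region, year ORDER BY product, year",
    ("north", "laptop", "phone"),
).fetchall()
print(slice_2019)
print(dice)
```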

• ROLAP performs dynamic multidimensional analysis of data stored in a relational database, rather than in a multidimensional database.

• It supports RDBMS products through a metadata layer, which avoids the need to create a static multi-dimensional data structure and facilitates the creation of multiple multi-dimensional views of the two-dimensional relation.

• A typical use of ROLAP is for large data sets that are infrequently queried, such as historical data.

• To improve performance, some products use SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.

ROLAP Available tools

• Discoverer 3 from Oracle
• DSS Agent from MicroStrategy
• MetaCube from IBM Informix
• Platinum Beacon from Platinum
• Brio
• Business Objects
• Cognos PowerPlay

Typical Architecture for ROLAP Tools

• ROLAP accesses data stored in a data warehouse (relational database) to provide OLAP analyses.

• ROLAP is a three-tier, client/server architecture. The database layer utilizes relational databases for data storage, access, and retrieval processes.

• The application logic layer is the ROLAP engine, which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of presentation layers, through which users perform OLAP analyses.

• ROLAP advantages
  • Well-known environment (relational database)
  • Can leverage functionality that comes with the relational database
  • Can be used with data warehouse and OLTP systems
  • No pre-aggregation is needed, which avoids the data explosion effect that some MOLAP implementations incur with large-scale models
  • Can handle large amounts of data: the limit is the data size of the underlying relational database; ROLAP itself places no limit on data volume
  • Full security and administration are provided through the RDBMS
  • Performs better than MOLAP when the data is sparse
  • Performance is improving as more OLAP functions are added and various storage and query optimization techniques are employed

• ROLAP disadvantages
  • Performance can be slow, since each ROLAP report is a SQL query in the relational database
  • Does not have the complex functions that are provided by OLAP tools
  • Limited by SQL functionality
  • Hard to maintain aggregate tables in the data warehouse

3) Hybrid OLAP (HOLAP)

• Hybrid online analytical processing (HOLAP) is a mixture of MOLAP and ROLAP technologies.

• For summary-type queries, HOLAP leverages cube technology for faster performance. When detail information is needed, it can drill through from the cube into the underlying relational database.

• Cubes stored as HOLAP are smaller than equivalent MOLAP cubes and respond more quickly than ROLAP cubes for queries involving summary data.

• HOLAP storage is generally suitable for cubes that require rapid query response for summaries based on a large amount of base data.
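The split between cube-backed summaries and drill-through to detail can be sketched as follows. Both stores are plain Python structures standing in for a cube and a relational database, and the `query` function and its data are hypothetical.

```python
# Hypothetical detail rows standing in for the relational store (ROLAP side).
detail_rows = [
    {"region": "north", "order_id": 1, "amount": 100},
    {"region": "north", "order_id": 2, "amount": 50},
    {"region": "south", "order_id": 3, "amount": 70},
]

# Precomputed summary standing in for the cube (MOLAP side).
summary_cube = {}
for row in detail_rows:
    summary_cube[row["region"]] = summary_cube.get(row["region"], 0) + row["amount"]


def query(region, detail=False):
    """Answer summaries from the cube; drill through to detail rows on request."""
    if not detail:
        return summary_cube[region]
    return [r for r in detail_rows if r["region"] == region]


print(query("north"))               # served from the summary cube
print(query("north", detail=True))  # drill-through to the underlying rows
```

Only the small summary is held in cube form, which is why a HOLAP cube can be smaller than the equivalent MOLAP cube while still answering summary queries quickly.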

• In order to deliver the combined strengths of MOLAP and ROLAP technologies, HOLAP systems must comply with the following rules:
  • Fast access at all levels of aggregation (MOLAP requirement)
  • Easy aggregate maintenance (MOLAP requirement)
  • Compact aggregate storage (MOLAP requirement): for high-level aggregates, in order to economize disk space
  • Dynamically updated dimensions (ROLAP requirement): real-time access to the data itself and to rapidly changing structures
  • Multidimensional view based on RDBMS metadata (ROLAP requirement): the view should point to the appropriate RDBMS tables and automatically generate the required SQL statements when the multidimensional view is modified, which reduces development time and maintenance

HOLAP Available tools

• Express from Oracle
• IBM DB2 OLAP Server
• Microsoft OLAP Services
• Sagent Holos

Typical Architecture for HOLAP Tools

• HOLAP advantages
  • Combined advantages of both MOLAP and ROLAP (for a full list, see the MOLAP and ROLAP sections)
  • Can combine ROLAP technology for sparse regions and MOLAP for dense regions, and also ROLAP for storing the detailed data and MOLAP for higher-level summary data

• HOLAP disadvantages
  • Complex: a HOLAP server must support both MOLAP and ROLAP engines, plus tools to combine both storage engines and operations
  • Functionality overlap between storage and optimization techniques in ROLAP and MOLAP engines

4) Desktop OLAP (DOLAP)

• Desktop online analytical processing (DOLAP) is a single-tier, desktop-based OLAP technology.

• It can download a relatively small hypercube from a central point, usually a data mart or data warehouse, and perform multidimensional analyses while disconnected from the source.

• Data sets are limited to the boundaries defined by the user, with no access to granular data.

• In general, cubes contain summarized data, organized in a fixed structure of dimensions. Therefore, DOLAP is ideal for well-understood, recurring analytic questions and reporting.

• As with multi-dimensional databases on the server, OLAP data may be held on disk or in RAM; however, some DOLAP products allow only read access.

• Most DOLAP vendors exploit the power of the desktop PC to perform some, if not most, multi-dimensional calculations.

DOLAP Available tools

• Cognos
• Business Objects
• Brio
• Crystal Decisions
• Hummingbird
• Oracle

Typical Architecture for DOLAP Tools

• DOLAP advantages
  • User friendly: the user can pivot and manipulate data locally from the returned result set stored on the desktop
  • Excellent query performance: it collects, aggregates, and calculates data in advance of the analysis
  • Low cost per seat and low maintenance
  • Useful for mobile users who cannot always connect to the data warehouse
  • Easiest to deploy among all OLAP approaches

• DOLAP disadvantage
  • Limited functionality and data capacity

Reports and Queries

• Reports
  • Routine reports
  • Ad hoc (or on-demand) reports
  • Multilingual support
  • Scorecards and dashboards
  • Report delivery and alerting
    • Report distribution through any touchpoint
    • Self-subscription as well as administrator-based distribution
    • Delivery on demand, on schedule, or on event
    • Automatic content personalization

• Ad hoc query: a query that cannot be determined prior to the moment the query is issued

• Structured Query Language (SQL): a data definition and management language for relational databases. SQL is the front end to most relational DBMSs.

Multidimensionality

• Multidimensionality: the ability to organize, present, and analyze data by several dimensions, such as sales by region, by product, by salesperson, and by time (four dimensions)

• Multidimensional presentation
  • Dimensions
  • Measures
  • Time

• Multidimensional database: a database in which the data are organized specifically to support easy and quick multidimensional analysis

• Data cube: a two-dimensional, three-dimensional, or higher-dimensional object in which each dimension of the data represents a measure of interest

• Cube: a subset of highly interrelated data that is organized to allow users to combine any attributes in a cube (e.g., stores, products, customers, suppliers) with any metrics in the cube (e.g., sales, profit, units, age) to create various two-dimensional views, or slices, that can be displayed on a computer screen

• Multidimensional tools and vendors
  • Tools with multidimensional capabilities often work in conjunction with database query systems and other OLAP tools

• Limitations of dimensionality
  • The multidimensional database can take up significantly more computer storage room than a summarized relational database
  • Multidimensional products cost significantly more than standard relational products
  • Database loading consumes significant system resources and time, depending on data volume and the number of dimensions
  • Interfaces and maintenance are more complex in multidimensional databases than in relational databases

Advanced BA

• Data mining and predictive analysis
  • Data mining
  • Predictive analysis: use of tools that help determine the probable future outcome for an event or the likelihood of a situation occurring. These tools also identify relationships and patterns
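As one very simple instance of predictive analysis, the sketch below fits a straight-line trend to past values by ordinary least squares and extrapolates one period ahead. The sales history is hypothetical, and a linear trend is an assumption made for illustration; real predictive tools use a much wider range of models.

```python
# Hypothetical monthly sales history: (month index, units sold).
history = [(1, 100.0), (2, 110.0), (3, 120.0), (4, 130.0)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(history)
forecast_month_5 = slope * 5 + intercept  # probable outcome for the next month
print(slope, intercept, forecast_month_5)
```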

Data Visualization

• Data visualization: a graphical, animation, or video presentation of data and the results of data analysis
  • The ability to quickly identify important trends in corporate and market data can provide competitive advantage
  • The magnitude of trends can be checked by using predictive models that provide significant business advantages in applications that drive content, transactions, or processes

• New directions in data visualization
  • In the 1990s, data visualization moved into:
    • Mainstream computing, where it is integrated with decision support tools and applications
    • Intelligent visualization, which includes data (information) interpretation

[Figures: Housing and poverty; Traffic in Madrid]

• New directions in data visualization
  • Dashboards and scorecards
  • Visual analysis
  • Financial data visualization

Geographic Information Systems (GIS)

• Geographic information system (GIS): an information system that uses spatial data, such as digitized maps. A GIS is a combination of text, graphics, icons, and symbols on maps

• As GIS tools become increasingly sophisticated and affordable, they help more companies and governments understand:
  • Precisely where their trucks, workers, and resources are located
  • Where they need to go to service a customer
  • The best way to get from here to there

• GIS and decision making
  • GIS applications are used to improve decision making in the public and private sectors, including:
    • Dispatch of emergency vehicles
    • Transit management
    • Facility site selection
    • Drought risk management
    • Wildlife management
  • Local governments use GIS applications for mapping and other decision-making applications

• GIS combined with GPS
  • Global positioning systems (GPS): wireless devices that use satellites to enable users to detect the position on earth of items (e.g., cars or people) the devices are attached to, with reasonable precision

• GIS and the Internet/intranets
  • Most major GIS software vendors provide Web access that hooks directly to their software
  • GIS can help the manager of a retail operation determine where to locate retail outlets
  • Some firms are deploying GIS on the Internet for internal use or for use by their customers (e.g., to locate the closest store)
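The "locate the closest store" use case can be sketched with the haversine great-circle distance. The store list and coordinates below are approximate and hypothetical, and a production GIS would typically use road networks and spatial indexes rather than straight-line distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical store locations (approximate city coordinates).
stores = {
    "Bangkok": (13.7563, 100.5018),
    "Chiang Mai": (18.7883, 98.9853),
    "Phuket": (7.8804, 98.3923),
}


def closest_store(lat, lon):
    """Return the name of the store with the smallest great-circle distance."""
    return min(stores, key=lambda name: haversine_km(lat, lon, *stores[name]))


# A customer near Nakhon Si Thammarat (approximate coordinates).
print(closest_store(8.43, 99.96))
```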
