
Upload: corey-tate

Post on 17-Dec-2015


Page 1: Why SAS 9 Business Intelligence  Quick access to multiple data source on multiple platforms (Data source simplification)  Centralized management of metadata

Why SAS 9 Business Intelligence

Quick access to multiple data sources on multiple platforms (data source simplification)

Centralized management of metadata objects, which reduces replication of data. Metadata objects include:

Libraries

Datasets (relational database tables and OLAP cubes)

Formats

Servers (different servers tuned for different purposes)

Users/Groups

Applications, etc.

Less maintenance and a single point of control of the SAS environment for administrative tasks

Highly secure

Integrated platforms for business intelligence, analytics and web reporting

Planning prior to implementation, with a systematic, process-based approach

Various functional and technical business intelligence tools packaged with SAS 9 software

Architecture – client tier, middle tier, server tier and the data tier


SAS 9 Business Intelligence

SAS Business Intelligence Tools is an integrated system of software solutions that enables you to perform the following tasks:

Data Entry, Retrieval, and Management

Statistical and Mathematical Analysis

Operations Research and Project Management

Report Writing and Graphics Design

Applications Development

Data Access and Management

Data Analysis

Data Presentation

Prerequisites for learning SAS BI Tools:

Base SAS / SAS Macros / SAS SQL

Data Warehousing Concepts (dimensional data modeling designs, approaches to building data warehouses, OLAP cubes)

SAS Business Intelligence Architecture


SAS Business Intelligence Tools – Course Contents

SAS Business Intelligence Tools:

SAS Enterprise Guide

SAS Management Console

SAS Data Integration Studio (ETL Studio)

SAS OLAP Cube Studio

SAS Information Map Studio

SAS Add-In for Microsoft Office

SAS Stored Process

SAS Web Report Studio

Additional Topics:

Data Warehouse Introduction

Steps to Design Data Warehouse

Components of Data Warehouse

Approaches to Data Warehousing

Business Intelligence Introduction

SAS Business Intelligence Architecture

Introduction to Metadata

Dimensional Data Models – Star and Snowflake Designs

Types of Slowly Changing Dimensions


Data Warehouse

A Data Warehouse is an information delivery system. Enterprise data is integrated and transformed into information suitable for strategic decision making.

A Data Warehouse is a subject-oriented, separate, available, integrated, time-stamped and non-volatile collection of data in support of management’s decisions.


Difference between Database and Data Warehouse

| Database | Data Warehouse |
|---|---|
| Process based | Subject based |
| Tables (master, transaction, detailed), relations and constraints, rules | Dimensions and facts (measures) |
| Maintains transaction and operations details | Metadata operations (run-time support) |
| Time dependent | Time variant |
| Volatile (transaction back-ups: daily, hourly, every few minutes or seconds) | Non-volatile (updated during batch runs: daily, monthly or half-yearly) |
| Insert, update and delete operations | Insert operations only (occasionally update) |
| Refreshed after intervals | Never refreshed, only accumulation of data |
| Frequently used by employees of an organisation (front-end users) | Used for decision making, research and analysis (decision makers, executive management, researchers, report producers) |
| Always connected to the business system | Kept away from the business system |

Dimensions: provide the angle of analysis (the perspective from which a subject is analysed). They are typically non-numeric.

Measures: provide the scope of analysis. They are created to compare the performance of dimensions and are always numeric.


Business Intelligence

Introduction

Business Intelligence is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions.

[Figure: SAS clients (tools) interacting with a centralized Metadata Server]


SAS Intelligence Architecture


Client Tier

Java Clients

Run in a Java Runtime Environment (JRE) and are installed on the machine where they will be used.

Java Client Applications:

SAS ETL Studio

SAS OLAP Cube Studio

SAS Information Map Studio

SAS Enterprise Miner

Windows Clients

Run in the Microsoft Windows environment and are installed on the machine where they will be used.

Windows Client Applications:

SAS Enterprise Guide

SAS Add-In for Microsoft Office

Browser Clients

Run in a browser by connecting to a Java Application Server or Servlet Container on the middle tier. Only the web browser is installed on the local machine.

Web Applications:

SAS Web Report Studio

SAS Information Delivery Portal


Middle Tier

The web applications reside and execute on the middle tier. This tier also contains the infrastructure that supports the execution of these applications, such as:

Java Application Server or Java Servlet Container

SAS Web Infrastructure Kit

WebDAV Server

Server Tier

SAS servers are installed on this tier and accessed by the BI tools.

Metadata Server – enables centralized metadata delivery and management to SAS applications across the enterprise

Workspace Server – executes SAS code on behalf of the client applications

Stored Process Server – executes and delivers results from SAS Stored Processes

OLAP Server – delivers pre-summarized cubes of data to OLAP clients


Metadata

Two types of metadata are commonly referenced:

Technical Metadata: describes the data’s physical characteristics and processes

Business Metadata: simplifies the understanding and use of data and applications for business users (business rules)

Advantages of a Centralized Metadata Repository

Define the metadata once and use it anywhere

Share metadata across all applications(SAS Tools)

Single Point of Control

Reduce errors and inconsistencies


Data Warehouse

Steps for Designing a Data Warehouse

Data Designing (entity-relationship design to acquire knowledge of the source databases)

Warehouse Designing (dimensional modelling with a Star or Snowflake schema)

OLAP Process Designing

Data Warehouse Implementation

Note: Data Warehouse is a combination of ETL Process + OLAP Process

Data Warehouse Components:

Data Warehouse

Data Cubes

Data Marts

Dimensions & Measures

[Figure: Data Warehouse components. Labels: Source DB departments (HR, Accts, Sales, Mktg, Dist, Stock), Dimensions & Measures, Data Warehouse, Data Cubes, Data Marts]


Approaches to Data Warehouse Designing

[Figure: Bottom-Up Approach. Source Database (HR, Accts, Sales) -> Data Marts (HR, Accts, Sales) -> Data Warehouse]

[Figure: Top-Down Approach. Source Database (HR, Accts, Sales) -> Data Warehouse -> Data Marts (HR, Accts, Sales)]

Top-Down Approach: Build the enterprise-wide data warehouse first, then build the data marts from it. This approach takes a long time to build and carries a high risk of failure, so it requires experienced professionals.

Bottom-Up Approach: Each data mart is given a priority and the departmental marts are built one by one. The data is fragmented, but this approach is faster and easier to build and its exposure to failure is lower.


Normalized Relational Data Model (Entity-Relationship)


Dimensional Data Model


Data Designing

SAS supports two types of schemas:

Star Schema

Snowflake Schema

Star Schema

A Star Schema is the simplest style of data warehouse schema, laid out in the shape of a star. It consists of a few fact tables (possibly only one) referencing any number of dimension tables.

Each dimension table has a simple primary key, and the fact table holds a set of foreign keys that link it to the dimension tables.

Dimension tables in a Star Schema are highly denormalized. Query processing is fast because fewer joins are needed.

Hierarchies for the dimensions are stored in the dimension table itself.

Inside a Dimensional Table:

Dimensional Table Key (Primary Key)

Large number of textual attributes(wide)

Attributes that are not directly related

Denormalized data

Ability to drill down / roll up

Multiple Hierarchies

Fewer records


Inside a Fact Table:

Concatenated Fact Table Key (Foreign Key)

Grain or Level of Data Identified

Fully Additive Measures

Semi-Additive Measures

Very few Attributes

Degenerate Dimension
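The star schema described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_product, dim_store, fact_sales), the columns and the sample rows are hypothetical and do not come from the slides or from any SAS tool.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two denormalized dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- simple primary key
    product_name TEXT,
    brand        TEXT,
    category     TEXT                   -- hierarchy level kept in the same table
);
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT,
    region     TEXT                     -- hierarchy store -> city -> region, denormalized
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    sales_units INTEGER,
    sales_amt   REAL,
    PRIMARY KEY (product_key, store_key)    -- concatenated key built from foreign keys
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Acme', 'Stationery'), (2, 'Lamp', 'Brite', 'Lighting');
INSERT INTO dim_store VALUES (10, 'Store A', 'Pune', 'West'), (20, 'Store B', 'Delhi', 'North');
INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (2, 10, 1, 30.0), (1, 20, 3, 30.0);
""")


def sales_by_region(connection):
    """Roll sales up to region level with a single join to the store dimension."""
    rows = connection.execute("""
        SELECT s.region, SUM(f.sales_amt)
        FROM fact_sales f
        JOIN dim_store s ON f.store_key = s.store_key
        GROUP BY s.region
        ORDER BY s.region
    """).fetchall()
    return dict(rows)


region_totals = sales_by_region(conn)
```

Because the region attribute sits directly in the store dimension, rolling up from store to region needs one join only.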



[Figure: Star Schema Representation (Example 1, Example 2)]


Snowflake Schema

A Snowflake Schema is a dimensional model that consists of a central fact table surrounded by dimension tables, which are further normalized into sub-dimension tables.

Dimension tables in a Snowflake Schema are normalized.

Hierarchies are broken down into separate tables in the Snowflake Schema. Query processing is slower because more joins are involved.
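A minimal sketch of a snowflaked dimension, again using Python's sqlite3 module. The store hierarchy is split into hypothetical dim_store, dim_city and dim_region tables (names and data are assumptions for illustration), so a roll-up to region level has to walk the chain of sub-dimension tables.

```python
import sqlite3

# Hypothetical snowflake schema: the store hierarchy is normalized into three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_city   (city_key INTEGER PRIMARY KEY, city_name TEXT,
                         region_key INTEGER REFERENCES dim_region(region_key));
CREATE TABLE dim_store  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                         city_key INTEGER REFERENCES dim_city(city_key));
CREATE TABLE fact_sales (store_key INTEGER REFERENCES dim_store(store_key),
                         sales_amt REAL);
INSERT INTO dim_region VALUES (1, 'West'), (2, 'North');
INSERT INTO dim_city   VALUES (1, 'Pune', 1), (2, 'Delhi', 2);
INSERT INTO dim_store  VALUES (10, 'Store A', 1), (20, 'Store B', 2);
INSERT INTO fact_sales VALUES (10, 50.0), (10, 30.0), (20, 30.0);
""")

# Region lives two levels away from the store table, so the query needs three joins.
SNOWFLAKE_QUERY = """
SELECT r.region_name, SUM(f.sales_amt)
FROM fact_sales f
JOIN dim_store  s ON f.store_key  = s.store_key
JOIN dim_city   c ON s.city_key   = c.city_key
JOIN dim_region r ON c.region_key = r.region_key
GROUP BY r.region_name
ORDER BY r.region_name
"""

snowflake_totals = dict(conn.execute(SNOWFLAKE_QUERY).fetchall())
join_count = SNOWFLAKE_QUERY.upper().count("JOIN ")
```

With a denormalized store dimension that carried the region as a plain column, the same total would need a single join; the extra joins are the price paid for the normalized hierarchy.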

[Figure: Snowflake Schema Representation (Example 1, Example 2)]


Difference between Star and Snowflake Schema

| Star Schema | Snowflake Schema |
|---|---|
| Single module / domain based | Multiple module / multiple domain based |
| DSS for a business module | DSS for entire business system |
| Fewer joins and faster query processing | Complex joins and slower processing time |
| Very few foreign keys | More foreign keys |
| Single hierarchy based | Multiple hierarchy based, since dimension tables are normalised into sub-dimension tables |
| Dimension table does not have any parent table | Dimension table has one or more parent tables |
| Dimension tables are highly de-normalized (second normal form, de-normalized) | Dimension tables are normalized (third normal form) |
| Single goal oriented | Single subject oriented |

Its application is widely used for the financial product dimension in banks and insurance companies, because each individual product has a host of specific attributes not shared by other products.


Dimension Table

A dimension table is a table in a star or snowflake schema that stores attributes that describe aspects of a dimension. For example, a time table stores the various aspects of time such as year, quarter, month, and day. A foreign key of a fact table references the primary key in a dimension table in a many-to-one relationship.

Conformed Dimension: a single, consistent view of the same piece of data throughout the organization. The same dimension table is used across multiple star schemas in the data warehouse. For example, if the Product dimension is shared between two fact tables, sales and inventory, then the attributes of the Product dimension must have the same meaning in relation to each of the two fact tables.

Junk Dimension: a "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension table is simply a structure that provides a convenient place to store the junk attributes.

Degenerate Dimension: fact tables do not normally store textual items, so a dimensional data value stored directly in a fact table is called a degenerate dimension. The decision to use degenerate dimensions is often based on the need to provide a direct reference back to a transactional system without the overhead of maintaining a separate dimension table. Examples are reference numbers such as order_number or invoice_number in the fact table.

Slowly Changing Dimension: a dimension table whose data changes slowly over time. For example, the Product Category of a product may change after some time in the product dimension table. Points to consider for SCD:

• A large number of dimensions are generally constant over time

• Many dimensions, though not constant over time, change slowly

• The primary key of the source record in the dimension table does not change. The description and other attributes change slowly over time

• In OLTP systems, new values overwrite the old values. Overwriting dimensional attributes is not always the right option in a data warehouse


Slowly Changing Dimension Type1: Correction of Errors

These changes usually relate to the correction of errors in the source system. Suppose a spelling error in the customer name is corrected from the erroneous entry Ramachander to Ramchander. For this kind of change there is no need to preserve the old value in the dimension table of the data warehouse. Principles for Type1 changes:

• Generally changes relate to correction of errors in source systems

• The old value in the source system is not needed

• The change in the source system need not be preserved in the data warehouse

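Before

| Cust_key | Cust_code | Cust_name | Marital Status | Address | State | Zip Code |
|---|---|---|---|---|---|---|
| 021234549 | C123 | Ramachander | Single | Sec-Bad | AP | 500026 |

After

| Cust_key | Cust_code | Cust_name | Marital Status | Address | State | Zip Code |
|---|---|---|---|---|---|---|
| 021234549 | C123 | Ramchander | Single | Sec-Bad | AP | 500026 |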
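The Type1 correction above can be sketched in plain Python. The helper apply_type1 and the list-of-dicts representation of the dimension table are assumptions made for illustration; the row values are the ones shown on the slide.

```python
def apply_type1(dimension_rows, cust_code, column, corrected_value):
    """Type1 change: overwrite the attribute in place; the old value is not kept."""
    for row in dimension_rows:
        if row["Cust_code"] == cust_code:
            row[column] = corrected_value
    return dimension_rows


# Customer dimension before the correction (values taken from the slide).
customer_dim = [
    {
        "Cust_key": "021234549",
        "Cust_code": "C123",
        "Cust_name": "Ramachander",  # erroneous spelling
        "Marital_Status": "Single",
        "Address": "Sec-Bad",
        "State": "AP",
        "Zip_Code": "500026",
    }
]

apply_type1(customer_dim, "C123", "Cust_name", "Ramchander")
```

No row is added and the key is untouched, so after the call the old spelling can no longer be recovered from the dimension table.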


Slowly Changing Dimension Type2: Preservation of History

Historical data must be preserved alongside the current data from the source system. Assume that the marital status of a customer changes from single to married. Both the historical and the current value should be kept in the data warehouse, because the analysis includes tracking orders by the marital status attribute. Principles for Type2 changes:

• Generally relate to true changes in the source systems

• There is a requirement to preserve history in the data warehouse

• Every change for the same attribute must be preserved

• In the data warehouse, add the new dimensional table row with the new value of the changed attribute with an effective date field when the change happened

• The key of the original row is not affected in the dimensional table

• The new row is inserted with a new surrogate key in the dimensional table

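| Cust_key | Cust_code | Cust_name | Marital Status | Address | State | Zip Code | Time Stamp |
|---|---|---|---|---|---|---|---|
| 021234549 | C123 | Ramchander | Single | Sec-Bad | AP | 500026 | 01Jan2004 |
| 034234488 | C123 | Ramchander | Married | Sec-Bad | AP | 500026 | 25May2010 |

Callouts: the second row carries a new surrogate key (034234488); the first row preserves the history value (Single); the Time Stamp column records the time of change.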
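A minimal sketch of the Type2 change in plain Python, using the row values from the slide. The helper apply_type2 is hypothetical, and it assumes that the most recent row for a customer is the last matching row in the list and that the caller supplies the new surrogate key.

```python
def apply_type2(dimension_rows, cust_code, column, new_value, effective_date, new_key):
    """Type2 change: keep the old row and append a new row with a new surrogate key."""
    matching = [row for row in dimension_rows if row["Cust_code"] == cust_code]
    current = matching[-1]                   # assumption: last matching row is the current one
    new_row = dict(current)                  # copy every attribute of the current row
    new_row["Cust_key"] = new_key            # new surrogate key
    new_row[column] = new_value              # changed attribute
    new_row["Time_Stamp"] = effective_date   # when the change took effect
    dimension_rows.append(new_row)
    return new_row


customer_dim = [
    {
        "Cust_key": "021234549",
        "Cust_code": "C123",
        "Cust_name": "Ramchander",
        "Marital_Status": "Single",
        "Address": "Sec-Bad",
        "State": "AP",
        "Zip_Code": "500026",
        "Time_Stamp": "01Jan2004",
    }
]

apply_type2(customer_dim, "C123", "Marital_Status", "Married", "25May2010", "034234488")
```

The business code C123 ties the two rows together, while the surrogate key lets fact rows point at the version of the customer that was valid when each order was placed.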


Slowly Changing Dimension Type3: Tentative Soft Revisions

A Type3 change is applied by moving the old value into a separate column and then writing the new value into the column that holds the most recent (current) value. The concern with this approach is that, as the values continue to change over the years, only the most recent previous value is kept. History preservation is therefore limited to the number of columns designated for storing historical data. Principles for Type3 changes:

• Generally relate to tentative changes in the source systems

• There is a need to keep track of history with old and new values of the changed attribute

Method for applying Type3 changes in the data warehouse:

• Add an “Old” column in the dimensional table for the affected attribute

• Push down the existing value of the attribute from the “Current” column to the “Old” column

• Keep the new value of the attribute in the “Current” column

• Add an “effective date” column for the attribute

• The key of the row is not affected

Before

| Emp_Key | Emp_Id | Emp_Name | Emp_Dept | Old_Salary | Current_Salary | Eff_Date |
|---|---|---|---|---|---|---|
| 5678 | E101 | Amit Sharma | Finance | | 19500.00 | 01Apr2007 |

After

| Emp_Key | Emp_Id | Emp_Name | Emp_Dept | Old_Salary | Current_Salary | Eff_Date |
|---|---|---|---|---|---|---|
| 5678 | E101 | Amit Sharma | Finance | 19500.00 | 22500.00 | 01Jan2010 |
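The Type3 method can be sketched in plain Python with the employee row from the slide. The helper apply_type3 is hypothetical, and the second change (25000.00 on 01Jan2012) is an invented value used only to show the limited history.

```python
def apply_type3(row, new_salary, effective_date):
    """Type3 change: push the current value into the 'old' column, then overwrite it."""
    row["Old_Salary"] = row["Current_Salary"]
    row["Current_Salary"] = new_salary
    row["Eff_Date"] = effective_date
    return row


employee = {
    "Emp_Key": 5678,
    "Emp_Id": "E101",
    "Emp_Name": "Amit Sharma",
    "Emp_Dept": "Finance",
    "Old_Salary": None,          # no previous value recorded yet
    "Current_Salary": 19500.00,
    "Eff_Date": "01Apr2007",
}

apply_type3(employee, 22500.00, "01Jan2010")

# Hypothetical later change, applied to a copy: the original 19500.00 is lost.
later = apply_type3(dict(employee), 25000.00, "01Jan2012")
```

With a single "old" column, each new change pushes out the value before last, which is the limited history preservation the text describes.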


Fact Table

A fact table is a table in a star or snowflake schema that stores facts (metrics) that measure the business, such as sales, cost of goods, or profit. Fact tables also contain foreign keys to the dimension tables. These foreign keys relate each row of data in the fact table to its corresponding dimensions and levels.

The central table in the star schema is called the fact table

Fact tables are often defined by their grain (e.g. sales volume by day by product by store). Additional dimensions (location, region) can be members of the fact table

A fact table consists of detail-level facts or facts that have been aggregated (summary tables)

Steps in Designing a Fact Table

Identify a business process for analysis (order tracking)

Identify measures or facts (Order Dollars, Cost Dollars, Margin Dollars, Sales Units)

Identify dimensions for facts (product, time, customer, sales person)

List the columns (attributes) that describe each dimension (Product Dimension: Product Name, Product Code, Product Line, Brand)

Determine the lowest level (granularity) of summary in the fact table (Sales Units)

Data Granularity

Data Granularity represents the level of detail in the fact table. When the fact table is kept at the lowest grain, users can drill down to the lowest level of detail within the data warehouse without needing to go to the operational systems.
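To illustrate granularity, here is a small Python sketch with a hypothetical fact table kept at the grain "sales units by day by product by store". The rows and the roll_up helper are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical fact rows at the lowest grain: (day, product, store, sales_units).
fact_rows = [
    ("2010-01-01", "Pen", "Store A", 5),
    ("2010-01-02", "Pen", "Store A", 3),
    ("2010-01-02", "Lamp", "Store A", 1),
    ("2010-02-01", "Pen", "Store B", 4),
]


def roll_up(rows, key_func):
    """Aggregate sales units to a coarser level chosen by key_func."""
    totals = defaultdict(int)
    for day, product, store, units in rows:
        totals[key_func(day, product, store)] += units
    return dict(totals)


# Month level: the first seven characters of an ISO date are "YYYY-MM".
units_by_month = roll_up(fact_rows, lambda day, product, store: day[:7])

# Product-by-month level, one step closer to the stored grain.
units_by_product_month = roll_up(fact_rows, lambda day, product, store: (product, day[:7]))
```

Any coarser summary can be derived from the daily rows, but the reverse is not possible: a table stored only at month level could not be drilled down to individual days.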


[Figure: Star Schema for Order Tracking]

[Figure: Fact Table Design for Order Tracking]

Normalization

In relational database designs, Normalization usually involves dividing a database into two or more tables and defining relationships between the tables.

De-Normalization

Denormalization is the process of attempting to optimize the performance of a database by adding redundant data or by grouping data.
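The two definitions can be sketched in plain Python. The employee and department rows and the normalize and denormalize helpers are hypothetical; the point is only that normalization splits one redundant table into two related tables, and denormalization joins them back into one redundant table.

```python
# Denormalized table: the department name is repeated on every employee row.
flat_rows = [
    {"emp_id": "E101", "emp_name": "Amit", "dept_id": "D1", "dept_name": "Finance"},
    {"emp_id": "E102", "emp_name": "Bina", "dept_id": "D1", "dept_name": "Finance"},
    {"emp_id": "E103", "emp_name": "Chen", "dept_id": "D2", "dept_name": "Sales"},
]


def normalize(rows):
    """Split into an employee table and a department table related by dept_id."""
    departments = {row["dept_id"]: row["dept_name"] for row in rows}
    employees = [
        {"emp_id": row["emp_id"], "emp_name": row["emp_name"], "dept_id": row["dept_id"]}
        for row in rows
    ]
    return employees, departments


def denormalize(employees, departments):
    """Join the two tables back into one wide table with redundant department names."""
    return [dict(emp, dept_name=departments[emp["dept_id"]]) for emp in employees]


employees, departments = normalize(flat_rows)
rebuilt = denormalize(employees, departments)
```

In the normalized form each department name is stored once, so a rename touches one entry; in the denormalized form a reader gets the name without a lookup, at the cost of the repeated value.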


Types of Facts (Measures)

There are three types of Facts or Measures:

Additive: Measures that can be summed up through all of the dimensions in the fact table

Semi-Additive: Measures that can be summed up for some of the dimensions in the fact table, but not the others

Non-Additive: Measures that cannot be summed up for any of the dimensions present in the fact table

Retailer Fact Table (columns: Date, Store, Product, Sales_Amt)

Example for Additive Fact: Assume a Retailer Fact Table consists of Date, Store, Product and Sales_Amt. The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amt is the fact. In this case, Sales_Amt is an additive fact, because you can sum it along any of the three dimensions present in the fact table: date, store, and product. For example, the sum of Sales_Amt for all 7 days in a week represents the total sales amount for that week.
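A short Python sketch of the additive case. The four fact rows and the total_by helper are hypothetical; the column names follow the Retailer Fact Table above.

```python
from collections import defaultdict

# Hypothetical rows of the Retailer Fact Table: (Date, Store, Product, Sales_Amt).
retailer_fact = [
    ("2010-05-24", "S1", "Pen", 10.0),
    ("2010-05-24", "S2", "Pen", 20.0),
    ("2010-05-25", "S1", "Lamp", 30.0),
    ("2010-05-25", "S2", "Pen", 40.0),
]

# Position of each dimension column inside a fact row.
COLUMNS = {"Date": 0, "Store": 1, "Product": 2}


def total_by(rows, dimension):
    """Sum Sales_Amt along one dimension of the fact table."""
    index = COLUMNS[dimension]
    totals = defaultdict(float)
    for row in rows:
        totals[row[index]] += row[3]
    return dict(totals)


by_date = total_by(retailer_fact, "Date")
by_store = total_by(retailer_fact, "Store")
by_product = total_by(retailer_fact, "Product")
```

Whichever dimension is used for grouping, the group totals add up to the same grand total, which is what makes Sales_Amt fully additive.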

Bank Fact Table (columns: Date, Account, Current_Balance, Profit_Margin)

Example for Semi-Additive and Non-Additive Facts: The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact: it makes sense to add balances up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give any useful information). Profit_Margin is a non-additive fact, because it does not make sense to add it up at either the account level or the day level.
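The semi-additive behaviour of Current_Balance can be sketched in Python. The fact rows and both helper functions are hypothetical; the column names follow the Bank Fact Table above.

```python
# Hypothetical rows of the Bank Fact Table:
# (Date, Account, Current_Balance, Profit_Margin).
bank_fact = [
    ("2010-05-24", "A1", 100.0, 0.10),
    ("2010-05-24", "A2", 300.0, 0.20),
    ("2010-05-25", "A1", 150.0, 0.10),
    ("2010-05-25", "A2", 300.0, 0.30),
]


def total_balance_on(rows, date):
    """Meaningful: add balances across the Account dimension for one day."""
    return sum(balance for day, _, balance, _ in rows if day == date)


def closing_balance(rows, account):
    """Across time, take the latest balance instead of summing the daily balances."""
    dated = [(day, balance) for day, acct, balance, _ in rows if acct == account]
    return max(dated)[1]  # ISO date strings sort chronologically


bank_total = total_balance_on(bank_fact, "2010-05-25")
a1_closing = closing_balance(bank_fact, "A1")

# Summing the same account over days produces a number with no business meaning.
a1_summed_over_time = sum(balance for _, acct, balance, _ in bank_fact if acct == "A1")
```

Profit_Margin is left out of the calculations on purpose: as a ratio it cannot be summed along either dimension, which is why the text classes it as non-additive.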


SAS Components

| S.No | SAS Components | Description | SAS Techno-Functional Tools |
|---|---|---|---|
| 1 | Base SAS | Base SAS software is the core of the SAS system. | Base SAS, Advanced SAS, SAS Macros, SAS SQL |
| 2 | Data Access & Management | Base SAS software is used to manipulate the data to get it into the desired format and structure. | Base SAS, Advanced SAS, SAS Macros, SAS SQL |
| 3 | User Interfaces | Non-programmer user interfaces to SAS (manipulate data by point and click). | SAS Enterprise Guide |
| 4 | Application Development | Quickly and cost-effectively develop and customize intelligence applications by using a variety of programming languages and platform choices. | SAS AppDev Studio |
| 5 | Web Enablement | Build mission-critical applications and make knowledge-based decisions via the Web. | SAS/IntrNet, SAS AppDev Studio |
| 6 | Business Solutions | SAS provides BI tools for end-to-end solutions: | |
| | | Data Designing | SAS Warehouse Administrator |
| | | Data Warehousing | SAS Data Integration Studio (SAS ETL) |
| | | Data Processing | SAS OLAP |
| | | Data Analysis | SAS EIS |
| | | Data Mining | Enterprise Miner, SAS INSIGHT |
| 7 | Visualization & Discovery | Effective presentation of analytic results | Base SAS, SAS Stats, SAS Graphs, JMP |
| 8 | Analytical | Apply methods and generate reports and graphs. Statistical Analysis, Technical Analysis, Operational Analysis, Operational & Mathematical Analysis | SAS STATS, SAS ETS, SAS OR |
| 9 | Reporting & Graphics | Present data in the desired output, i.e. graphs and reports | SAS Report, SAS Graphs, SAS EIS |