business intelligence technologies

Upload: tanvis1

Post on 30-May-2018


  • 8/14/2019 Business Intelligence Technologies


    Business Intelligence Technologies

    Donato Malerba

    Dipartimento di Informatica

    Università degli Studi, Bari, Italy

    [email protected]

    http://www.di.uniba.it/malerba


    Business Intelligence

    - Business Intelligence is a global term for all the processes, techniques and tools that support business decision-making based on information technology.
    - The approaches can range from a simple spreadsheet to a major competitive undertaking.
    - Data mining is an important new component of business undertaking.


    Business Intelligence Technologies

    [Figure: layered pyramid, listed here from bottom to top, with increasing potential to support business decisions]
    - Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
    - Data Warehouses / Data Marts
    - Data Exploration: OLAP, DSS, EIS, Querying and Reporting
    - Data Mining: Information discovery
    - Data Presentation: Visualization Techniques
    - Decision Making

    Roles shown beside the pyramid, from top to bottom: End User, Business Analyst, Data Analyst, DB Admin


    Business Processes

    Data to support decision making (Ex.: Banking)
    - decisional processes (agreement with a credit card): DSS or EIS
    - management processes (grant a loan): MIS
    - operational processes (transaction on a bank account): TPS

    - Different information systems support the different processes


    DSS vs. EIS

    - Decision Support Systems (DSS) and Executive Information Systems (EIS): information systems designed to help managers in making choices.
    - Different, yet interrelated applications
    - A DSS focuses on a particular decision, whereas an EIS provides a much wider range of information (e.g., information on financials, on production history, and on external events).
    - DSSs appeared in the 1970s.
    - EISs appeared in the 1980s.


    DSS vs. EIS

    - The original EISs did not have the analytical capabilities of a DSS.
    - An EIS is used by senior managers to find problems; the DSS is used by the staff people to study them and to offer alternatives (Rockart and Delong, 1988).

    [Figure: EIS and DSS]


    Where do Data Come From?

    - The EISs and DSSs often lacked a strong database component.
    - Most organizational information gathering was (and is) directed to maintaining current (preferably on-line) information about individual transactions and customers.
    - Managerial decision making requires consideration of the past and the future, not just the present.
    - New databases, called data warehouses, were created specifically for analytic use.


    A Data Warehouse is ...

    A data warehouse is a
    - subject-oriented,
    - integrated,
    - time-variant, and
    - nonvolatile
    collection of data in support of management's decisions.

    Inmon, W.H. Building the Data Warehouse. Wellesley, MA: QED Tech. Pub. Group, 1992


    subject-oriented ...

    - The data in the warehouse is defined and organized in business terms, and is grouped under business-oriented subject headings, such as
      - customers
      - products
      - sales
      rather than individual transactions.
    - Normalization is not relevant.


    integrated ...

    - The data warehouse contents are defined such that they are valid across the enterprise and its operational and external data sources.

    [Figure: Operational systems, Data warehouse]

    - The data in the warehouse should be
      - clean
      - validated
      - properly integrated


    time-variant ...

    - All data in the data warehouse is time-stamped at time of entry into the warehouse or when it is summarized within the warehouse.
    - This chronological recording of data provides historical and trend analysis possibilities.
    - On the contrary, operational data is overwritten, since past values are not of interest.


    nonvolatile ...

    - Once loaded into the data warehouse, the data is not updated.
    - Data acts as a stable resource for consistent reporting and comparative analysis.
    - On the contrary, operational data is updated (inserted, deleted, modified).
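
The contrast between overwritten operational data and time-stamped, nonvolatile warehouse data can be illustrated with a minimal Python sketch. The account identifier, the field names, and the `update_balance` and `history` helpers are hypothetical and only serve the illustration; a real warehouse would be loaded in batches rather than on every update.

```python
from datetime import date

# Operational store: one current value per account, overwritten on update.
operational = {}
# Warehouse: append-only list of time-stamped rows (hypothetical layout).
warehouse = []

def update_balance(account, balance, as_of):
    """Overwrite the operational value and append a time-stamped warehouse row."""
    operational[account] = balance
    warehouse.append({"account": account, "balance": balance, "loaded_on": as_of})

def history(account):
    """Return the time-ordered (date, balance) pairs kept in the warehouse."""
    rows = [r for r in warehouse if r["account"] == account]
    rows.sort(key=lambda r: r["loaded_on"])
    return [(r["loaded_on"], r["balance"]) for r in rows]

update_balance("A-1", 100, date(1995, 1, 5))
update_balance("A-1", 250, date(1995, 2, 5))
```

After the two updates the operational store holds only the latest balance, while `history("A-1")` still returns both values, which is what makes trend analysis possible.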


    Which Data in the Warehouse?

    - A data warehouse contains five types of data:
      - Current detail data
      - Old detail data
      - Lightly summarized data
      - Highly summarized data
      - Metadata
    - Granularity of the data: a key design issue


    Flow of Data

    [Figure: data leaves the operational environment, is cleaned, and resides in the warehouse, where it is then purged, summarized, or archived]


    An Example of Data Integration

    Operational data
    - Checking Account System: Jane Doe (name), Female (gender), Bounced check #145 on 1/5/95, Opened account 1994
    - Savings Account System: Jane Doe, F (gender), Opened account 1992
    - Investment Account System: Jane Doe, Owns 25 Shares Exxon, Opened account 1995

    Data warehouse
    - Customer: Jane Doe, Female, Bounced check #145, Married, Owns 25 Shares Exxon, Customer since 1992
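
The integration step in this example can be sketched in Python as follows. The record layout, the `GENDER_CODES` mapping, and the `integrate` function are assumptions made for illustration. The "Married" attribute of the warehouse record does not appear in any of the three systems shown, so the sketch leaves it out rather than guess where it comes from.

```python
# Records as they appear in the three operational systems of the example.
checking = {"name": "Jane Doe", "gender": "Female", "opened": 1994,
            "events": ["Bounced check #145"]}
savings = {"name": "Jane Doe", "gender": "F", "opened": 1992, "events": []}
investment = {"name": "Jane Doe", "opened": 1995,
              "events": ["Owns 25 Shares Exxon"]}

# Hypothetical mapping that reconciles the different gender encodings.
GENDER_CODES = {"F": "Female", "M": "Male", "Female": "Female", "Male": "Male"}

def integrate(records):
    """Merge the per-system records of one customer into one warehouse record."""
    names = {r["name"] for r in records}
    if len(names) != 1:
        raise ValueError("records belong to different customers")
    genders = {GENDER_CODES[r["gender"]] for r in records if "gender" in r}
    return {
        "name": names.pop(),
        # Keep the gender only if all systems agree after recoding.
        "gender": genders.pop() if len(genders) == 1 else None,
        # The customer relationship starts with the earliest account.
        "customer_since": min(r["opened"] for r in records),
        "facts": [e for r in records for e in r["events"]],
    }

customer = integrate([checking, savings, investment])
```

The earliest opening year (1992) becomes "Customer since 1992", and the two gender encodings collapse to a single value, matching the warehouse record on the slide.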


    Cost and Size of a Data Warehouse

    - Data warehouses are expensive undertakings (mean cost: $2.2 million).
    - Since a data warehouse is designed for the enterprise, it has a typical storage size running from 50 GB to over a terabyte.
    - Parallel computing to speed up data retrieval

    WAREHOUSE SIZE    SERVER REQUIREMENTS
    5-50 GB           Pentium PC > 100 MHz
    50-500 GB         SMP machine
    > 500 GB          SMP or MPP machine


    The Data Mart

    - A lower-cost, scaled-down version of the data warehouse designed for the strategic business unit (SBU) or department level.
    - An excellent first step for many organizations.
    - Main problem: data marts often differ from department to department.
    - Two approaches:
      - data marts → enterprise-wide system
      - data warehouse → data marts


    An Architecture for Data Warehousing

    [Figure: operational databases and external sources feed the data warehouse through extraction, cleaning, validation, and summarization; the data warehouse and the data mart, together with metadata, are used by EIS, DSS, OLAP, data mining, and query tools]


    On-Line Analytical Processing (OLAP)

    - Term introduced by E.F. Codd (1993) in contrast to On-Line Transaction Processing (OLTP)
    - The OLAP Council's definition: "A category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that have been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user."


    On-Line Analytical Processing (OLAP)

    - Basic idea: users should be able to manipulate enterprise data models across many dimensions to understand changes that are occurring.
    - Data used in OLAP should be in the form of a multi-dimensional cube.

    [Figure: data cube with axes Time, Market, Product]


    Dimensional Hierarchies

    - Each dimension can be hierarchically structured
      - Item → Product → Type of product
      - Day → Week → Month → Year
      - Store → City → State → Country
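
As an illustration of the time hierarchy, the following Python sketch maps a single day to its coarser levels. The `time_levels` function and the label formats are hypothetical; weeks are numbered according to ISO 8601, which is one possible convention and not something the slides specify.

```python
from datetime import date

def time_levels(day):
    """Map a day to its week, month, and year levels (ISO 8601 week numbering)."""
    iso_year, iso_week, _ = day.isocalendar()
    return {
        "day": day.isoformat(),
        "week": f"{iso_year}-W{iso_week:02d}",
        "month": f"{day.year}-{day.month:02d}",
        "year": str(day.year),
    }

levels = time_levels(date(1995, 1, 5))
```

Grouping facts by one of these labels instead of by the day itself gives the same data at a coarser level of the hierarchy.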


    OLAP Operations

    - Rollup: decreasing the level of detail
    - Drill-down: increasing the level of detail
    - Slice-and-dice: selection and projection
    - Pivot: re-orienting the multidimensional view of data
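
These operations can be tried on a toy cube in plain Python. The cube contents, the dimension names, and the three helper functions are assumptions made for this sketch. Roll-up is approximated here by aggregating away whole dimensions instead of climbing a hierarchy inside one dimension; drill-down is then a roll-up that keeps more dimensions.

```python
from collections import defaultdict

# Hypothetical cube: (month, city, product) -> units sold.
DIMS = ("time", "market", "product")
cube = {
    ("1995-01", "Bari", "pasta"): 10,
    ("1995-01", "Rome", "pasta"): 7,
    ("1995-02", "Bari", "pasta"): 4,
    ("1995-02", "Bari", "wine"): 3,
}

def rollup(cells, keep):
    """Roll up: sum out every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_(cells, dim, member):
    """Slice: select the cells where one dimension has a fixed member."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == member}

def pivot(cells, order):
    """Pivot: re-order the dimensions of every cell key."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}

by_market = rollup(cube, ["market"])              # less detail
by_market_time = rollup(cube, ["market", "time"])  # drill down again
february = slice_(cube, "time", "1995-02")
product_first = pivot(cube, ["product", "time", "market"])
```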


    Implementing Multi-dimensionality

    - Multi-dimensional databases (MDDB)
    - To make relational databases handle multidimensionality, two kinds of tables are introduced:
      - Fact table: contains numerical facts. It is long and thin.
      - Dimension tables: describe the dimensions and are linked to the fact table through keys. They show where the information can be found. A separate table is provided for each dimension. Dimension tables are small, short, and wide.


    Star Schema

    - Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price
    - Market Dimension: STORE KEY, Store Desc., City, State, District ID, District Desc., Region ID, Region Desc., Regional Mgr., Level
    - Time Dimension: PERIOD KEY, Period Desc., Year, Quarter, Month, Day
    - Product Dimension: PRODUCT KEY, Product Desc., Brand, Color, Size, Manufacturer
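
A reduced version of this star schema can be built with Python's standard `sqlite3` module. The table and column names follow the slide, but only a few columns are kept, and all rows are invented sample data. The query joins the fact table to two dimension tables and summarizes units and dollars by city and year.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE store   (store_key INTEGER PRIMARY KEY, store_desc TEXT, city TEXT, state TEXT);
CREATE TABLE period  (period_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_desc TEXT, brand TEXT);
-- Fact table: one row per store, product, and period, with the numerical facts.
CREATE TABLE fact (
    store_key   INTEGER REFERENCES store(store_key),
    product_key INTEGER REFERENCES product(product_key),
    period_key  INTEGER REFERENCES period(period_key),
    dollars REAL, units INTEGER, price REAL
);
-- Invented sample rows, for illustration only.
INSERT INTO store   VALUES (1, 'Store A', 'Bari', 'BA'), (2, 'Store B', 'Rome', 'RM');
INSERT INTO period  VALUES (1, 1995, 1, 1), (2, 1995, 1, 2);
INSERT INTO product VALUES (1, 'Pasta 500g', 'BrandX');
INSERT INTO fact    VALUES (1, 1, 1, 20.0, 10, 2.0), (2, 1, 1, 14.0, 7, 2.0), (1, 1, 2, 8.0, 4, 2.0);
""")

# Summarize the facts along the market and time dimensions.
rows = con.execute("""
    SELECT s.city, p.year, SUM(f.units), SUM(f.dollars)
    FROM fact f
    JOIN store s  ON s.store_key = f.store_key
    JOIN period p ON p.period_key = f.period_key
    GROUP BY s.city, p.year
    ORDER BY s.city
""").fetchall()
```

Each fact row carries only keys and measures, which keeps the fact table "long and thin", while the descriptive attributes live in the dimension tables.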


    MOLAP, ROLAP, DSS

    - The OLAP technology is considered an extension of the original DSS technology.
    - DSS applications are tools that access and analyze data in relational database (RDB) tables.
    - OLAP tools access and analyze multidimensional data (typically three-dimensional, up to ten-dimensional data).
    - OLAP technology is called MOLAP (multidimensional OLAP) if it uses an MDDB and ROLAP (relational OLAP) if it uses an RDB.


    OLAP/DSS

    - OLAP tools focus on providing multi-dimensional data analysis that is superior to SQL in computing summaries and breakdowns along many dimensions.
    - OLAP tools require strong interaction from the users to identify interesting patterns in data.
    - An OLAP tool evaluates a precise query that the user formulates.
    - OLAP users are farmers.


    Data Warehouse → Data Mining

    The rationale for moving from the data warehouse to the data mine arises from the need to increase the leverage that an organization can get from its existing warehouse approach.

    After implementing a data mining solution, an organization could decide to integrate the solution in a broader data-driven approach to business decision making. The data warehouse will provide an excellent vehicle for such an integration.


    Critical Success Factors for Business Applications

    - People
      - Find a sponsor for the application
      - Select the right user group
      - Involve a business analyst with domain knowledge
      - Collaborate with experienced data analysts
    - Data
      - Select relatively clean sources of data
      - Select a limited set of data sources (e.g., the data warehouse)


    Critical Success Factors for Business Applications (cont.)

    - Application
      - Understand business objectives.
      - Analyze cost-benefits and significance of the impact on business problem.
      - Consider legal or social issues in collecting input data