

Post on 22-Dec-2015


Zhangxi Lin, Texas Tech University

ISQS 6339, Data Mgmt & BI

ISQS 6339, Data Management & Business Intelligence

Introduction

\\TechShare\coba\d\isqs3358


Outline

Big Data
Definitions of BI
Categorizations of BI
BI Trend
BI tools

What is Business Intelligence


A Simple Definition: The applications and technologies transforming Business Data into Action

Business intelligence (BI) is a business management term that refers to the applications and technologies used to gather, provide access to, and analyze data and information about a company's operations.

Business intelligence systems can help companies gain more comprehensive knowledge of the factors affecting their business, and help companies to make better business decisions.

YouTube:
What is BI? (2’)
Microsoft Business Intelligence Surface Demo (6’ 34”)

Data, information, and knowledge


Data – a collection of raw value elements or facts used for calculating, reasoning, or measuring.

Information – the result of collecting and organizing data in a way that establishes relationships between data items, which thereby provides context and meaning.

Knowledge – the concept of understanding information based on recognized patterns in a way that provides insight into the information.

Online videos:
What is business intelligence? (10’ 36”)
Retail and Big Data Revolution (2’ 12”)
Big data (7’ 12”)
Big data terms (31’ 19”)

Driving force - Big Data

A collection of data sets so large and complex that it becomes awkward to work with using on-hand database management tools.

Difficulties include capture, storage, search, sharing, analysis, and visualization.

The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data.

8/14/2012, Copyright 2012

ISQS7339, Fall 2012

Zettabyte (ZB)

A quantity of information or information storage capacity equal to 10^21 bytes or 1,000 exabytes.

As of April 2012, no storage system has achieved one zettabyte of information.

The combined space of all computer hard drives in the world was estimated at approximately 160 exabytes in 2006.

Seagate reported selling 330 exabytes worth of hard drives during the 2011 fiscal year.

As of 2009, the entire World Wide Web was estimated to contain close to 500 exabytes. This is half a zettabyte.

1,000,000,000,000,000,000,000 bytes = 1000^7 bytes = 10^21 bytes
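The unit arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the `UNITS` table and the `convert` helper are names made up for illustration, and the decimal (power-of-1000) definitions are assumed throughout.

```python
# Decimal (SI) storage units expressed in bytes; names are illustrative only.
UNITS = {
    "KB": 1000**1, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4,
    "PB": 1000**5, "EB": 1000**6, "ZB": 1000**7,
}

def convert(value, from_unit, to_unit):
    """Convert a quantity between two decimal storage units."""
    return value * UNITS[from_unit] / UNITS[to_unit]

print(UNITS["ZB"] == 10**21)     # True: 1000^7 bytes = 10^21 bytes
print(convert(1, "ZB", "EB"))    # 1000.0 exabytes in one zettabyte
print(convert(500, "EB", "ZB"))  # 0.5, i.e. 500 EB is half a zettabyte
```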


Data Scale


Market

"Big data" has increased the demand for information management specialists; major companies have spent more than $15 billion on this.

This industry is worth more than $100 billion and growing at almost 10% a year.

There are 4.6 billion mobile-phone subscriptions worldwide and between 1 billion and 2 billion people accessing the internet.

The world's effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007.

It is predicted that the amount of traffic flowing over the internet will reach 667 exabytes annually by 2013.
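To get a feel for how fast the telecommunication capacity figures above grow, one can compute the implied compound annual growth rate. The sketch below uses only the four data points quoted on the slide, assumes decimal units (1 PB = 10^15 bytes, 1 EB = 10^18 bytes), and introduces the helper name `annual_growth_rate` purely for illustration.

```python
PB = 10**15  # petabyte in bytes (decimal definition assumed)
EB = 10**18  # exabyte in bytes (decimal definition assumed)

# Effective capacity to exchange information, as quoted on the slide.
capacity = {1986: 281 * PB, 1993: 471 * PB, 2000: 2.2 * EB, 2007: 65 * EB}

def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1

rate = annual_growth_rate(capacity[1986], capacity[2007], 2007 - 1986)
print(f"Average growth 1986-2007: {rate:.1%} per year")
```

Under these assumptions the 1986 to 2007 figures correspond to growth of a little under 30% per year, compounded.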


Approach - Cloud Computing

Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user's data, software and computation.

Buzzword: SaaS/IaaS/PaaS


Distributed business intelligence

Deal with big data – the open & distributed approach
  LAMP
  Hadoop
  MapReduce
  HDFS
  NoSQL
  ZooKeeper
  Storm


Apache Hadoop

An open-source software framework for storage and large-scale processing of data sets on clusters of commodity hardware.

The Apache Hadoop framework is composed of the following modules:
  Hadoop Common - contains libraries and utilities needed by other Hadoop modules
  Hadoop Distributed File System (HDFS)
  Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications
  Hadoop MapReduce - a programming model for large-scale data processing

Apache Hadoop's MapReduce and HDFS components originally derived respectively from Google's MapReduce and Google File System (GFS) papers.
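The MapReduce programming model can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop at all; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are illustrative, and on a real cluster each phase would be distributed across nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    """Run map over every split, then shuffle and reduce the results."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))

# Two toy input splits standing in for blocks of a large file.
splits = ["big data needs big tools", "data drives decisions"]
counts = word_count(splits)
print(counts["big"], counts["data"])  # 2 2
```

The point of the model is that the map and reduce functions are independent per key, so the framework is free to run them in parallel on whichever nodes hold the data.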


A Multi-node Hadoop Cluster


Hadoop 2: Big data's big leap forward

The new Hadoop is the Apache Foundation's attempt to create a whole new general framework for the way big data can be stored, mined, and processed.

The biggest constraint on scale has been Hadoop’s job handling. All jobs in Hadoop are run as batch processes through a single daemon called JobTracker, which creates a scalability and processing-speed bottleneck.

Hadoop 2 uses an entirely new job-processing framework built using two daemons: ResourceManager, which governs all jobs in the system, and NodeManager, which runs on each Hadoop node and keeps the ResourceManager informed about what's happening on that node.
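The division of labor between the two daemons can be sketched as a toy model. This is not the Hadoop or YARN API: the class names are borrowed from the slide only for readability, and the `heartbeat`, `launch`, and `schedule` methods, the slot-based capacity, and the "most free slots first" policy are all assumptions made for illustration.

```python
class NodeManager:
    """Toy per-node daemon: tracks local tasks and reports free capacity."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # hypothetical number of task slots
        self.running = []

    def heartbeat(self):
        """Report this node's status to the ResourceManager."""
        return {"node": self.name, "free": self.capacity - len(self.running)}

    def launch(self, task):
        self.running.append(task)


class ResourceManager:
    """Toy cluster-wide daemon: places tasks using the nodes' reports."""

    def __init__(self, nodes):
        self.nodes = {node.name: node for node in nodes}

    def schedule(self, tasks):
        placement = {}
        for task in tasks:
            reports = [node.heartbeat() for node in self.nodes.values()]
            best = max(reports, key=lambda report: report["free"])
            if best["free"] == 0:
                break  # cluster is full; remaining tasks would have to wait
            self.nodes[best["node"]].launch(task)
            placement[task] = best["node"]
        return placement


rm = ResourceManager([NodeManager("node1", 2), NodeManager("node2", 1)])
placement = rm.schedule(["t1", "t2", "t3", "t4"])
print(placement)  # t1 and t2 land on node1, t3 on node2, t4 is not placed
```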

MapReduce 2.0 – YARN (Yet Another Resource Negotiator)

The process of BI


Data -> information -> knowledge -> actionable plans

Data -> information: the process of determining what data is to be collected and managed and in what context

Information -> knowledge: The process involving the analytical components, such as data warehousing, online analytical processing, data quality, data profiling, business rule analysis, and data mining

Knowledge -> actionable plans: The most important aspect in a BI process
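The chain from data to actionable plans can be made concrete with a small sketch. Everything in it is hypothetical: the sales records are invented toy values, and the functions `to_information`, `to_knowledge`, and `to_action` are illustrative names, not part of any BI tool.

```python
from collections import defaultdict

# Data: raw facts (hypothetical records of region, month, units sold).
records = [
    ("North", 1, 120), ("North", 2, 90), ("North", 3, 60),
    ("South", 1, 80), ("South", 2, 95), ("South", 3, 110),
]

def to_information(rows):
    """Data -> information: organize records into a monthly series per region."""
    series = defaultdict(list)
    for region, _month, units in sorted(rows, key=lambda r: (r[0], r[1])):
        series[region].append(units)
    return dict(series)

def to_knowledge(series):
    """Information -> knowledge: recognize a simple trend pattern per region."""
    trends = {}
    for region, units in series.items():
        steps = list(zip(units, units[1:]))
        if all(a < b for a, b in steps):
            trends[region] = "rising"
        elif all(a > b for a, b in steps):
            trends[region] = "falling"
        else:
            trends[region] = "mixed"
    return trends

def to_action(trends):
    """Knowledge -> actionable plans: turn each falling trend into a task."""
    return [f"Investigate declining sales in {region}"
            for region, trend in trends.items() if trend == "falling"]

information = to_information(records)
knowledge = to_knowledge(information)
actions = to_action(knowledge)
print(actions)  # ['Investigate declining sales in North']
```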

Actionable Knowledge

An information asset retains its value only if the converted knowledge is actionable.

Need some methods for extracting value from knowledge.

This is not a technical issue but an organizational one: empowered individuals in the organization are needed to take the action.

There is an issue of Return on Investment (ROI)
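ROI is commonly computed as net gain divided by cost. A minimal sketch, assuming that simple definition and using made-up figures for a hypothetical BI project:

```python
def roi(gain, cost):
    """Return on investment as a fraction: (gain - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (gain - cost) / cost

# Hypothetical project: costs 100,000 and yields 150,000 in benefits.
print(f"{roi(gain=150_000, cost=100_000):.0%}")  # 50%
```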

BI Problems

Structured
  Detecting credit card fraud
  Setting loan parameters
  Market segmentation / mass customization
  Deciding marketing mix
  Customer churn
  Reducing employee turnover
  Improving quality/efficiency
  …

Unstructured
  Data exploration
  Utilization of resources (stored knowledge) to maximum effectiveness
  …

BI Applications

Customer Analytics
  Customer profiling
  Targeted marketing
  Personalization
  Collaborative filtering
  Customer satisfaction
  Customer lifetime value
  Customer loyalty

Sales Channel Analytics
  Marketing
  Sales performance and pipeline

BI Applications (2)

Supply Chain Analytics
  Supplier and vendor management
  Shipping
  Inventory control
  Distribution analysis

Behavior Analysis
  Purchasing trends
  Web activity
  Fraud and abuse detection
  Customer attrition
  Social network analysis

The Evolution of Business Intelligence

1st Generation – Traditional analytics (query and reporting)

2nd Generation – Traditional analytics (OLAP, data warehousing)

2.5 Generation – New traditional analytics

3rd Generation – Advanced analytics
  Rules, predictive analytics and real-time data mining
  Stream analytics


Business Intelligence Classifications

Traditional Analytics
  1st Generation Analytics (Query & Reporting)
  2nd Generation Analytics (OLAP, Data Warehousing)

“New Traditional” Analytics
  “2.5-Gen” Analytics (In-Memory OLAP, Search-Based)

Advanced Analytics/Optimization
  Rules
  Predictive Analytics
  Real-time and traditional Data Mining

Stream Analytics*
  Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)

* In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role

3rd-Generation BI

Legacy BI

Source: Bill O’Connell, IBM, Aug 2007


Business Intelligence Use Cases

Traditional Analytics
  1st Generation Analytics (Query & Reporting)
  2nd Generation Analytics (OLAP, Data Warehousing)

“New Traditional” Analytics
  “2.5-Gen” Analytics (In-Memory OLAP, Search-Based)

Advanced Analytics/Optimization
  Rules
  Predictive Analytics
  Real-time and traditional Data Mining

Stream Analytics*
  Real-time, continuous, sequential analysis (ranging from basic to advanced analytics)

* In lieu of stream analytics, “embedded analytics,” although architecturally different, could potentially play the same role

Example Target Solutions:
  Fraud Detection / Risk
  CRM Analytic
  Supply Chain Optimization
  RFID / Spatial Data
  Other High-Volume

Focus on what is happening RIGHT NOW

Real-Time Threshold

Focus on what will happen

Analytic applications that apply statistical relationships in the form of RULES

Focus on what did happen

Turning data into information is limited by the relationships which the end-user already knows to look for.

Data mining to determine why something happened by unearthing relationships that the end-user may not have known existed.

Source: Bill O’Connell, IBM, Aug 2007

Data Center - The Headquarters of Big Data

Case of BaoCloud Center in Shanghai

The land for the data center in Shanghai


Customizable Data Center

Baocloud data center
