sas visual analytics overview -...
TRANSCRIPT
Copyright © 2013, SAS Institute Inc. All rights reserved.
Data Analysis for Big Data
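Course Outline: Data Analysis for Big Data
I. Big Data and the Data Scientist
  1. The role of the data scientist
  2. Skills a data scientist needs
II. Concepts of Big Data Analysis
III. Big Data Analysis: A Financial-Industry Example
  1. Visual analysis of financial data
  2. Customer segmentation models
  3. Customer response models
    3.3.1 Decision tree analysis
    3.3.2 Logistic regression analysis
    3.3.3 Neural network analysis
    3.3.4 Text mining predictive models
  4. Time-series forecasting for financial business
IV. Building a Big Data Platform
  1. The impact and challenges of big data
  2. Building analytical data sets for the big data platform
  3. Building the most complete big data platform architecture
  4.3.1 Online real-time decisions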
THE DATA SCIENCE TEAM
THE PLAYERS
Data Engineering
Data Services
Data Management
Scientist
Production Development
The Data Scientist
THE CHALLENGES AND STUMBLING BLOCKS…
• New tools are needed to process rich data sources
• “Economies of Scale” data techniques don’t apply
• Data source needs vary from problem to problem
• Data access (& quality) requires significant time
• Not all data comes from internal systems
The Data Scientist
Big Data is EVERYWHERE
17% of the world’s population used a social networking site in 2011.
Twitter logs 100 million Tweets per day.
Facebook counts 350 million unique visitors per day.
60 hours of video is transferred to YouTube every 60 seconds.
80% of companies use social media for recruitment.
Trends in Data Analysis
ONLY ANALYTICS CAN HANDLE THE COMPLEXITY
10 Mio. Decisions
12 Bill. Decisions
600 Mio. Decisions
Trends in Data Analysis
WHAT IS BIG DATA?
SAS OPINION
Data that exceeds an organization’s conventional database storage or processing capacity
Relative – not absolute
Trends in Data Analysis
THE BUSINESS ANALYSIS CONTINUUM
Stages: Discover, Analyze, Model, Build / Deploy
Traditional Business Intelligence:
• Individual data elements
• Business KPIs
• Operationalized processing, reports
Data Science:
• Relationships and outcomes
• Building / using algorithms to forecast outcome
• Logic that processes data relationships to reveal outcomes
Trends in Data Analysis
Big Data Architecture High Performance Analytics
Channel Integration Mobile Execution
Financial-Industry Example
Advanced Analytics
BIG DATA
Visual Analytics
Analysis of Big Data
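• Treemap: grouped data at a glance
• Scatter plot: how variables are distributed against each other
• Bubble chart: relative standing among groups
• Heat map: data density distinguished by color
• Box-and-whisker plot: degree of variation
• Trend chart: changes in trend
• Map: data viewed geographically
• Pie chart: percentages shown graphically
• Crosstab: actual values shown in aggregated form
• Trend forecasting: predicts each group's future spending trend
• Decision tree analysis: automatically finds rules to understand behavior patterns
• Correlation coefficient: strength of the relationship between variables
Marketing / business staff
Analysts
Self-service analysis
Real-time response
1. Visual Analytics
Common Analysis Methods in Visual Analytics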
1. Visual Analytics
Common Analysis Methods in Visual Analytics
Advanced Analytics
BIG DATA
Data Mining Analysis
Analysis of Big Data
SEGMENTATION
What?
• Rules-based segmentation
• Statistical segmentation
• Clustering
• Self-organising maps
2. Data Mining
SEGMENTATION
2. Data Mining
When no clusters exist, use the k-means algorithm to partition cases into contiguous groups.
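The k-means partitioning mentioned on this slide can be sketched in a few lines of standard-library Python. This is a minimal illustration of the usual assign-then-update loop, not the SAS implementation; the function name `kmeans`, the random initial centers, and the customer values are all assumptions made up for the example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Partition points (tuples of floats) into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: start from k random cases
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each case to its nearest center.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty group keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

# Hypothetical customers: (monthly spend, visits per month), invented for illustration.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]
labels, centers = kmeans(customers, k=2)
print(labels)
```

Each case ends up in the group whose center is closest, which is what makes the resulting groups contiguous in the input space.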
Segmentation: Data Mining
DATA MINING PREDICTIVE MODELLING
What?
• Predict propensity to behave
  • Whether a person responds
• Predict value
  • Return amount
Sample
▪ Modeling data
Explore
▪ Data exploration
Modify
▪ Data modification
Model
▪ Model building
Assess
▪ Model comparison
Marketing Response Model: Modeling Process
2. Data Mining
PREDICTIVE MODEL
Predictions: output of the predictive model given a set of input measurements
inputs → predictions
Data Mining Analysis
PREDICTIVE MODELING TOOLS
Primary
Specialty
Multiple
DECISION TREE PREDICTION RULES
[Figure: decision tree over inputs x1 and x2, both plotted on axes from 0.0 to 1.0. Splits: x2 (<0.63 / ≥0.63), x1 (<0.52 / ≥0.52), x1 (<0.51 / ≥0.51). Leaf predictions: 40%, 60%, 55%, 70%. Labeled node types: root node, interior node, leaf node.]
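A decision tree's prediction rules are nested threshold tests, so they translate directly into code. The sketch below uses the split values and leaf percentages from the slide, but the arrangement (x2 at the root, and which leaf sits under which branch) is an assumption, because the transcript does not preserve the tree layout. The function name `tree_predict` is hypothetical.

```python
def tree_predict(x1, x2):
    """Return the predicted probability of the leaf that a case falls into."""
    if x2 < 0.63:          # root node (assumed to split on x2)
        if x1 < 0.52:      # interior node
            return 0.40    # leaf node
        return 0.60        # leaf node
    if x1 < 0.51:          # interior node on the other branch
        return 0.55        # leaf node
    return 0.70            # leaf node

print(tree_predict(0.2, 0.3))
print(tree_predict(0.9, 0.9))
```

Every case follows exactly one path from the root to a leaf, so the rules partition the (x1, x2) square into rectangles with one prediction each.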
LOGIT LINK FUNCTION
logit scores: $$\log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$
The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).
[Figure: logit link function; probability axis from 0 to 1, logit score axis from −5 to 5.]
LOGIT LINK FUNCTION
$$\operatorname{logit}(\hat{p}) = \log\left(\frac{\hat{p}}{1-\hat{p}}\right) = \hat{w}_0 + \hat{w}_1 \cdot x_1 + \hat{w}_2 \cdot x_2$$
To obtain prediction estimates, the logit equation is solved for $\hat{p}$:
$$\hat{p} = \frac{1}{1 + e^{-\operatorname{logit}(\hat{p})}}$$
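The two directions of the logit link can be checked numerically. The sketch below computes a logit score from two inputs and solves it for the probability; the weights and input values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to a logit score in (-inf, +inf)."""
    return math.log(p / (1.0 - p))

def inverse_logit(score):
    """Solve the logit equation for p: p = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_probability(x1, x2, w0, w1, w2):
    """Logistic regression prediction for two inputs."""
    return inverse_logit(w0 + w1 * x1 + w2 * x2)

# Hypothetical weight estimates; the score is -1.0 + 2.0*0.4 + 0.5*0.7 = 0.15.
p_hat = predict_probability(0.4, 0.7, w0=-1.0, w1=2.0, w2=0.5)
print(round(p_hat, 4), round(logit(p_hat), 4))
```

Applying `logit` to the prediction recovers the linear score, which is the round trip the two formulas describe.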
EXTREME DISTRIBUTIONS AND REGRESSIONS
[Figure: two panels, each comparing the standard regression with the true association. Original Input Scale: skewed input distribution, high leverage points. Regularized Scale: more symmetric distribution.]
INITIAL CHARACTERISTIC ANALYSIS (Variable Transformation)
Calculate and examine the key assessment metrics, weight of evidence (WOE) and information value (IV).
» WOE is used for evaluating how well attributes discriminate for each
given characteristic.
» IV is used for evaluating a characteristic’s overall predictive power.
In some cases, override automatically generated groupings using
options within the Interactive Grouping node so that characteristics
and their attributes conform to good business logic and still have
sufficient predictive power to be entered into a scorecard.
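WOE and IV can be computed from the good and bad counts in each attribute group. The sketch below assumes one common convention, WOE = ln(share of goods / share of bads) and IV = sum over groups of (share of goods − share of bads) × WOE. The function name `woe_iv`, the age groups, and the counts are hypothetical, and every group is assumed to have nonzero counts.

```python
import math

def woe_iv(groups):
    """groups maps an attribute label to (good_count, bad_count).

    Returns (woe_by_attribute, information_value).
    Assumes every count is nonzero so the logarithm is defined.
    """
    total_good = sum(good for good, _ in groups.values())
    total_bad = sum(bad for _, bad in groups.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in groups.items():
        good_share = good / total_good
        bad_share = bad / total_bad
        woe[label] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[label]
    return woe, iv

# Hypothetical grouping of an "age" characteristic, invented for illustration.
age_groups = {"18-29": (100, 40), "30-44": (300, 40), "45+": (400, 20)}
woe, iv = woe_iv(age_groups)
for label, value in woe.items():
    print(label, round(value, 3))
print("IV", round(iv, 3))
```

A group with a positive WOE holds proportionally more goods than bads, and a larger IV means the characteristic as a whole separates the two populations more strongly.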
NEURAL NETWORK BINARY PREDICTION FORMULA
[Figure: tanh activation function, input axis from −5 to 5, output axis from −1 to 1; logit link function, probability axis from 0 to 1, logit score axis from −5 to 5.]
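The slide pairs a tanh activation with the logit link, which suggests a network with one tanh hidden layer whose combined output is treated as a logit score. The sketch below is a forward pass under that assumption; the network shape, the function name, and all weights are hypothetical.

```python
import math

def neural_network_predict(inputs, hidden_units, output_unit):
    """Binary prediction from a one-hidden-layer network.

    hidden_units: list of (bias, weights), one weight per input.
    output_unit: (bias, weights), one weight per hidden unit.
    """
    # Each hidden unit squashes a weighted sum of the inputs into (-1, 1).
    hidden = [
        math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))
        for bias, weights in hidden_units
    ]
    # The output combines the hidden units into a logit score.
    out_bias, out_weights = output_unit
    score = out_bias + sum(w * h for w, h in zip(out_weights, hidden))
    # Inverting the logit link turns the score into a probability.
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights for two inputs and two hidden units.
hidden_units = [(0.1, [1.5, -2.0]), (-0.3, [0.7, 0.9])]
output_unit = (0.2, [1.1, -0.8])
p_hat = neural_network_predict([0.4, 0.7], hidden_units, output_unit)
print(round(p_hat, 4))
```

With all weights set to zero the hidden units output 0 and the prediction is 0.5, the same neutral point as a logistic regression with a zero score.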
TEXT ANALYTICS: Raw Data for Text Mining
Text Mining Process
Raw data → Word segmentation → Term filtering → Document categorization
TEXT ANALYTICS
Segmented example: 我 想要 辦 一 張 免 年費 的 信用卡 ("I want to apply for a credit card with no annual fee")
Term filtering: delete the words that carry no meaning
TEXT ANALYTICS: Text Mining Process
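The term-filtering step can be sketched as a stop-word filter over the already segmented sentence. The stop list below is hypothetical; the transcript does not say which terms the SAS tooling actually drops, so the output is only one plausible result.

```python
# Hypothetical stop list: pronoun, numeral, measure word, and particle.
STOP_WORDS = {"我", "一", "張", "的"}

def filter_terms(segmented_text, stop_words=STOP_WORDS):
    """Keep only the terms that are not in the stop list.

    Expects text whose terms are already separated by spaces.
    """
    return [term for term in segmented_text.split() if term not in stop_words]

terms = filter_terms("我 想要 辦 一 張 免 年費 的 信用卡")
print(terms)
```

The surviving terms (want, apply, waive, annual fee, credit card) are the ones a later categorization step would work with.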
TEXT ANALYTICS: Text Mining Filtering Results
SAS Text Analytics Solution
SAS provides a complete text analytics solution
[Figure components: Crawler, Information Retrieval, Content Categorization, Text Mining, Sentiment Analysis, Social Network Analysis, Data Management, Enterprise Reporting]
• Provides positive/negative sentiment analysis
• Provides social network analysis to discover opinion leaders
• The text mining system classifies content on its own to find the important discussion topics (Discovery Driven)
• Templates defined per category are used to classify the data (Domain Driven)
TEXT ANALYTICS
Advanced Analytics
BIG DATA
Big Data Analysis (Numeric Forecasting)
Line of Business A
Cumulative Cost / Unit by MOB
0, 3, 6, & 12 MIS
Line of Business A
Cumulative Cost / Unit by MOB
0, 3, 6, & 12 MIS + Estimate
If estimates can be added: actual + forecast
Numeric Forecasting
Scenario Analysis
Scenario analysis helps us understand how gross margin changes when external factors are at different levels.
Try dragging the temperature a little higher and see how the forecast gross margin changes.
Numeric Forecasting
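A what-if scenario of this kind amounts to re-evaluating a fitted model at a different input level. The sketch below fits a straight line of gross margin against temperature by ordinary least squares and compares two temperature settings. The linear model and the data are assumptions for illustration; the transcript does not show the forecasting model SAS uses.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope

# Hypothetical history: temperature (degrees C) and gross margin, invented values.
temperatures = [20.0, 24.0, 28.0, 32.0]
gross_margins = [100.0, 108.0, 116.0, 124.0]
intercept, slope = fit_line(temperatures, gross_margins)

def forecast_margin(temperature):
    """Forecast gross margin for a given temperature scenario."""
    return intercept + slope * temperature

baseline = forecast_margin(30.0)
scenario = forecast_margin(33.0)  # "drag the temperature higher"
print(baseline, scenario, scenario - baseline)
```

The difference between the two forecasts is the effect attributed to the temperature change, which is what the interactive slider on the slide lets a user explore.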
Successful solutions must look at the Whole Picture
Building Big Data Sets
Big Data Architecture
IS YOUR BUSINESS SITTING ON A BIG DATA TIME BOMB?
[Figure labels: Business Process Integration, Analytics, Data, each paired with Big Data; annotation: Most organizations stop here]
Building Big Data
Big Data Architecture
SAS ENTERPRISE GUIDE INTERFACE:
DRAG & DROP
A project is a single file that serves as a collection of
• data sources
• SAS programs and logs
• tasks and queries
• results
• informational notes for documentation.
You can control the contents, sequencing, and updating of a project.
Building Big Data
Products with total profits exceeding $500 were identified. Analysts
need more details about these top products, including the product
name, category, supplier, and country. These columns come from
three different tables.
Tables: topproducts, products, country_lookup
Building Big Data
SAS ENTERPRISE GUIDE INTERFACE: JOIN RESULTS
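The three-table join described on this slide can be sketched with plain Python dictionaries. The table names come from the slide; every column name, key, and row value below is a hypothetical stand-in, since the transcript does not list the real columns. This illustrates the join logic, not the Enterprise Guide query builder.

```python
# Hypothetical rows; only the table names come from the slide.
topproducts = [
    {"product_id": 101, "total_profit": 812.50},
    {"product_id": 205, "total_profit": 640.00},
]
products = {
    101: {"product_name": "Trail Tent", "category": "Outdoors",
          "supplier": "Acme Gear", "country_code": "US"},
    205: {"product_name": "Road Helmet", "category": "Cycling",
          "supplier": "Velo Parts", "country_code": "DE"},
}
country_lookup = {"US": "United States", "DE": "Germany"}

def join_top_products(top_rows, product_table, countries):
    """Inner-join top products to product details, then look up the country name."""
    joined = []
    for row in top_rows:
        product = product_table.get(row["product_id"])
        if product is None:
            continue  # inner join: skip products with no detail row
        joined.append({
            "product_id": row["product_id"],
            "total_profit": row["total_profit"],
            "product_name": product["product_name"],
            "category": product["category"],
            "supplier": product["supplier"],
            "country": countries.get(product["country_code"], "Unknown"),
        })
    return joined

report = join_top_products(topproducts, products, country_lookup)
for line in report:
    print(line)
```

Each output row combines columns from all three tables, which is the shape the analysts asked for.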
SAS PROGRAMS TO CREATE COMPLEX ANALYTICAL DATA MART
• A SAS program is a sequence of one or more steps.
• DATA steps typically create SAS data sets.
• PROC steps typically process SAS data sets to generate reports and graphs, and to manage data.
[Figure labels: DATA Step, PROC Step]
Building Big Data
Building with SAS Fourth-Generation Language Syntax: High-Efficiency File Merging
Large file: has a unique index; Customer_ID values do not repeat
Small file: no index; Customer_ID values may repeat
Building Big Data
SAS Fourth-Generation Language Syntax: High-Efficiency File Merging
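One way to merge files of this shape efficiently is to read the small file sequentially and fetch the matching large-file row through its unique index, so neither file has to be sorted. The Python sketch below imitates that keyed lookup with a dictionary. The SAS code from the slide is not in the transcript, so this is an assumed reading of the technique, and every column other than Customer_ID is hypothetical.

```python
# Large file: one row per Customer_ID (hypothetical columns and values).
large_file = [
    {"Customer_ID": 1, "segment": "A"},
    {"Customer_ID": 2, "segment": "B"},
    {"Customer_ID": 3, "segment": "C"},
]
# Small file: Customer_ID may repeat and may be missing from the large file.
small_file = [
    {"Customer_ID": 2, "amount": 50},
    {"Customer_ID": 2, "amount": 75},
    {"Customer_ID": 9, "amount": 10},
]

# The dictionary plays the role of the unique index on Customer_ID.
unique_index = {row["Customer_ID"]: row for row in large_file}

def merge_by_key(small_rows, index):
    """Read the small file once and look up each key in the large file's index."""
    merged = []
    for row in small_rows:
        match = index.get(row["Customer_ID"])
        if match is not None:  # keep only keys present in the large file
            merged.append({**match, **row})
    return merged

merged = merge_by_key(small_file, unique_index)
print(merged)
```

The work is proportional to the size of the small file, because each lookup goes straight to one row of the large file instead of scanning it.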
Optimized Architecture for Big Data
WHAT IS APACHE HADOOP?
The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model.
It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
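Distributed processing in Hadoop is commonly expressed as MapReduce: a map function emits key/value pairs, the framework groups them by key, and a reduce function combines each group. The single-process Python sketch below only imitates that flow on a word count. It does not use any Hadoop API, and the function names and records are hypothetical.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

records = ["big data analytics", "big data platform"]
word_counts = map_reduce(records)
print(word_counts)
```

Because each map call sees one record and each reduce call sees one key, a cluster can run many of them in parallel on different machines and rerun any that fail.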
Building Big Data Sets on the Hadoop Distributed Database
Hadoop – Leverage your SAS goodness
[Figure: SAS and Hadoop integration stack.
Layers: User Interface, Metadata, Data Access, Data Processing, File System.
SAS components: SAS® Display Manager, SAS® Enterprise Guide®, SAS® Data Integration, SAS® Enterprise Miner™, SAS® Visual Analytics, SAS Metadata, Base SAS & SAS/ACCESS® Interface to Hadoop™, In-Memory Data Access, SAS® LASR™ Analytic Server & SAS® High-Performance Analytics (MPI Based).
Hadoop components: Pig, Hive, MapReduce, HDFS.
Users: SAS® User, Next-Generation SAS® User.]
Integrating the Big Data Platform with Hadoop
Optimized Architecture for Big Data
VISUAL ANALYTICS FLOW
[Figure labels: ANALYTICS CLIENTS (Browser, Mobile); SAS Tier; In-Memory Analytical Engine; Hadoop; Source Data; arrows labeled Load and Action.]
Optimized Architecture for Big Data
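From the Internet of Things to the Analytics of Things
IOT + STREAMING ANALYTICS = AOT
… so that data is processed and yields insight immediately, while it is still being collected and transmitted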
IoT at the edge. By 2018, 40% of IoT-created data will be stored, processed,
analyzed and acted upon close to, or at the edge, of the network.
IDC FutureScape: Worldwide Internet of Things 2015 Predictions