IBM Software Group
© IBM Corporation
Business Intelligence at IBM: Architecture and Direction
Isabelle Claverie-Bergé
Certified IT Specialist
A strong investment…
Bring real-time business intelligence closer to the user and integrate it into SOAs
– IBM DB2 RTI, IBM Information Server, IBM MDM
Solutions with our partners, to help our clients grow
– 16,000+ partners and a support program
– PartnerWorld, Consulting and System Integrators
– BI solution centers in Dallas, Singapore, Tokyo, Hursley
Extend the value of the data warehouse
– DB2 family: $1B investment in innovation
– DB2 V9.5 “Viper 2”, DB2 RTI, IBM Dynamic Warehouse, IBM OmniFind Analytics, IBM BCU, DB2 Data Warehouse Edition V9
Absorbing growth
IBM DB2 Multi-Partition Concept
Several logical partitions can be on the same machine
Physical or logical partitioning is transparent to the database
Shared nothing (function shipping): each partition accesses only its local data
One database can reside on several separate computers
Database catalog on partition 0, DB catalog cache on the other partitions
Fast communication needed (Gigabit Ethernet, switch)
Large Wireless Carrier: Understanding customers in real time
Business Challenge
– 360 Degree Customer View
– Unified Customer Contact Information
– Churn prediction
Solution
– Warehouse w/ Near Real-time Feeds
– Load over 1B Call Records/Day (up to 1.6B)
– 10 Billion transactions per day, 32 TB Raw Data
– 1,000s of Concurrent Users, 7,000 Customer Care Users, up to 37,000 queries/day
– DB2 DWE, SAS, 16x8 P5 pSeries
Business Benefits
– Fraud Detection < 4 hours
– Campaign Responses Up 66-300%
– Margin per Customer up 20%
Technology Benefits
– Scale & Performance: Users, Volatility, Data
[Figure: Call Data Records, Continuous Data Load, DB2 (171 TB inc. HA Mirror)]
DB2 Delivers Data The Way You Need It: Flexible data partitioning
– Distribute: DISTRIBUTE BY HASH
– Partition (V9): PARTITION BY RANGE
– Organize: ORGANIZE BY DIMENSIONS
– Compress
World’s Richest Slice & Dice Capability
[Figure: table T1 distributed across 3 database partitions (Node 1, Node 2, Node 3); each node shows table spaces TS1 and TS2, ranges Jan and Feb, and dimension values East/West and North/South]
No Partitioning
[Figure: a single block of data]
Distribute by Hash: Divide & Conquer Parallelism
[Figure: data spread across partitions P1, P2, P3, P4]
Hash + Partition by Range, Partition Elimination: Massive Parallelism with Massive IO Reduction
[Figure: partitions P1, P2, P3, P4, each split into ranges 2006 and 2005]
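The combination of hash distribution and range partitioning can be illustrated with a minimal Python sketch. It is not DB2 code: the row layout, the `crc32`-based hash and the function names are assumptions made for illustration. Rows are placed on one of four partitions by hashing a key, then grouped by year; a query on one year skips every range that cannot match, which is the idea behind partition elimination.

```python
from collections import defaultdict
from zlib import crc32

NUM_PARTITIONS = 4  # P1..P4 on the slide


def partition_for(key: str) -> int:
    """Map a distribution key to a partition with a stable hash (illustrative choice)."""
    return crc32(key.encode("utf-8")) % NUM_PARTITIONS


def load(rows):
    """Place each row by hash of the customer id, then by year range."""
    storage = defaultdict(list)  # (partition, year) -> rows
    for row in rows:
        storage[(partition_for(row["customer"]), row["year"])].append(row)
    return storage


def total_for_year(storage, year):
    """Sum amounts for one year, reading only the matching range on each partition."""
    ranges_read = 0
    total = 0
    for (_partition, range_year), rows in storage.items():
        if range_year != year:
            continue  # partition elimination: this range is never read
        ranges_read += 1
        total += sum(r["amount"] for r in rows)
    return total, ranges_read


# Hypothetical sample data: 8 customers alternating between 2005 and 2006.
rows = [{"customer": f"c{i}", "year": 2005 + i % 2, "amount": 10 * i} for i in range(8)]
storage = load(rows)
total_2006, ranges_read = total_for_year(storage, 2006)
```

In this sketch at most one range per partition is read for a single-year query, so the work is both spread over the partitions and reduced to the relevant slice of each.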
Hash + Range + MDC + Compression: High Density, High Value, Low IO Reads
[Figure: partitions P1, P2, P3, P4, each split into ranges 2006 and 2005]
IBM DB2 Parallel Processing
Two types of parallelism
– Intra-partition parallelism => parallel processing within one partition
– Inter-partition parallelism => operations are executed in parallel on each database partition
Scalability: SQL query performance proportional to the number of partitions (BW environment)
All SQL statements: UPDATE, DELETE, INSERT, JOINS, GROUP BY, INDEX/TABLE SCANS, SORT
Tools: index creation, backup and restore, table reorganization
[Figure: a SELECT ... FROM ... statement on a distributed table runs on Database Partition 0 and Database Partition 1 (inter-partition parallelism); within each partition several processes work on it in parallel (intra-partition parallelism)]
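The inter-partition pattern described above can be sketched in a few lines of Python. This is only an illustration of the scatter/gather idea, not how DB2 is implemented: the partitions are plain lists, and threads stand in for the per-partition agents. Each partition computes a partial result on its local data, and a coordinator combines the partials.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum(partition_rows):
    """Work shipped to one partition: it only touches its local rows."""
    return sum(partition_rows), len(partition_rows)


def parallel_average(partitions):
    """Coordinator: run the same operation on every partition, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_sum, partitions))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Hypothetical data: one list of values per database partition.
partitions = [[1, 2, 3], [4, 5], [6]]
average = parallel_average(partitions)
```

Note that the average is computed from partial sums and counts rather than from partial averages, since partitions may hold different numbers of rows.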
IBM Information Management
© 2007 IBM Corporation
Introducing IBM Balanced Warehouse™
A fast track to warehousing
Simplicity
– Predefined configurations for reduced complexity
– One number to contact for complete solution support
Flexibility for growth
– Add BCUs to address increasing demands
– Multiple on-ramps for different needs
– Reliable, nonproprietary hardware for reusability
Optimized performance
– Preconfigured and certified for guaranteed performance
– Based on best practices for reduced risk
Balanced Configuration Unit (BCU)
– Preconfigured, pretested allocation of software, storage and hardware to support a specified combination of function and scale
– Better than an appliance
[Figure: Balanced Warehouse and IBM DB2® Warehouse; simple, flexible, optimized; Reliability & Performance, Extended Insight, Simplicity]
DB2 Data Stream Engine Architecture
Feed Handler Plug-ins: RFID Handler, Market Data Handler
Shared Memory
DSE Query Interface
External Message Bus
DB2 Backing Store (DB2 with DPF)
DB2 Data Stream Engine
• Shared memory management
• Data cache & persistence
• High availability
• Statistics maintenance
• Query processing
Realtime/Historical Queries
DSE Feed Processing & Storage
Messages from data source
Feed Handler
– Transform to internal format
– Store event
Apply business logic & publish
– Publish derived data: aggregates, etc…
– Update metadata
[Figure: shared memory holding, per entity (e.g. IBM), events in 10-minute windows (:10, :20, :30, :40, :50, 1:00) and metadata]
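The feed-processing steps above (store the event in its 10-minute window, update entity metadata, derive aggregates) can be sketched as follows. The class and field names are hypothetical and a Python dict stands in for shared memory; this is an illustration of the windowing idea, not the DSE implementation.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 10  # window size shown on the slide


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 10-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES, second=0, microsecond=0)


class WindowStore:
    """In-memory stand-in for the shared-memory event store (assumption)."""

    def __init__(self):
        self.events = defaultdict(list)  # (entity, window start) -> [(ts, price, volume)]
        self.metadata = {}               # entity -> summary maintained on each event

    def store_event(self, entity, ts, price, volume):
        self.events[(entity, window_start(ts))].append((ts, price, volume))
        meta = self.metadata.setdefault(entity, {"event_count": 0, "last_price": None})
        meta["event_count"] += 1
        meta["last_price"] = price

    def window_volume(self, entity, ts):
        """Derived data: total volume in the window that contains ts."""
        return sum(v for _, _, v in self.events.get((entity, window_start(ts)), []))


store = WindowStore()
store.store_event("IBM", datetime(2004, 1, 1, 12, 3), 90.2, 1000)
store.store_event("IBM", datetime(2004, 1, 1, 12, 9), 90.3, 500)
store.store_event("IBM", datetime(2004, 1, 1, 12, 10), 90.4, 700)
```

Here the first two events fall in the 12:00 window and the third opens the 12:10 window, so aggregates can be published per window as soon as it closes.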
DSE Persistence
Standard relational tables, master-detail schema
– Symbol Table (Entities & Metadata)
– Tick Table (Events)

dse.trade_symbols
symbol_name   symbol_id
IBM           1
HPQ           2

dse.trade_ticks
symbol_id   tstamp                  price   volume
1           Jan 1, 2004 12:00:00    90.2    1000
1           Jan 1, 2004 12:00:01    90.3    500
2           Jan 1, 2004 12:00:00    21.7    700
2           Jan 1, 2004 12:00:04    21.6    200
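As a small, hedged illustration of the master-detail schema, the sketch below recreates the two tables with Python's built-in `sqlite3` module and joins them. SQLite is used only so the example is self-contained; the column types are assumptions, and the `dse` schema is emulated by attaching a second in-memory database under that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Emulate the "dse" schema with an attached in-memory database (SQLite-specific).
conn.execute("ATTACH DATABASE ':memory:' AS dse")
conn.executescript("""
CREATE TABLE dse.trade_symbols (
    symbol_id   INTEGER PRIMARY KEY,
    symbol_name TEXT NOT NULL
);
CREATE TABLE dse.trade_ticks (
    symbol_id INTEGER NOT NULL,   -- points at trade_symbols.symbol_id
    tstamp    TEXT    NOT NULL,
    price     REAL    NOT NULL,
    volume    INTEGER NOT NULL
);
""")

# Rows taken from the slide.
conn.executemany("INSERT INTO dse.trade_symbols VALUES (?, ?)", [(1, "IBM"), (2, "HPQ")])
conn.executemany(
    "INSERT INTO dse.trade_ticks VALUES (?, ?, ?, ?)",
    [
        (1, "2004-01-01 12:00:00", 90.2, 1000),
        (1, "2004-01-01 12:00:01", 90.3, 500),
        (2, "2004-01-01 12:00:00", 21.7, 700),
        (2, "2004-01-01 12:00:04", 21.6, 200),
    ],
)

# Historical query: total traded volume per symbol name.
volume_by_symbol = dict(conn.execute("""
    SELECT s.symbol_name, SUM(t.volume)
    FROM dse.trade_symbols AS s
    JOIN dse.trade_ticks AS t ON t.symbol_id = s.symbol_id
    GROUP BY s.symbol_name
"""))
```

Keeping the symbol name only in the master table means each tick row carries just a small integer key, which matters when ticks arrive at high rates.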
Dynamic Warehousing: Every Person, Every Transaction, Every Asset…
– Analytics as part of a business process: Process Management, In-line Analytics
– Real-time access, in context: Information Integration
– Master Data Management, Industry-Specific Models
– Unstructured information, extracted knowledge, heterogeneous content: Search and Text Analytics
– Extended data warehouse capabilities: Mixed Workload Performance, Scalability & Configurability
Introducing IBM OmniFind Analytics Edition
Rich analysis interface for combining structured and unstructured data
– Combines search, text analytics and data visualization
– Unstructured analytics framework
– Analysis tools
Original data
– Structured data: Call Taker: James; Date: Aug. 30, 2002; Duration: 10 min.; CustomerID: ADC00123
– Unstructured data: Q: I do not know how to install an additional harddisk in NetVista. I need quick support.
Linguistic analysis (mining engine)
Extracted metadata (category, item)
– [Call Taker] James, [Date] 2002/08/30, [Duration] 10 min., [CustomerID] ADC00123
– [product] harddisk, [product] NetVista, [request] install, [service] support
Search, visualization and interactive mining
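The call-record example can be approximated with a short Python sketch that turns the original data into (category, item) pairs. It is a deliberately simple stand-in: the field labels and the small term dictionary are assumptions taken from the slide's example, and a dictionary lookup replaces the real linguistic analysis.

```python
import re
from datetime import datetime

HEADER = "Call Taker: James Date: Aug. 30, 2002 Duration: 10 min. CustomerID: ADC00123"
QUESTION = ("I do not know how to install an additional harddisk in NetVista. "
            "I need quick support.")

LABELS = "Call Taker|Date|Duration|CustomerID"
# A field value runs until the next known label or the end of the text.
FIELD = re.compile(rf"({LABELS}):\s*(.*?)\s*(?=(?:{LABELS}):|$)")

# Hypothetical dictionary mapping terms to categories (stand-in for linguistic analysis).
CATEGORY_BY_TERM = {
    "harddisk": "product",
    "netvista": "product",
    "install": "request",
    "support": "service",
}


def extract_structured(header: str) -> dict:
    """Parse the labelled fields and normalize the date to YYYY/MM/DD."""
    fields = dict(FIELD.findall(header))
    parsed = datetime.strptime(fields["Date"], "%b. %d, %Y")
    fields["Date"] = parsed.strftime("%Y/%m/%d")
    return fields


def extract_terms(text: str) -> list:
    """Return (category, item) pairs for known terms, in order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)
    return [(CATEGORY_BY_TERM[w.lower()], w) for w in words if w.lower() in CATEGORY_BY_TERM]


structured = extract_structured(HEADER)
terms = extract_terms(QUESTION)
```

Once both kinds of data are expressed as category/item pairs, they can be counted, filtered and cross-tabulated together, which is what makes the combined search and mining interface possible.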
Extended Insight
MDM and Data Warehousing
Master Data Management (MDM) and Data Warehousing (DW) complement each other; they have significant synergies
– MDM and DW provide quality data to the business, but MDM is valuable beyond the DW for 2 reasons
• Latency
• Feedback
– MDM and DW have different use cases
• MDM provides a “golden” source of truth that is used collaboratively for authoring and operationally in the transactional / operational environment, and that supports the delivery of "quality" master data to a DW system
• DW systems are a multidimensional collection of historical transactional data, which may include more than master data, used to determine trends and create forecasts
• Introducing MDM enhances the value of existing DWs by improving data integrity and closing the loop with transaction systems
[Figure: Analytic Services (DW Models, Identity Services & Predictive Analytics), Data Services, Metadata]
[Figure: IBM Master Data Management. Event Management, Data Quality Management, Data Lifecycle Mgmt; Industry SOA Business Processes; Operational MDM, Collaborative MDM, Analytical MDM; domains: Customer, Customer / Shipping, Product, Location, Supplier, Account]
End
Research topics
IBM Master Data Management: Data Federation
Applicable MDM services allow for federation of data from the MDM domains as well as additional sources, thus providing the requesting application with all relevant data in a synchronized manner
Integrated with IBM DB2 Information Integrator
Example:
– The requesting application submits a request for the MDM “GetParty” service; MDM is configured to initiate retrieval of data from a non-master data source using DB2 Information Integrator; this data is included in the response to the requesting application; the data federation activity is transparent to the requesting application
[Figure: Requesting Application sends an MDM service request; MDM reads DB2 database(s) and other database(s); the response includes MDM data augmented with data from other sources]
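The federation example can be illustrated with a minimal Python sketch. It does not use any IBM MDM or DB2 Information Integrator API: `get_party`, the record layout and the in-memory dictionaries are all hypothetical stand-ins. The point is only that the caller makes one request and receives master data augmented from another source without knowing where each field came from.

```python
# Hypothetical master data store (stand-in for the MDM repository).
MASTER_PARTIES = {
    "P-1": {"party_id": "P-1", "name": "Example Customer"},
}

# Hypothetical non-master source (stand-in for a federated database).
ORDER_SYSTEM = {
    "P-1": {"open_orders": 3, "name": "EXAMPLE CUST."},
}


def get_party(party_id, master=MASTER_PARTIES, federated_sources=(ORDER_SYSTEM,)):
    """Return the master record, augmented with fields from federated sources."""
    party = dict(master[party_id])  # copy so the master record is not mutated
    for source in federated_sources:
        for key, value in source.get(party_id, {}).items():
            # Master data wins on conflicts; other sources only add fields.
            party.setdefault(key, value)
    return party


response = get_party("P-1")
```

Letting the master record take precedence on conflicting fields is one possible design choice, consistent with treating MDM as the “golden” source; a real deployment would configure this per attribute.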
Information Server
IBM MDM – Common Components (1/2)
IBM MDM – Common Components (2/2)
Master Data Repository
– Master Data
– Metadata
– Reference Data
– History Data
Lifecycle Management Services
Integration Services
Information Integrity
Base Services
Master Data Event Management
Authoring
Hierarchy & Relationship Management
IBM Master Data Management
Sophisticated data integration: faster implementation time and lower cost of ownership than competitors
IBM Information Server: Understand, Clean, Transform, Deliver
Source Systems
[Figure: IBM Master Data Management. Event Management, Data Quality Management, Data Governance; Industry SOA Business Processes; Operational MDM, Collaborative MDM, Analytical MDM; domains: Customer, Customer / Shipping, Product, Location, Supplier, Account]
Information Warehouse
From the data warehouse to the information warehouse
Enterprise Data Warehouse
Integrated Data Warehouse
– ETL
– Mining
– OLAP
– In-Line Analytics
– Entity Analytics
– Master Data Management
– Integration, SOA
– Industry Models & Solutions
The information warehouse is an enterprise warehouse that can deliver the right version of the information (single version of the truth) in its broader context, hosted in a single, scalable database.
The information warehouse
DB2 UDB ESE
Web-based Administration Console
Integrated Design Center: Data Modeling, Data Transform, Data Mining, OLAP Enablement, In-Line Analytics
Roles: BI Specialist, Data Architect, BI Designer, DBA
EDW, ODS
Predict DFs, Stored Procs, Triggers, MQs, Event
DSS Applications
Enterprise Information Integration
WebSphere II OmniFind Edition
Annotators: Identify Language → Find Words & Roots → Categorization → Plug-In Annotators
Extracted metadata and facts
Text Data Warehouse
Search Index
Rules Engine
Any Application, Search Application, Reports
The information service: from project mode to a flexible architecture (SOA)
And more…
Heterogeneous Applications & Information: DB2, IBM Content Manager, Oracle, abc…, xyz…
Information as a Service
– Data & Content
– Information
– Intelligence
Intelligence: real time and streams, dashboards, tools & applications
Real time: e.g., tailored online help, reference data synchronization…
Extraction: e.g., Basel II, business optimization…
Standards-based: e.g., XQuery, JSR170, JDBC, Web Services...
Metadata management
Topic 1: UIMA
Collection Processing Engine (CPE)
– Collection Reader: text, chat, email, audio, video
– CAS Initializer
– CAS
– Aggregate Analysis Engine: Analysis Engines, each with an Annotator
– CAS Consumers
Targets: Ontologies, Search Engine Index, DBs, Knowledge Bases
Identify relevant entities → build structure
– People, Places, Organizations, Relationships
– Parts, Problems, Conditions
– Topics, Products, Interests, Sentiment
– Times, Events, Threats, Plots, Associations
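The flow through a Collection Processing Engine can be sketched in plain Python. This is not the UIMA SDK (UIMA itself is a Java framework); the `Cas` class, the annotators and the consumer below are simplified, hypothetical stand-ins that only show how a reader, a chain of annotators and a consumer cooperate around a shared analysis structure.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Cas:
    """Simplified stand-in for a CAS: the document text plus stand-off annotations."""
    text: str
    annotations: list = field(default_factory=list)  # (type, start, end)


def collection_reader(documents):
    """Turn each source document into a fresh Cas."""
    for text in documents:
        yield Cas(text=text)


def token_annotator(cas):
    for m in re.finditer(r"\w+", cas.text):
        cas.annotations.append(("Token", m.start(), m.end()))


def organization_annotator(cas, known=("IBM",)):
    """Toy entity annotator: marks exact matches from a hypothetical name list."""
    for name in known:
        for m in re.finditer(re.escape(name), cas.text):
            cas.annotations.append(("Organization", m.start(), m.end()))


index = {}  # entity -> documents mentioning it (stand-in for a search engine index)


def index_consumer(cas):
    for kind, start, end in cas.annotations:
        if kind == "Organization":
            index.setdefault(cas.text[start:end], []).append(cas.text)


def run_cpe(documents, annotators, consumers):
    """Reader -> annotators (in order) -> consumers, one Cas per document."""
    for cas in collection_reader(documents):
        for annotate in annotators:
            annotate(cas)
        for consume in consumers:
            consume(cas)


run_cpe(
    ["IBM ships DB2.", "No entity here."],
    [token_annotator, organization_annotator],
    [index_consumer],
)
```

Because annotators only add annotations to the shared structure, new ones can be plugged into the chain without changing the reader or the consumers.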
How OmniFind Enables UIMA Solutions
Provides a supported UIMA implementation to deliver text analytics capabilities
OmniFind: Crawlers → Parsing → Base Annotators → Indexing → OmniFind Index → Searching
Base annotators: Identify Language, Find Words & Roots, Parts of Speech
Text → Enhanced Metadata
How OmniFind Enables UIMA Solutions
Provides a supported UIMA implementation to deliver text analytics capabilities
Collection Processing Engine
– Base annotators: Identify Language, Find Words & Roots, Parts of Speech
– Third Party Annotators: Named-entity extraction, Identify Relationships
Text → OmniFind → Enhanced Metadata
OmniFind Index, External Data Store
Third Party Applications
Topic 2: XML storage in DB2 9 pureXML
[Figure: Collection Processing Engine with base annotators (Identify Language, Find Words & Roots, Parts of Speech) and Third Party Annotators (Named-entity extraction, Identify Relationships); Text, OmniFind, Enhanced Metadata, OmniFind Index, Third Party Applications]
Reference Architecture for Event-Driven Middleware
[Figure: Data Source 1 … Data Source N feed an intelligent, time-dependent, pub/sub, and routing hub that serves App 1 … App N; a DBMS and a Data Warehouse loaded by ETL provide short-term and long-term storage]
DB Tradeoffs for Event-Handling
1. Latency for Consistency
2. Throughput for Persistence
ESB responsible for:
– High-throughput data handling
– Low-latency messaging and routing
Requirements for Event-Driven Applications
Dimensions: Responsiveness; Event Throughput (events/sec/server); Event Processing Language Richness; Ease of Use; Scalability
Levels, from highest to lowest capability:
– Hard real-time (deterministic, µs); 100,000’s; inductive reasoning (untrained patterns, trained patterns); Internet scale: 100,000’s endpoints
– Soft real-time (scheduled, ms); 10,000’s; tools for integrating content behavior models; collaborating domains
– Near real-time (< sec); 1000’s; integration with processes, workflows; tools for distributed deployment; managed ESB with event services
– Transactional OLTP; 100’s; general multi-stream pattern specifications; tools for designing event flow; event server clusters
– Data Mining OLAP; 10’s; sequences, thresholds, groups; simple event pattern tool support; single server
– Data Warehouse; 1’s; message at a time filter/route
Middleware for Time-Dependent Internet Traffic
[Figure: the requirements chart for event-driven applications (responsiveness, event throughput, event processing language richness, ease of use, scalability; increasing capability), labeled "Internet Traffic"]
Middleware for RFID Applications
[Figure: the same requirements chart, labeled "RFID for retail, distribution, manufacturing"]
Middleware for Surveillance Applications
[Figure: the same requirements chart, labeled "Surveillance Markets"]
Middleware for Financial Services
[Figure: the same requirements chart, labeled "Financial market information and program trading"]
Intelligence Application
– Intelligence (applied knowledge), Control
– Knowledge (fact relationships)
– Information (facts)
– Data (streams)
– Signal (sensors)
Daily Internet Traffic Volume
– 2002: 23 PB
– 2007: 647 PB (est.)
– 1999: 610 Billion Emails (11 PB)
– 2002: 11 Trillion Emails
– 2006: 22 Trillion Emails (est.)
Telephony
– 2002: 187 Billion minutes
– Emerging VoIP
Instant Messaging
– 2002: 41 Million users
– 2003: 275 Million users
E-mail, Voice, Image, Video, IMS, TV/Radio Broadcast, Web Traffic, etc.
Streaming Data Example: Soccer
Ball, players, and referees are RF tagged (26 transmitters)
Position and speed data are streamed to RTL/DSE (± 1.5 cm, 100K messages/s)
RTL/DSE stores time-stamped data in database at the rate of 7K-12K messages/sec
Prototyped and planned for use in World Cup Soccer 2006
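The gap between the incoming message rate and the rate of database writes suggests buffering in memory and writing in batches. The sketch below illustrates that pattern only; it is not the RTL/DSE implementation. The class name, table layout and one-second flush interval are assumptions, SQLite stands in for the target database, and message timestamps (rather than a wall clock) drive the flush so the example is deterministic.

```python
import sqlite3


class RealTimeLoader:
    """Keeps the latest position per tag in memory and writes history in batches."""

    def __init__(self, conn, flush_interval=1.0):
        self.conn = conn
        self.flush_interval = flush_interval  # seconds of message time between writes
        self.buffer = []        # time-stamped rows not yet written
        self.latest = {}        # tag -> (t, x, y), served from memory
        self.last_flush = None
        conn.execute("CREATE TABLE IF NOT EXISTS positions (tag TEXT, t REAL, x REAL, y REAL)")

    def on_message(self, tag, t, x, y):
        self.latest[tag] = (t, x, y)
        self.buffer.append((tag, t, x, y))
        if self.last_flush is None:
            self.last_flush = t
        elif t - self.last_flush >= self.flush_interval:
            self.flush(t)

    def flush(self, now):
        """Periodic write: one batched insert instead of one insert per message."""
        self.conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", self.buffer)
        self.conn.commit()
        self.buffer.clear()
        self.last_flush = now


conn = sqlite3.connect(":memory:")
loader = RealTimeLoader(conn)
for t, x in [(0.0, 1.0), (0.4, 1.5), (0.8, 2.0), (1.2, 2.5), (1.5, 3.0)]:
    loader.on_message("ball", t, x, 0.0)
stored = conn.execute("SELECT COUNT(*) FROM positions").fetchone()[0]
```

Queries about the current state can be answered from `latest` without touching the database, while the stored history supports SQL queries over past positions.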
[Figure: Events flow into the Real-time Loader (RTL/DSE) with an in-memory database (Informix); periodic writes to the database (DB2, Informix Dynamic Server) build a time-stamped data history that is available to SQL queries]