The evolution of big data and information management: a reference architecture
DESCRIPTION
Oracle's Big Data reference architecture and the evolutionary development of Information Management
TRANSCRIPT
Information Management Reference Architecture – 3rd Evolution
EMEA Enterprise Architecture
Contents
Introduction Conceptual view Design Patterns IM Logical view and component outline Discovery Lab R/T Event Engine logical view Mapping to previous Reference Architecture release
Introduction
Introduction
This PPT documents the main architectural components of Oracle’s Information Management Reference Architecture.
The architecture is intended to be practical and pragmatic; many of the ideas and experiences that inform the approach date back almost 20 years within Oracle and are based on real-world customer experience.
We define Information Management to mean the following. Please note that this definition embraces all types and forms of data as well as embracing aspects such as Information Discovery and Governance:
“Information Management is the means by which an organisation maximises the efficiency with which it plans, collects, organises, uses, controls, stores, disseminates, and disposes of its Information, and through which it ensures that the value of that information is identified and exploited to the maximum extent possible”
3rd Evolution of Oracle’s Information Management Reference Architecture
Oracle’s Information Management Reference Architecture (3rd Edition)
• More relevant to a Big Data oriented audience
• Better representation of pragmatic customer projects
• Includes a Raw data store as part of the architecture
• Shows the effort / cost to store and interpret data that separates schema-on-read and schema-on-write approaches
• Aligned to Analytics 3.0
• Consistent with Oracle's engineering efforts
What’s changed?
Aligning analytical requirements and IM architecture: Enabling Analytics 3.0 with a pragmatic architecture
Analytics 1.0
• Reporting with limited use of descriptive analytics
• Limited range of tabular data
• Batch oriented analysis
• Analysis bolted onto a limited set of business processes

Analytics 2.0
• Firms “Competing on Analytics”
• Extended analytics to larger and less structured datasets
• Emergence of Big Data into the commercial world
• Recognition of the Data Science role in commercial orgs.

Analytics 3.0
• Platform for monetisation
• Deeper analysis & more data
• Faster test-do-learn iterations
• Different types of data & wider business process coverage
• Analysts focus on discovery and driving business value
• “Agile” with operational elements incorporated into design patterns

Adapted from Tom Davenport material
Oracle’s Information Management Reference Architecture (3rd Edition)
“All those layers and definitions in your Reference Architecture, I just don’t get it… and it looks complicated!”
Hadoop developer knee deep in complex MapReduce code
What’s changed?
Business Trends
Technology Trends
Data Trends
Conceptual View
Components: Event Engine · Data Reservoir · Data Factory · Enterprise Information Store · Reporting · Discovery Lab
Inputs: Input Events · Events & Data
Outputs: Actionable Events · Actionable Information · Actionable Insights · Discovery Output
Divided into Execution and Innovation
Conceptual View
Structured Enterprise Data · Other Data
Component Outline
Data Engine – Respond to R/T events in an appropriate and/or optimised fashion
Data Reservoir – Raw data reservoir; typically event data at the lowest grain
Data Factory – Managed ETL onto, within and between platforms
Enterprise Data – Data stores for Information Management
Reporting – BI tools and infrastructure components
Discovery Lab – Platform, data and tools to support the discovery process
Execution – things you do every day
Innovation – innovation to drive tomorrow's business
Line of Governance!
Discovery Output – possible outputs include new knowledge, mining models / parameters, scored data…
Design Patterns
Design Pattern: Discovery Lab
• Specific focus on identifying commercial value for exploitation
• Small group of highly skilled individuals (aka Data Scientists)
• Iterative development approach – data oriented, NOT development oriented
• Wide range of tools and techniques applied
• Data provisioned through the Data Factory or own ETL
• Typically separate infrastructure, but could also be a unified Reservoir if resource managed effectively
Design Pattern: Information Platform
• Build the next generation Information Management platform
• Either a Business Strategy driven or an IT cost / capability driven initiative
• Initial project may be specifically linked to lower data grain or retention, BUT it is the platform as a whole that forms the solution required
• Platform for consolidating other IM assets onto
• Key issues relate to differences in procurement, development process, governance and skills
• Discovery Lab may be implemented as a pragmatic initial POV
Design Pattern: Data Application
• Big Data technologies applied to a specific business problem, e.g. genome sequence analysis using BLAST, or log data from pharmaceutical production plant and machinery required for traceability
• Limited or no integration to the broader Information Management estate
• Specific solution, so non-functional requirements have less impact on solution quality or long-term costs
• Platform costs and scalability are important considerations
Design Pattern: Information Solution
• Specific solution based on Big Data technologies requiring broader integration to the wider Information Management estate, e.g. an ETL pre-processor for the DW, or affordably storing a lower level of grain
• Non-functional requirements are more critical in this solution
• Scalable integration to the IM estate is an important factor for success
• Analysis may take place in the Reservoir, or the Reservoir may only be used as an aggregator
Design Pattern: Real-Time Events
• Real-Time optimisation of events
• May take place at multiple locations between the place of data origination and the Data Centre – requiring careful design and implementation
• May include Next-Best-Activity, declarative rules and Data Mining technologies to optimise decisions, i.e. optimise across declarative, data mining, customer preference & business-defined rules
• May include considerations for personal preferences and privacy (e.g. opt-out) for customer related events
• Common component seen across many industries & markets, e.g. connected vehicle
Design pattern against component usage map

Design patterns: Discovery Lab · Information Platform · Data Application · Information Solution · R/T Events

Outline:
– Discovery Lab: Data science lab; assess the value of the data
– Information Platform: Next generation information platform to align IM capability with business strategy
– Data Application: Addressing a specific data problem in Hadoop with no broader integration required
– Information Solution: Addressing a specific data problem but requiring broader enterprise-wide integrations, e.g. ETL pre-processing, an Event Store at lower grain than the existing DW
– R/T Events: Execution platform to respond to R/T events

Examples:
– Discovery Lab: Gov. Healthcare; Mobile operator
– Information Platform: Spanish Bank (business led); UK Gov. Dept. (tech. led)
– Data Application: Pharma Genome project; Pharma production archive
– Information Solution: Investment Bank – trade risk; Mobile Operator – ETL processing
– R/T Events: Mobile operator – location-based offers

Component usage:
– Data Engine: Possible; Yes
– Data Reservoir: Yes; Yes; Yes
– Data Factory: Yes; Yes; Yes
– Enterprise Data: Yes
– Reporting: Yes
– Discovery Lab: Yes; Implied; Alternative approach to Reservoir + Factory above
IM Logical View and Components
Information Management – Logical View: Data Sources
Data Ingestion
Methods and process to load data into our managed data store and manage data quality
• Contemporary Information Management solutions must be able to ingest any type of data from any source, in any format, via any mechanism and at any frequency, e.g. flat file loads, streaming…
• The data may be highly unstructured, mono-structured or highly poly-structured.
• Data will vary in volume and in Data Quality.
• Operational isolation should be considered to ensure operational applications will continue in the event of the loss of the Information Management system.
Data Engines & Poly-structured sources: Content, Docs, Web & Social Media, SMS
Structured Data Sources: Operational Data, COTS Data, Master & Ref. Data, Streaming & BAM
Information Management – Logical View: Information Ingestion
Data Ingestion – methods and process to load data and manage Data Quality (Load)
Managed Data – all data under management
Information Interpretation – methods and process needed to access information (Query)
• Data structure and processing required to load data into managed data stores
• Shape represents the work done on the data to load data and/or process between layers
• Layer may include a file mechanism where required to facilitate loading (e.g. Fuse fs or ZFS for operational isolation and file concat)
• Normal rules of micro-batch, taking all the data, and KISS principles recommended
• DQ and loading stats presented through BI dashboards as a non-judgemental mechanism to improve DQ
• Data may be landed in the Ingestion layer to facilitate loading but is not typically stored for any length of time, e.g. raw data loaded from web logs but sessionised data then loaded to Raw. Another example is data used to manage CDC, which may be stored in this layer.
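The micro-batch, "take all the data" and non-judgemental DQ principles above can be sketched as follows. This is a minimal illustration, not part of the architecture: the field names and the DQ check are hypothetical.

```python
import csv
import io

def micro_batch_load(raw_csv, target):
    """Load one micro-batch: accept every record (KISS, take all the data)
    and record DQ stats for BI dashboards instead of rejecting rows."""
    stats = {"rows": 0, "missing_customer_id": 0}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        stats["rows"] += 1
        if not row.get("customer_id"):      # hypothetical quality check
            stats["missing_customer_id"] += 1
        target.append(row)                  # nothing is dropped
    return stats

batch = "customer_id,amount\n42,9.99\n,3.50\n"
loaded = []
print(micro_batch_load(batch, loaded))  # {'rows': 2, 'missing_customer_id': 1}
print(len(loaded))                      # 2 – all rows kept
```

The point of the sketch is that quality problems are surfaced as statistics rather than used to reject data at the door.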
Information Management – Logical View: Data Interpretation
• Methods and processes required to access information in each of the stores
• Shape represents the cost of interpreting the data under management
• For schema-on-read the cost may include the Avro, SerDe or reader class as well as the associated processing code to select, filter and process the data.
• For schema-on-write the cost is represented by the complexity of the SQL required to access the data only – typically more complex for 3NF than for a dimensional query.
Information Management – Logical View: Data Layers – cost, quality and concurrency trade off
Managed Data
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Immutable raw data reservoir – raw data at rest is not interpreted
Immutable modelled data. Business Process Neutral form. Abstracted from business process changes
Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
• Increasing enrichment
• Increasing data quality
• Reducing concurrency costs
• Data under management includes 3 key layers: Raw, Foundation, and Access and Performance.
• Data is normally loaded into the Raw and Foundation layers, BUT BI Apps loads data directly into the APL, and federated warehouses may well also load data at aggregate level from federated operating companies.
• The Data Factory is responsible for loading and then managing data between layers.
• Work is done to elevate the data between layers – typically further enriching and improving data quality.
• Work done in processing the data between the layers significantly reduces query costs, i.e. higher levels of concurrency can be sustained for the same processing power.
• Increasing formalisation of definition
Information Management – Logical View: Data Layers – Analytical processing
• Analytical processing capabilities of Hadoop and the RDBMS are used to elevate data between layers as previously described.
• These analytical capabilities can also be leveraged by tools that access the data directly – typically by a Data Scientist for Discovery Lab operations, or by BI tools and services processing data using a model previously defined by the Data Scientist.
Analytical processing capabilities include: OLAP, Data Mining, Statistics, Text Mining, Image Processing and other analytical processing.
Information Management – Logical View: Data Layers – Raw Data Reservoir
• Immutable data store with data at the lowest level of grain.
• Typically implemented in Hadoop or NoSQL for cost reasons, but not always.
• May be:
– queried directly,
– used to derive base-level data for the Foundation Layer. Data may be represented logically in Foundation, or physically, as the store is immutable – BUT this affects ILM policy,
– or used to derive values or aggregates for the Access and Performance layer (e.g. a propensity score or total monthly SMSs).
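The "derive aggregates for the Access and Performance layer" idea can be made concrete with a toy sketch; the event shape (`msisdn`, `month`, `type`) is an invented example, not a prescribed schema.

```python
from collections import Counter

def monthly_sms_totals(raw_events):
    """Derive an APL aggregate (total monthly SMSs per subscriber) from
    immutable raw events; the reservoir itself is never modified."""
    totals = Counter()
    for ev in raw_events:
        if ev["type"] == "sms":
            totals[(ev["msisdn"], ev["month"])] += 1
    return dict(totals)

raw = [
    {"type": "sms",  "msisdn": "07700-900001", "month": "2013-10"},
    {"type": "sms",  "msisdn": "07700-900001", "month": "2013-10"},
    {"type": "call", "msisdn": "07700-900001", "month": "2013-10"},
]
print(monthly_sms_totals(raw))  # {('07700-900001', '2013-10'): 2}
```

Because the raw store is immutable, an aggregate like this can always be thrown away and recomputed.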
Information Management – Logical View: Data Layers – Foundation Data Layer
• Immutable, integrated and standardised store of enterprise-class data – the things the business has agreed on and organises around.
• Data at the lowest level of grain of value for Enterprise data.
• Stored in a business-process-neutral fashion to avoid data maintenance tasks to keep in step with current business interpretations.
• Typically close to 3NF. Special attention to modelling hierarchies, flexible entity attributions, customer / supplier etc.
• ONLY implemented in relational technology, BUT this could be logical, as previously noted for the Raw Data Reservoir.
• May be queried directly by a select few individuals. Wider access to detail data is provided through views in the APL, often with VPD implemented to prevent queries to antecedent data.
• Data in the Foundation Layer should be retained for as long as possible.
• Consideration should be given to retaining data in the Raw Data Reservoir rather than archiving.
Information Management – Logical View: Data Layers – Access and Performance Layer
• This layer facilitates access, navigation and performance of queries.
• Allows for multiple interpretations of data from the Foundation layer or Raw Data Reservoir.
• Most structures can be thrown away and rebuilt from scratch based on Foundation and the Raw Reservoir.
• The exception is derived and aggregate data, which may have to be retained if the underlying data/mechanism is archived.
• Most users presenting information in a standardised fashion on dashboards and reports will access this layer only.
• Data destined for the Raw Data Reservoir may be loaded directly (e.g. through Flume) or may be stored temporarily in fs prior to loading (e.g. Fuse fs)
• Relational data is ingested via the most appropriate mechanism before persisting in the Foundation Data Layer (usual rules apply…)
• Ideally micro-batch, using the simplest mechanism possible
• Only data of agreed quality is loaded into the FDL
• For efficient relational loading, data may be pre-staged in fs so a large number of small files can be concatenated
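The small-file concatenation point above can be sketched as a simple batching step. The 128 MB threshold is an arbitrary illustration (chosen to echo a typical HDFS block size), not a recommendation.

```python
def concat_small_files(files, max_batch_bytes=128 * 1024 * 1024):
    """Pre-stage many small files into a few large batches so the
    relational loader sees fewer, bigger inputs. `files` is a list of
    (name, size_in_bytes) pairs."""
    batches, current, size = [], [], 0
    for name, nbytes in files:
        if current and size + nbytes > max_batch_bytes:
            batches.append(current)     # close the current batch
            current, size = [], 0
        current.append(name)
        size += nbytes
    if current:
        batches.append(current)
    return batches

files = [("log_%03d.csv" % i, 60 * 1024 * 1024) for i in range(5)]  # 5 x 60 MB
print(concat_small_files(files))  # three batches of 2, 2 and 1 files
```

The same shape of logic applies whether the staging area is a local fs, Fuse fs or HDFS.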
Information Management – Logical View: Data Factory ingestion flow
Data Ingestion (Batch & Real-Time): ETL / ELT, CDC, Stream, File Ops.
Flow shown:
1. Data to be formalised from the HDFS store is extracted and loaded into the Foundation Data Layer, e.g. where Flume/HDFS is being used as an ETL pre-processor for Enterprise Data, or where HDFS data is being logically modelled in the foundation layer.
2. Data is re-structured and/or aggregated to facilitate access by users and business processes.
3. Data may also be re-structured and/or aggregated from the HDFS store where there are no specific requirements to manage Enterprise Data in a more formal data store over time.
Information Management – Logical View: Data Factory intra-data processing flow
Information Management – Logical View: Information Provisioning – BI & Data Science Components
Virtualisation & Query Federation
Enterprise Performance Management
Pre-built & Ad-hoc BI Assets
Information Services
• Data Virtualisation and the various components used to access the data are as per our previous view on BI tools.
• Data Virtualisation is a key component that helps to deliver tool independence, services integration and a future-state roadmap.
• Big Data has focused considerable attention on Data Science.
• Analytical capabilities are delivered through analytical processing in the data layers, with Advanced Analytical Tools used to drive capabilities.
• Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and the model results are typically written to a project-based sandbox.
• Agile discovery is often best served through a separate Discovery Lab infrastructure (see later details).
Data Science
Information Management – Logical View: Information Provisioning – BI flows
1. Typical access mechanism for Enterprise data is via Access and Performance layer structures.
2. Access to Foundation Layer data is for specific functions, processes and users only.
3. Data interpretation & DQ are assured through encoded logic, Avro, SerDe, FileReader, HCat etc.
4. Diagonal flows show how data can be joined between layers as well as accessed directly, e.g. Raw data can be queried directly through the Hive connector, or joined to the RDBMS data and queried.
Information Management – Logical View: Data / Information Quality
Quality of data at rest is assured by a number of factors in addition to the underlying quality of data at source:
– File and event handling to ensure data is not missed (e.g. missing log files detected by log file sequence numbering)
– The processing of data between the Raw and FDL / APL layers. This can be seen as a DQ firewall to ensure only data of known and acceptable quality is loaded. Typically this involves an element of synchronisation, as some data will need to be held off until the required reference data is available, due to the micro-batch incremental loading approach.
Quality of information presented to downstream tools and services is determined by:
– Model quality, understanding and performance of provisioning from modelled layers
– Consistency of definition, code quality and query performance when accessing Hadoop data (e.g. HR code, Avro definition…)
Information Management – Logical View: Data Reservoir & Enterprise Information Store
Discovery Lab
Analysis Processing & Delivery
Discovery Lab & Data Science Tooling
Data Reservoir & Enterprise Data
Data Science (primary toolset): Statistics Tools · Data & Text Mining Tools · Faceted Query Tools · Programming & Scripting · Data Modeling Tools · Query & Search Tools
Pre-Built Intelligence Assets · Intelligence Analysis Tools · Ad Hoc Query & Analysis Tools · OLAP Tools · Forecasting & Simulation Tools · Reporting Tools
Data Scientist
Virtualisation & Information Services
Data Factory flow:
1. The Data Factory is responsible for access provisioning to data, or replication (all or a sample) to a Sandbox in the Discovery Lab.
2. Direct connection from Data Science tools to the analysis sandbox. Data Science tools read and write data from/to project sandboxes.
3. The Data Scientist can also access standard dashboards, reports and KPIs through the Data Virtualisation layer.
Data Quality & Profiling · Graphical rendering tools · Dashboards & Reports · Scorecards · Charts & Graphs
Sandboxes: Project 1 · Project 2 · Project 3 (each a data store with analytical processing)
Information Management – Logical View: Discovery Lab data flow
R/T Event Engine – Logical View and Components
Real-Time Data Engine
From Input Events
Privacy Filter · Data Transform · Rules & Models · Mediation · Next Best Action · Real-Time Data Store
To Event Subscribers (Events / Data)
Reference Data · Models & Rules · Privacy Data · Analytics
Real-Time Data Engine – Logical View
Business Activity Monitoring
Real-Time event monitoring
Real-Time Data Engine
• Message mediation service
• Privacy filter for event data, i.e. apply customer-specified privacy and preference filters to the data stream
• Transformation of the message data to its outbound form
• Apply declarative rules and models to the data stream to detect events for further downstream processing
• Next Best Activity (NBA) event detection and processing. NBA typically also includes control group management and global optimisation of rules
• Business Activity Monitoring
• Local data store – local persistence of rules and metadata
Components
Privacy Filter
Data Transform
Rules & Models
Mediation
Next Best Action
Real-Time Data Store
BAM
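A minimal sketch of how these components might compose for a single event. The composition order, the dict-shaped events, the opt-out set and the rule functions are all illustrative assumptions, not the engine's actual design.

```python
def process_event(event, opted_out, rules):
    """Privacy filter → data transform → rules & models.
    Returns the outbound event for subscribers, or None if suppressed."""
    # Privacy filter: honour customer opt-out before any processing
    if event.get("customer_id") in opted_out:
        return None
    # Data transform: reshape to the outbound form
    outbound = {"id": event["customer_id"], "kind": event["type"]}
    # Rules & models: declarative rules decide the next best action
    for rule in rules:
        action = rule(event)
        if action:
            outbound["next_best_action"] = action
            break
    return outbound

rules = [lambda e: "send_topup_offer" if e["type"] == "low_balance" else None]
print(process_event({"customer_id": "c1", "type": "low_balance"}, set(), rules))
print(process_event({"customer_id": "c2", "type": "low_balance"}, {"c2"}, rules))  # None
```

Placing the privacy filter first means opted-out customers never reach the rules or NBA stages, which matches the opt-out consideration in the Real-Time Events design pattern.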
Real-Time data engine flows
Describe each of the data flows
Reference Data · Models & Rules · Privacy Data · Event Analytics
From Input Events / To Event Subscribers (Events / Data)
R/T Event Monitoring
To Do
Mapping from the previous release of the architecture
Information Management Reference Architecture: Version 2.0 of the Architecture
Information Management Reference Architecture
Interpretation layer shows the relative cost of reading data depending on its location
Previous staging layer now split into Data Ingestion and Raw store.
Ingestion layer includes methods and processes to load data and manage Data Quality. Shape represents the relative cost of these processes, i.e. from none for HDFS to lots in the APL.
Raw Reservoir is typically at the lowest level of grain. Often lower than the enterprise cares about and so may not have been included in previous representation.
Renamed from Knowledge Discovery to Discovery Lab but otherwise unchanged. The role of Discovery Labs is becoming more central though, so additional operational guidance will be added.
Discovery Lab
Still an immutable store but may be physically implemented in relational or non-relational technologies
Key differences from 2.0 to 3.0 of the Architecture
Discovery Lab and Governance considerations
Data discovery for the Enterprise
Discovery phase:
– Unbounded discovery
– Self-service sandbox
– Wide toolset
– Agile methods
Promotion to Exploitation:
– Commercial exploitation
– Narrower toolset
– Integration to operations
– Non-functional requirements
– Code standardisation & governance
Discovery and monetising steps have different requirements
[Chart: Business Value vs Time / Effort, showing the Discovery phase, Understanding of the data, Governance, and Commercial Exploitation]
To monetise fully you need to standardise
It's smart to standardise as part of Governance
Discovery process requires a broad toolset
Standardisation is essential for Commercial exploitation
Sustainability depends on standardisation / rationalisation
– Reduced training burden– Reduced support costs– Reduced license costs– Ongoing agility & alignment
Data Discovery Toolset Data Exploitation Toolset
Rationalised Components
• Cloudera CDH, Oracle, NoSQL
• Mammoth, YARN, EM plugin
• MR, Hive, Pig, Impala, Accum.
• Flume NG, Oozie
• …
Optional additions:
• Oracle Connectors
• Additional corporate standard components
Oracle standard deployment / Corporate standard
Standardised Hadoop Zoo – Standardised deployment
Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
The kind of things we are looking to Discover
Data science skills required vary by the type of analysis.
Data Management skills vary by the amount of data and its structure.
So making data movement and manipulation easy will deliver a better result, and deliver it faster.
[Chart: Analytical Skills vs Business Impact – Descriptive, Diagnostic, Predictive and Prescriptive analysis, moving from Insight to Foresight]
Discovery is a Data process not a Development Process
Three Versions of the BI Development Process
What IT thinks it should be: Requirement Analysis → High Level Design → Low Level Design → Coding → Testing → Acceptance Testing
What normally happens: Excel Spreadsheet → Shared linked spreadsheets → Local Access Database → Shared Access Server → SQL Server Database → Oracle Data Warehouse
What Big Data is trying to achieve: Discovery & Profile → Model → Exploit
Sandboxes facilitate “Agile”: providing the technology platform for agile discovery
Sandbox delivery options
• Separate Data Lab environment
• Delivered as part of the Information Management architecture
Self-Service Sandboxes
• Self-service provisioning of new sandboxes for the Discovery phase
• Automation of data access rights, resources and tools provisioning
Data provision
• Quickly take on new data to rapidly make it available to Analysts
• Tools such as “Data Factory” can fully automate data flows
Monetise and Optimise steps are different
What happens when we want to exploit insights?
• New insights deployed into a business process in some form
– Technical: e.g. business rules, new customer segments
– Non-technical: e.g. observations about behaviours
• Business Intelligence systems adapted to provide monitoring, feedback and control optimisation
• The faster you iterate this cycle, the greater the benefit, BUT Big Data does not change the fundamental need for accurate, consistent and integrated information
New insights ↔ Business Process
Rules of thumb for data: Organised information leads to better analyses
Information needs to be organised in order to analyse it
RDBMS are great when information is organised
Hadoop minimises the penalty for disorganisation
The closer you are to insight, the more complete and organised information needs to be
Data needs to be organised to monetise it effectively
What that really means is…
We need to apply structure to data in order to analyse it.
Schema on read works well for us in Discovery, as we can be agile about interpretation.
As we move beyond Discovery, schema on read can cause Governance & quality issues.
Key lesson: The cost to store & manage is distinct from structural considerations between Big Data and RDBMS technologies
De-mystifying schema on read
DQ · Bus. Rules · Mapping
ETL
Data Reservoirs
Traditional “Schema on Write”
– Data quality managed by a formalised ETL process
– Data persisted in tabular, agreed and consistent form
– Data integration happens in ETL
– Structure must be decided before writing
Big Data “Schema on Read”
– Interpretation of data captured in code for each program accessing the data
– Data quality dependent on code quality
– Data integration happens in code
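The contrast can be made concrete with a toy sketch: on the write side, structure and quality are enforced once at load time; on the read side, every program carries its own interpretation code. The record layout here is invented for illustration.

```python
import json

RAW = '{"cust": "42", "amt": "9.99"}\n{"cust": "", "amt": "oops"}\n'

def _is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

# Schema on write: validate and shape once, at load time (the "ETL").
def load_schema_on_write(raw):
    table = []
    for line in raw.splitlines():
        rec = json.loads(line)
        if rec["cust"] and _is_number(rec["amt"]):   # DQ firewall at load
            table.append({"customer_id": int(rec["cust"]),
                          "amount": float(rec["amt"])})
    return table  # agreed, consistent, tabular form

# Schema on read: store raw lines as-is; every reader re-interprets them.
def read_schema_on_read(raw):
    for line in raw.splitlines():
        rec = json.loads(line)                       # interpretation lives in code
        yield rec if rec["cust"] else None           # DQ depends on this code

print(load_schema_on_write(RAW))       # one clean row survives the load
print(list(read_schema_on_read(RAW)))  # raw kept; quality is the reader's job
```

The sketch shows why schema on read suits Discovery (the raw lines are all still there to reinterpret) while schema on write suits Exploitation (every consumer sees the same agreed shape).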
Underlying storage capabilities are different
[Chart: Hadoop vs Relational vs My Application, scored 0–5 on: Tooling maturity, Stringent Non-Functionals, ACID transactional requirement, Security, Variety of data formats, Data sparsity, ETL simplicity, Cost effectively store low value data, Ingestion rate, Straight Through Processing (STP)]
It's smart to unify your data into a single Reservoir – fully expose your data for discovery and monetisation
An Analytics 3.0 platform includes both relational and non-relational technologies
Ken Rudin* refers to this as the genius of AND vs the tyranny of OR (see his TDWI '13 presentation)
A unified Reservoir simplifies access to all data regardless of characteristics & analysis requirements
All Data
* Ken Rudin is Director of Analytics at Facebook
Information Management – Logical View: Information Provisioning – Analysis Processing & Delivery
Information Management – Logical View: Analytical processing and delivery
Structures and processing required to load data (batch and Real-Time) and manage Data Quality
Structures required to interpret the data under management, i.e. logical interpretation
• Data Virtualisation and the various components used to access the data are as per our previous view on BI tools.
• Data Virtualisation is a key component that helps to deliver tool independence, services integration and a future-state roadmap.
• What has changed is the focus on Analytics.
• Analytical capabilities are delivered through analytical processing in the data layers, with Advanced Analytical Tools used to drive capabilities.
• Data Mining in particular often involves complex data processing to flatten data into a longitudinal form. This derived data and the model results are typically written to a project-based sandbox.
• Agile discovery is often best served through a separate Discovery Lab infrastructure (described later).