big data for oracle architects

25
7/23/2019 Big Data for Oracle Architects http://slidepdf.com/reader/full/big-data-for-oracle-architects 1/25  n Oracle White Paper in Enterprise  rchitecture August 2 2 Oracle Information Architecture:  n rchitect s Guide to Big Data OR Le

Upload: winit1478

Post on 19-Feb-2018

232 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 1/25

  nOracle White Paper in Enterprise   rchitecture

August

2 2

Oracle Information Architecture:

  n rchitect sGuide to Big Data

OR Le

Page 2: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 2/25

A—m

 m

Tm

am

f

tm

Page 3: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 3/25

An

Oracle

White

Paper

in

Enterprise

Architecture—informationArchitecture: An

Architect s

Guide to Big Data

What Makes Big Data Different? 4

Comparing Information Architecture Operational Paradigms 6

 ig

Data Architecture Capabilities 7

Storage and Management Capability 7

Database Capability 8

Processing Capability 9

Data Integration Capability 9

Statistical Analysis Capability 9

 ig Data Architecture 1

Traditional Information Architecture Capabilities 1

Adding

 ig

Data Capabilities 1

An Integrated Information Architecture 11

Making

 ig Data Architecture Decisions 13

KeyDrivers to Consider 13

Architecture Patterns

in

Three

Use Cases 14

 ig Data Best Practices 2

1:  lign

 ig

Data with Specific Business Goals 2

2: Ease Skills Shortage with Standards and

Governance

2

3: Optimize Knowledge Transfer witha Center of Excellence 21

 4: Top Payoff is  ligning Unstructured with Structured Data 21

 5 :

Plan Your Sandbox For Performance 22

 6:

 lign

with the Cloud Operating Model 22

Summary 22

Enterprise Architecture and Oracle 23

Page 4: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 4/25

AnOracle WhitePaper inErrterpriseArchitecture—information Architecture: AnArchitect s Guide to BigData

Int roduct ion

Ifyour organization is like many, you re capturing and

sharing more

data from

more

sources than ever before. As a result, you re facing the challenge of managing high-

volume and

high-velocity

data

streams quickly

and

analytically.

Big Data is all about finding a needle of value in a haystack of unstructured information.

Companies are now investing in solutions that interpret consumer behavior, detect fraud,

and

even

predict

the

future McKinsey

released

a report in May 2011 stating that leading

companies are using big data analytics to gain competitive advantage. They predict a

60 margin increase for retail companies who

ar e

able to harvest the

power

of big data.

To support

these

new

analytics, IT

strategies

are

mushrooming,

the newest techniques

include brute force

assaults

on massive information

sources,

and filtering

data

through

specialized parallel processing

and

indexing

mechanisms.

The results

ar e

correlated

across

time

and

meaning, and often merged with traditional corporate

data sources.

New

data

discovery

techniques

include

spectacular

visualization tools

and

interactive

semantic

query experiences. Knowledge workers and

data

scientists sift through filtered

data

asking

on e unrelated explorative

question after another.

As these supporting

technologies emerge

from

graduate research

programs into

the

world of corporate IT, IT

strategists, planners, and architects need to both understand them

and

ensure

that

they

are

enterprise

grade.

Planning a Big Data architecture is not about understanding just what is different. Ifs

also about

how to integrate

what s new

to

what

you already

have

- from database-and-BI

infrastructure to ITtools,

and end user

applications.

Oracle s

own product

announcements

in hardware, software,

and

new partnerships have

been

designed to

change

the

economics around Big Data investments

and the

accessibility of solutions.

The real industry challenge is not to think of Big Data as a specialized

science

project,

but rather

integrate it into

mainstream

IT.

Inthis paper, we w discuss adding Big Data capabilities to your overall information

architecture, planning for adoption using

an

enterprise architecture perspective,

and

describe

some key use cases .

Page 5: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 5/25

An Oracle White Paper In Enterprise Architecture—InformationArchitecture: An Arch i t ect s Gu i d e to B i g D a ta

Wha t Makes

Big

Data

Different?

Bigdata refers to largedatasets that are challenging to store, search, share,\dsuali2e, and analyze.

At first glance, the orders of magnitudeoutstrip conventionaldata processingand the largestof

data warehouses. For example, an

airline

jet collects 10 terabytes of sensor data for every 30

minutes of flying time. Compare that with conventional high performance computing where

New York Stock Exchange collects 1 terabyteof structured trading data per day. Compare again

to a conventional strucmred corporate data warehouse that is sized in terabytes and petabytes.

BigDatais sized in peta-,

exa-,

and soon perhaps,zetta-bytes And,it s nor justaboutvolume,

the approach to analysis

contends

withdata contentand structure that cannotbe anticipated or

predicted. Tliese analytics andthe science behind themfilter low

value

or

low-density

data to

reveal liigh value or iiigh-densit}

data.

As a

result,

newandoften proprietar) analjTical

techniques are required. BigData has a broad array of interesting architecture challenges.

It is often said that data volume, velocit , and variet define BigData, but the unique

characteristic of

Big

Data is the mannerin

which

tlie

value

is discovered. BigData is

unlike

conventional

business

intelligence, where the

simple

summing of a known value reveals a result,

such as order

sales

becoming year-to-date sales. With Big Data, thevalue is discovered through a

refining modeling process: make

a

h)y>othesis, create statistical,

visual, or

semantic

models,

validate,

then make a new hypothesis. It either takes a person interpreting \dsualizations or

making interactive knowledge-based queries, or bydeveloping

 machine

learning

adaptive

algorithms drat

can

discover meaning.

And in die end, the

algoritlim may

be

short-lived.

The growth of bigdatais a

result

of the increasing channels and

variet)

of data in

today s world.

Some

of

the new data sourcesare user-generated content through sodal media,web and software

logs,

cameras,

information-sensing mobile devices,

aerial

sensoiy

technologies, genomics, and

medical records

Companies

have realized diat

there

is

competitive

advantage in this

information

and thatnow

 

the timeto put this data to work.

Page 6: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 6/25

AnOracle White Paper InEnterprise ArchHecture—InformationArchitecture: AnArchitect s Guide to BigData

A Big

Data Use

Case Personalized Insurance Premiums

To begin, let s consider the business opportunit} tliat Big Data brings to a conventional business

process in the insuranceindustry—calculating a competitive and profitableinsurance premium.

In an effort to be morecompetitive, aninsurance company wants to offer their customers the

lowest possible premium, but only to those who are unlikely to make a claim, thereby optimizing

their profits, One way to approach tliis problem is to collectmore detailed data about an

individual s drivinghabits and then assess tlieir risk.

In

fact,

insurance companies arenowstarting to collect data on driving habits utilizing sensors in

theircustomers cars. These sensors capture dri\hng data,suchas routesdriven, miles driven,

time

of

day,and braking abruptness. This data is used to assess driver risk; they compare

individual drivingpatterns with other statistical information, such as average

miles

driven in your

state, and peak hours of drivers on the road. Driver risk plus actuarial information is then

correlatedwith

policy

and profileinformation to offer a competitive and more profitable rate for

the company. The result? A personalized insurance plan. These unique capabilities, delivered

from big data

anal}tics,

are revolutionizing the insuranceindustiy.

To accomplish this task, a great amount of continuous data must be collected, stored, and

correlated. Hadoop is an excellent choice for acquisition and reduction

of

the automobilesensor

data. Master data and certain reference data includingcustomer profile information are

likely

to

be stored in the existing DBMS systems, and a NoSQL databasecan be used to capture and store

reference data that aremore dynamic,

diverse

in

formats,

and

change

frequently. Project R and

Oracle R Enterprise are appropriate choices for analyzing bodi private insurancecompanydata

as wellas data captured from publicsources. And

finally,

loading the MapReduce results into an

existingBI environmentallows for further data explorationand data correlation. With these new

tools,

the

company

isnow able to

addresses

the

storage, retrieval, modeling,

and processing side

of

the requirements.

In thiscase, the traditional business process and supporting data  master, transaction, analytic

data are able to add statistically relevantinformation to their profit model and deliveran industry

innovative result

Ever industr haspertinent examples. Bigdata sources take a variet

of

forms: logs,sensors,

text, spatial, and

more.

As IT strategists, planners, and architects, weknow that our

businesses

havebeen trying to find the relevant information in all thisunstructured data foryears. But the

economic choices aroundgetting to theserequirements have been a barrier untilrecendy.

So,whatis your approach to offer your lineof business a set ofrefined data services that can

help them not just

find

data,

but improve and innovate theircorebusiness processes? What s

die impact to yourinformation architecture? Yourorganization? Your analytical skills?

Page 7: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 7/25

An Oracle White

Paper

In Enterprise

Architecture—Information Architecture:

An Architect s

Guide

to Big Data

Comparing

Information Architecture Operational

Paradigms

Bigdata differs from other data realmsin manydimensions. In tiic following table you can

compare and contrastthe characteristics of bigdataalongside dieother data

realms

described in

Oracle s

Information Architecmre

Framework fOlAF).

Data Rea lm St ruc tu re Vo lume

Master D ata Structured Low

Transact ion

Data

Reference

Data

Metadata

Analytical

Data

Document s

and

Conten t

Big

Data

Structured  

structured

Stnjctured  

structured

Medium

-high

L o w

Medium

Structured

Low

Structured

Unstructured

Structured,

s emi

structured,  

uns t ructured

Medium-

High

Medium

- High

High

Description

Enterprise-level

data

entities that

ar e of

strategic value

to

an

organization. Typically non-volatile

and non-t ransact ional In nature.

Business

t ransac t ions tha t

a re

captured

during business

operations and

processes

Internally managed or externally

sourced facts to

support

an

organization s ability to effectively

process transactions,

manage

master

data,

and provide decision

support capabilities.

Defined as  data abou t t he

data.

Used as

an

abstraction

layer for

standardized

descriptions

and

operations, E.g. integration,

intelligence, services.

Derivations

of

t he bu s in e s s

operation and transaction data

used

to satisfy reporting and

analytical

needs.

Documents, digital images, geo-

spatial data, and multi-media files.

Large

datasets that ar e

chal lenging to

store,

search,

share, visualize, and

analyze.

Table 1: Data Realms Definitions (Oracle Information Architecture Framework)

Examples

Customer,

product, supplier,

an d locat ion s i t e

Purchase records,

inquiries,

and

payments

Geo

d at a a nd market d a t a

Data name , data dimensions

or units,

definition

of a data

entity, or a calculation

formula of

metrics.

Data

tha t

res ide in data

warehouses , da ta marts,

an d

other decision support

applications.

Claim forms, medical

images, maps, video

files.

Use r

an d mach i ne

generated

content through

social media, web and

software logs,

cameras,

information-sensing

mobile devices, aerial

sensory

technologies,

and

genomlcs.

These different characteristics have influenced how we capture, store, process, retrieve, and

secure

our

information arcliitectures. As we evolve into Big Data, you can minimize your

arcliitecture riskby

finding

synergies across your investments

allowing

youto leverage your

specialized

organizations and theirskills equipment, standards, and governanceprocesses

Page 8: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 8/25

AnOracleWhite Paper InEnterprise Architecture—Information Architecture: An Architect s Guide to Big Data

Here are some capabilitiesthat you can leverage:

Data

Realms Security

Storage

 

Retr ieva l

Master

data

Database RDBMS/

Transactions app SQL

Analytical data user

Metada ta access

Modeling Processing

  Integration

Pre defined

ETL/ELT

CDC

relational or Replication

dimensional Message

modeling

Reference data

Platform XML / xQuery Flexible ETL ELT

security

Extensible

Message

  ocumen t s

and

  onten t

Big Data

- Weblogs

• Senso r s

• Socia l Media

File

system

bas ed

Tiie

system  

database

File System / Free Form

OS ievei

file

Sea rch movemen t

Distributed Flexible Hadoop

FS / noSQL (Key Value)

MapReduce

ETL/ELT,

Message

Consumption

Bi

  Statistical

Tools Operational

Applications

System based

data

consumption

Content

Mgmt

Bi

Statist ical

Too ls

Table 2: Data Realm Characteristics (Oracle information Architecture Framework]

Big Data Architecture Capabilities

Here is a brief outline of BigData capabilities and their primarj technologies:

Storage and Management Capability

Hadoop Distributed File System(HDFS):

• AnApache open sourcedistributed file system http://hadoop.apache.org

• Expected to run on high-performance commodityhardware

• Known for highly scalable storage and automaticdata replication across three nodes for

fault

tolerance

• Automatic data replication across three nodes eliminatesneed for backup

• Write once, readmany times

Cloudera Manager:

• Cloudera Manager is an end-to-end management application for Cloudera s Distribution of

ApacheHadoop, http://www.cloudera.com

• Cloudera Managergives a cluster-wide, real-time view

of

nodes and servicesrunning;

providesa single central place to enact configurationchangesacross the cluster;and

incorporates a

full

rangeof reportingand diagnostic tools to help optimizecluster

performance and utilization.

Page 9: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 9/25

An

Oracl e

White

P a p e r

in

Enterprise Architecture—information Architecture

An

Arch i tect s Gu i d e

t o Bi g

Data

Database

Capability

Oracle NoSOL: Click for more information)

• Dynamic and flexible schema design. High performance key value pair

database.

Keyvalue

pairis an

alternative

to a pre-defined

schema. Used

fox

non-predictive and

dynamic data.

• Able to

efficiently

process datawithouta row and column structure.

Major

+ Minorkey

paradigm allows multiple recordreadsin a

single

API

call

Highly scalable

multi-node,

multiple

data center,

fault

tolerant, ACID operations

• Simple programming model, random index reads and

writes

• Not

Only

SQL. Simple pattern queries andcustom-developed solutions to access data

such

asJava APIs.

Apache

HBase:  Click for

more

information)

• Allows random, real time read/write access

Strictly

consistent readsand

writes

• Automaticand configurable shardingof tables

• Automatic

failover

support between

Region

Servers

Apache Cassandra:

  Click for more information)

• Data model offers column indexeswith the performance of log-structured updates,

materializedviews, and built-in caching

• Fault

tolerance

capabilitj isdesigned forever) node, replicating across

multiple

datacenters

• Can choosebetween synchronous or asynchronous replication for eachupdate

Apache

Hive:

  Click for more information)

• Tools to

enable

easy dataextract/transform/load   ETL) from files stored either

direcdy

in

Apache

HDFS or inotlier

data

storage

systems such

as

Apache HBase

Uses

a simple

SQL-like query language called

HiveQL

• Quer) executionvia MapReduce

Page 10: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 10/25

AnOracle WhitePaper InEnterprise Architecture—information Architecture: An Architect s Quide to BigData

V

Processing Capability

MapReduce:

• Defined

by Google in

2004 Click here for original

paper|^

^

• Break problem up into smaller sub-problems

• Able

to

dist r ibu te da ta work loads ac ross t l iousands   nodes

• Can be ex po sed via SQL and in SQL-b ased BI tools

Apache Hadoop:

• Leading MapReduceimplementation

• Highly

scalable

parallel batch processing

• Highlycustomizable infrastructure

• Writes multiple copies across cluster for fault tolerance

Data Integration Capability

Oracle Big Data Connectors. Oracle Loader for Hadoop. Oracle Data Integrator:

 Click here for Oracle Data Integration and BigData^

• Exports MapReduce resiolts to RDBMS, Hadoop, and otlier targets

• Connects Hadoop to relational databases for SQL processing

• Includes agrapliical user interface integration designer thatgenerates

Hive

scripts to move

and transform MapReduce results

• Optimized processingwith parallel data import/export

• Can be installed on OracleBigData Appliance or on a generic Hadoop cluster

Statistical Analysis Capability

Open Source Project R and Oracle R

EnteqDrise:

• Programming language for statisticalanalysis Clickhere for Project R)

• Introduced into Oracle Database as a SQL extension to perform high performance in-

database statisticalanalysis  Clickhere for Oracle R Enterprise^

• OracleR Enterprise allows reuseof pre existing R scripts with no modification

Page 11: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 11/25

An

Oracle

White

Paper

In

Enterprise Architecture—Information

Architecture: An

Ar chitect s Guide

to Big Data

Big

Data Architecture

In thissection, wewilltake a closer look at the overall architecture for big data.

Traditional Information

Architecture Capabilities

To understand the liigh-level architecture aspects of BigData, let's first reviewa well formed

logical

information arcliitecrure for structured

data.

In the illustration, youseetwo datasources

that useintegration(ELT/ETL/Change Data Capture) techniques to transfer data into a DBMS

datawarehouseor operationaldata store,and then offer a widevarietj- of analytical capabilities to

reveal

the

data.

Someof these

analytic capabilities include:

dashboards, reporting, EPM/BI

applications,

summary^

and

statistical

query, semantic interpretations for textual

data,

and

visualization tools for liigh-density data. In addition,someorganizations haveapplied oversight

and standardization across projects, and perhaps havematured the informationarcliitecture

capability

through managingit at the enterprise level.

«

R e f e r e n c e

 

M a s t e r Dat a

Transaction

D a t a

Enterprise

integration

y.

Analytic

Capabilities

 

01

c . u

o> >> c

E €  

4) 3 £

D> o 0)

5   >

£ w o

n o

Figure 1: Traditional informationArchitectureCapabilities

Tlie key information

architecture

principles

include

treating dataasanasset through a value, cost,

and risklens,and ensuring timeliness, quality, and accuracy of data. And, tlieEAoversight

responsibility

isto

establish

and

maintain

a

balanced governance

approach

including using center

of excellence for standardsmanagement and training.

Adding Big

Data

Capabilities

Tlie defining processing capabilities forbigdata arcliitecture areto meet the

volume, velocity,

variety,

and value requirements.

Unique

distributed

(multi-node)

parallel processing arcliitectures

havebeen

created

to parse these

large

data

sets,

lliere are

differing technology strategies

for

real-time

and batch processing requirements. For

real-time,

key-value data stores, suchas

NoSQL,

allow for liigh performance, index-based

retrieval.

For

batch

processing, a

technique

known

as Map

Reduce, filters

data

according

to a

specific

data

discovery

strategy^

After the

filtered data is discovered, it canbe analyzed directly, loaded into other unstructured databases,

sent to

mobile

de\ ices, or merged into traditional datawarehousing enidronment and correlated

t o s t r u c mr e d

data.

M a c h i n e

Genera ted

Text, Image,

Audio

Video

y.

Distributed

File

System

Key Value

D a t a S t o r e

y

Figure2; BigData InformationArchitecture Capabilities

Map Reduce

y

Sandboxes

Data

W a r e h o u s e

y.

Analytic

Capabilities

 

Page 12: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 12/25

AnOracle White

Paper

InEnterprise Architecture—information Architecture; An Architect s Guide to Big Data

In additionto newunstructureddata realms there are two keydifferences for big data. First, due

to the sizeof the data sets, we don t move the raw data directly to a data warehouse. However,

after MapReduce processing wemay integrate the reduction result into the datawarehouse

environment so tliat we can leverageconventional BJ reporting, statistical, semantic, and

correlation capabilities. It is idealto have analjTic capabilities that combine a conventional BI

platform alongwith big data \isualization and quert capabilities. And second, to facilitate

analysis in the Hadoop environment, sandbox environments can be created.

For manyuse cases,big data needs to capture data that is continuously changing and

unpredictable. And to analyzetliat data, a new architecture is needed. In retail, a good example

is capturing real time foot trafficwith the intent of deliveringin-store promotion. To track the

effectiveness of floor displays andpromotions, customermovement and behaviormust be

interacdvely exploredwith visualization or querytools.

In other usecases the analysis cannot be complete untilyou correlate it withother enterprise

data - structureddata. In the example

of

consumer sentiment

analysis

capturinga positiveor

negative

social media

comment has some

value

but associating it witli yourmost or least

profitable customer makesit farmore valuable. So, tlie needed

capabilit}

widiBigData BI is

context and understanding. Using powerful statisticaland semantic tools allowyou to find die

needlein the haystack and

will

help youpredict the future.

In

summar}

the BigData architecture challenge is to meet the rapid use and rapid data

interpretation requirements whileat the sametime correlating it with other data.

W^iat s important is that the key information arcliitecturc principles aredie same,but the tactics

of

applying these principles differ. For example, how do welook at big data asan

asset.

Weall

agreethere s valuehidingwithinthe massive liigh-densit} data set.But how do weevaluateone

set of big data against the other?How doweprioritize? The keyis to think in termsof the end

goal. Focus on the business values and understand how criticalthey are in support of the

business decisions, as well as the potential

risks

of not knowing the liiddenpatterns.

Another

example

of

applying

architecture principles differently is

data

governance. The quality

and

accuracy

requirements of bigdata can

vaty^

tremendously. Usingstrict data precisionrules

on user sentiment data might filter out too much useful information, whereas data standards and

common definitions are stillgoingto be critical for fraud detections scenarios.

To reiterate, it is important to leverage yourcore information arcliitecture principles and

practices, but apply them in a wat that s relevantto big data. In addition, the EA responsibility

remains

the

same

for big

data.

It is to

optimize success centralize

training and establish

standards

An

Integrated Information Architecture

One

of

the obstacles observedin Hadoop adoption in enterpriseis the lackof integrationwith

existing

BI

eco-system.

At present, the

traditional

BIand bigdata

ecosystems

are separate

causing

integrated dataanalysis

headaches.

Asa

result

they are not ready for usebytlietypical

business user  

executive

Page 13: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 13/25

An Oracle White Paper in Enterprise Architecture—Information Architecture: An Architect s Guide to Big Data

Earlier adoptersof bigdatahave often times written customcodeto movedie processed

results

of bigdatabackintodatabase or developed customsolutions to report and

analyze

on them.

Tliese

options

might

not be

feasible

and economical for enterprise IT. First of all, it

creates

proliferations of one-off

code

anddifferent standards. Architecture

impacts

IT

economics.

Big

Datadone independendy runs the riskof redundant investments. In addition, mostbusinesses

simply do not havethe staffandskill level for suchcustomdevelopment

work.

Abetter option is to incorporate theBig Data results into dieexisting Data Warehousing

platform. Tliepower of

information

lies inour abilit) to make associations andcorrelation.

What

weneedis the abilit) to bringdifferent datasources, processing requirements togetherfor

timely

and valuable analysis.

HereisOracle s holistic capability map that

bridges

traditional information architecture and big

data arcliitecture:

Specialized

Hardware

Acquire

Big

Data

Clu s t e r

ETL/ELT

ChangeDC

R ea T im e

Message-

Based

Hadoop

(MapReduce)

High

Speed

Network

Figure 3: Oracle Integrated InformationArchitectureCapabilities

Organize \ Analyze

In Da taba se

Analytics

RDBMS

Clu s t e r

Decide

Rcporting

Dashbo a rd s

Aler1ing

Re c o m m o n d a t i o n s

EPM, Bl.

Social

Applications

Text Analytics

a nd S e a rc h

  dvanced

Analytics

Interact ive

Discovery

In Memory

Analytics [

Imi l I anGf

Asvarious data arecaptured, they canbe stored and processed in traditional

DBMS,

simple files,

or distributed-clustered systems suchasNoSQLand HadoopDistributedFile System HDFS),

Architecturally, the critical component tiiat breaksthe di\nde is the integrationlayerin the

middle. Tliisintegration layer

needs

to extendacross all of the datatypes and domains, and

bridge the gapbetween the traditional andnewdataacquisition and processing

framework.

The

data integration

capability needs

to cover theentirespectrumofvelocity and frequency. It needs

to handleextremeand ever-growingvolumerequirements. And it needs to bridge the variety of

data structures. Youneed to look fortechnologies that

allow

youto integrate Hadoop /

MapReduce with your data warehouse

and

transactional data

stores in a

bi-directional manner.

The nextlayer iswhereyou

load

the reduction-results fromBig Dataprocessing output into

your data warehouse for further analysis. You

also

need the

ability

to access your

structured

data

Page 14: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 14/25

An Oracle White Paper in Enterprise Architecture—Information Architecture; An Architect s Guide to Big Data

such as customer profileinformationwhileyouprocess throughyour bigdata to look for

patterns such as detecting fraudulent activities.

The BigData processingoutput will be loadedinto tlie traditional ODS data warehouse and

data marts for fiarther

analysis

just as the transactiondata.The additionalcomponent in this

layeris the Complex Event Processingengine to

analy2e

stream data in real time.

The Business Intelligencelayerwillbe equipped with advanced analytics in-database statistical

analysis

and advancedvisualization on top of the traditionalcomponents such as reports

dashboards and queries.

Governance security and operationalmanagement alsocover the entirespectrumof data and

informationlandscape at the enterpriselevel

With this architecture the businessusers do not seea

divide

They don t evenneed to be made

awarethat there is a differencebetween traditional transaction data and BigData. The data and

analysis flowwouldbe seamless as theynavigate through variousdata and information sets test

hypothesis analyzepattems and make informed decisions.

Making Big Data Architecture Decisions

InformationArchitecture is perhaps the most complexarea of IT. It is the ultimate investment

payoff

Today s economicenvironmentdemands that business be drivenby

useful

accurate and

timely

information. And the world of BigData adds another dimension to the problem.

However there are

always

business and IT tradeoffs to get to data and information in a most

cost-effectiveway.

Key Drivers to

Consider

Hereis a summary of variousbusiness and IT drivers you need to considerwhen makingthese

architecture choices

BUSINESS DRIVERS

Better insight

Fas te r tum around

Accuracy

and timeliness

IT

DRIVERS

Reduce

storage cost

Reduce da ta

movemen t

Faster t ime to market

Standard ized toolse t

Ease of management and

operation

Security

and governance

 

Page 15: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 15/25

An Oracle White Paper In Enterprise Architecture—Information

Architecture:

An Architect s Guide to Big

Data

Architecture

 atterns in Th re e U se

 ases

The businessdrivers forBigData processingand analytics are present in ever) industn and

will

soon be essentialin the deliver of new ser\ ices and in the analysis of a process. In some cases,

the) enable newopportunities withnewdata sources

tlaat

wehavebecome

familiar

with, suchas

tapping into the apps on our mobiledevices, location ser\ ices, socialnetworks,and internet

cotnmerce. Industr) nuances on devicegenerateddata canenable retnote patient monitoring,

personal fitness activit , driving behavior, location-based storedmovement, and predicting

consumer behavior. But, BigData also offers opportunities to refine conventional enterprise

business processes, suchas textand sentiment-based customerinteraction through

sales

and

»servicewebsitesand callcenter functions, human resource resume analysis, engineering change

management fromdefectthroughenhancement in product

lifecycle

management,

factor

automation and

qualit

management in manufacturingexecution

systems,

and

many more.

In

tliis

section, wewill explore threeuse

cases

and walk tlirough the arcliitecture

decisions

and

technology components. Case

 

Retail-weblog analysis. Case 2:Financial Services-real-time

t ransact ion detec tion. Case

3; Insurance-unstructured

an d

structured data

correlation.

Use

Case

 1 :

Initial Data

Exploration

Tlie

first

example weare going to lookatis from the

retail

sector. One of nation s leading

retailers had disappointing results fromitsweb channels diuing theCliristmas season and is

looking to improve customers ex]>erience

witli

dieironline shop]5ing site. Some of the potential

areas to

investigate

include theweb

logs

andproduct/site

reviews

for shoppers. Itwould be

beneficial to understand the navigation pattern, especially relatedto abandoned shoppingcarts.

The

business needs to determine the value

of

these data versus the cost before making a major

investment .

This retailer s IT department

faces

challenges in a numberof

areas:

skill

set (or the

lack

thereof

for thesenew setsof dataand processing powerrequirements toprocess such large volume.

Traditional SQL toolsarepreferred choices forbusiness and IT,however, it isnot economically

feasible to load allthe data into a relational databasemanagementplatform.

HDFS

Figure 4: Use Case  1: InitialData Exploration

SQL

Too l s

14

Page 16: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 16/25

AnOracleWhite

Paper

InEnterprise Architecture—InfomallonArchitecture: An Architect s Guide to Big Data

As die diagram aboveshows, tliisdesignpattern caUs formounting theHadoop Distributed File

Systemthrough a DBMS system so that traditional SQL tools can be used to explore the dataset

The

key benefits include:

• No data movement

• Leverageexisting investments and skills in database and SQL or BI tools

Oracle

Big Data

Appliance

Oracle

Big Data

Connecto rs

Oracle SQ L

Developer

 any

SQL tools

OBJEE

 B ig

DataAnalysis

Figures:

Use Case

 1: Architecture Decisions

The diagram aboveshows the logical architectureusingOracleproducts to meet these criteria.

The

components

inibis

architecture:

• OracleBigData Appliance (orodier Hadoop Solutions):

o Poweredby the

full

distributionof Cloudera sDistribution including ApacheHadoop

(CDH) to store logs, reviews and other related big data

• Oracle BigData Connectors:

o Create optimized data sets for efficient loading and analysis in Oracle Database

llg

and

OracleEnterprise R

• Oracle Database

llg:

o ExternalTable:A feature inOracledatabase to present data stored in a

file

systemin a

table format and can be used in SQLqueries transparently.

• Traditional

SQL

Tools;

o OracleSQLDeveloper: Development toolwithgrapliic user interface tliat

allows

users

to access data stored in a relational database using SQL.

 

Page 17: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 17/25

An Oracle White Paper In Enterprise Architecture—(nformation Architecture: An Architect s Guide to Big Data

o BusinessIntelligence tools such asOBIEE can be alsoused to access data through

Oracle Database

In

summary,

the key architecture choice in tliis scenario isto avoid datamovement, minimize

processing requirement and investment, untilafter the

initial

investigation. Through this

architecture, the retailer mentioned above is able to access Hadoop data directly through

database and SQL interface for the initial exploration.

Use

Case

 2 : Big Data

fo r

Complex Event

Processing

The second use case is relevant to financial servicessector. Large financialinstitutions playa

critical

role in detecting financial criminal and terroristactivity. However, their ability to

effectively meettheserequirements areaffected by tlieirIT departments

ability

tomeet the

followingchallenges:

• The expansion of anti-money laundering

laws

to

include

a growing numberof activities,

suchas gaming, organized crime, drug trafficking, and the financing of terrorism

• The ever

growing volume

of information that must be captured, stored,and

assessed

• The

challenge

of correlating datain disparate formats fromanmultitude of sources

TlieirIT systems needto provide abilities to automatically collect andprocess large volumes of

data froman

array

of sources

including

Currenq Transaction Reports

  CTRs), Suspicious

Activit}

Reports

  SARs), Negotiable Instrument Logs

  NILs),

Internet-based acti\ it) and

transactions, and much

more.

Rea l

Time

NoSQL

 at ch

HDFS

Streaming

(CEP

Engine)

Figure6: Use Case  2: BigData for Complex Event Processing

  ler t s

 PEL

Mobi le

Dashbo a r d

Database

16

Page 18: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 18/25

AnOracle WhitePaper InEnterprise Architecture—Information Architecture; AnArchitect s Guide to 8lg Data

The idealscenariowould have been to include allhistoricprofilechanges and transactionrecords

to best determine tlie rate of risk for each of the accounts, customers, counterparties, and legal

entities,at various levels

of

a^regation and hierarchy. However,it was not traditionally possible

due to constraints in processingpower and cost of storage. With HDFS, it is now possible ta^

incorporate

all

the

detailed

data points to

calculate

such risk

profiles

and send to the CEP enginj|

to establish th e

basis

fo r die risk model

l^oSQL database in this

scenario

willcapture and store lowlatenq andlarge

volume

of data

fromvarious sources in a flexible datastructure, as wellas providereal-time dataintegrationwitL :

 Complex Event Processengine to enable automaticalerts, dashboard, and trigger business

process to take appropriate actions.

I

Orac le

Big Data

i

Appliance

I

with

  doop

;

and NoSQL

Streaming

  CE P

Engine

Orac le

EDA/SOA Su it e

\u A

BAM

Dashboa rd s

BA M

  ler t s

<

 P EL

P r oc e s s e s

Rgure 7: Use Case U2: Architecture Decisions

The logical diagram above highlights the following main components of this architecture:

• Oracle BigData Appliance or other Hadoop Solutions):

o Powered by the fulldistribudon

of

Cloudera s Distribution including Apache Hadoop

  CDH) to store logs,reviews, and other relatedb^ data

o NoSQL to capture low latenc) data with flexible data structure and fast querying

o MapReduce to process large amount of data for reduced and optimized dataset to be

loaded into database managementsystem

Oracle

EDA:

o Oracle CEP: Streaming complex event engine to continuously process incoming data,

analyze and evolve patterns, and raise events if suspicious activitiesare detected

17

Page 19: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 19/25

An Oracle White Paper In Enterprise Architecture—information Architecture: An Architect s Guide to Big Data

o Oracle BPEL: BusinessProcess Execution Languageengine to define processes and

appropriate actionsbased on the event raised

o Oracle

BAM:

Real-time businessactivit) monitoringdashboards to provide immediate

insight and generate actions

In summar , the

key

principle of this architecture isto integrate bigdata with event

driven

architectureto meetcomplex regulator} requirements. Althoughdatabasemanagementsystems

arenot included in tliis architecture depiction, it isexpected that raised events and fiarther

processing

transactions

and records

will

be stored in the database

either

as

transactions

or for

future analytical requirements.

Us e Case  3: Big

Data

fo r Combined Analytics

Tlie third use caseis to continue our discussion

of

the insurance company mentioned in the

earlier section of thispaper. In a nutshell, tlieinsurance gianthas a need tocapture the

large

amount of sensor data that tracktheir customers drivinghabits, store diem in a cost effective

manner, process diis data to determine

trends

and identify

patterns,

and to

integrate

end

results

with existing transacrional, master, and reference data

they

are

already

capturing.

Haaoop

luttftfw

MapRt< iceiear

a a n a o r ,

raathar,

loca ion)

Big Data Cennactor

Adaancatf

AnalyUea

Rgure 8: Use

Case

 3: Data FlowArchitecture Diagram

Haatar Dau

Rapoat t oi y

  Cuat ot nar

Damographle)

 

Tranaacl lon

Syal ama

  pramlum,

el ahna)

IS -ifi

u HMM

aPK

Data

Waraheuaa

Maits

Traditional Raporting an d Alart*

The

key

architecture

challenge of diis

arcliitecture

isto

integrate Big Data

with

structured

data.

Page 20: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 20/25

AnOracle White Paper InEnterprise

/^hliecture—Information

Architecture: An Architect s Guide to Big Data

The diagrambelowis a high-level conceptualviewthat reflects these requirements.

Rgure 9: Use Case  3: Conceptual Architecture for Combined Analytics

The large amount of sensor data needs to be transferred to and stored at the centralized

environment that provides flexible data structure, fastprocessing, aswellas

scalability

and

parallelism. MapReduce functionsare needed to process the low-density data to identify patterns

and trending insights. Tlieendresults need to be integrated into thedatabase management

system with structured data.

Oracle Big Data

Appliance

OptlmizedforHadoop

R. and

NoSQL

Processing

Orac le

Big

Data

  onnec t o r s

imband

Orac le

 xada ta

 System of

Record

Optimized

for DW/OLTP

Orac le

Exalytics

Optimized for

Artalytlcs

Workload

Stream

|

 cquire

|

Organize

|

 nalyze

and

 isualize

Figure10: Use Case  3: Physical Architecture for Combined Analytics

Leveraging Oracleengineered systems includingOracle Data Appliance, OracleExadata, and

Oracle Exalytics reducesimplementation

risks

provides fast time to valueand extreme

performanceand scalabilit} to meet a complexbusinessand IT challenge.

Page 21: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 21/25

An

Oracle

White

Paper

in

Enterprise

Architecture—information Architecture: An

  rchitect s Guide

to Big Data

The keycomponents of this architecture include:

• OracleBigData Appliance (or other Hadoop Solutions):

o Poweredbythe

full

distributionof Cloudera s DistributionincludingApacheHadoop

(CDH) to store logs, reviews and other relatedbig data

o NoSQL to capturelowlatency datawith flexible data structure and fast

querying

o MapReduce to process large amount of data for reduced and optimizeddataset to be

loaded into database management system

• OracleBigData Connectors: Provides an adapter for Hadoop thatintegrates Hadoopand

OracleDatabase througheasyto usegraphical userinterface.

• OracleExadata: Engineered databasesystemthat supportsmixedworkloads for

outstanding

performance

of the

transactional

and/or datawarehousing environment for

further combined analytics.

• Oracle

Exalytics: Engineered

BI system that provides speed-of-thought analytical

capabilities to end users.

• Intiniband: ConnectionsbetweenOracleBigDataAppliance, OracleExadata, andOracle

Exal5mcs

arevia InfiniBand enabling high speed datatransfer forbatchor query

workloads.

Big

Data

Best Practices

Herearea

few

general

guidelines

to build a

successful

bigdataarchitecture

foundation:

 1: Align BigData with Specific Business Goals

The key intentof Big Datais to find

hidden

value - value through intelligent

filtering

of

low

density and

high

volumes of

data.

Asan architect be prepared to advise yourbusiness onhow

to apply bigdata techniques to accomplish their goals. Forexample understand howto filter

weblogs to

understand eCommerce

behavior

derive sentiment

from

social

media andcustomer

support interactions understand statistical correlation

methods

and their relevance for customer

product manufacturing or engineering data. Even

though

Big

Dataisa newer IT frontier and

there is an obviousexcitementto master somethingnew,it is important to base newinvestments

in skills

organization

or infrastructure

with

a strong

business driven

context to guarantee

ongoing project investments

and funding.

To know ifyou

are

on the right track

ask

yourself

howdoesit supportand enable yourbusiness architecture and top IT priorities?

 2:

Ease

Skills

Shortage

with

Standards and Governance

 

Page 22: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 22/25

AnOracle White Paper InEnterprise Architecture—Information Architecture: AnArchitect's Guide to BigData

McKinsej' Global Institute^ wrote that one of the bluest obstacles for big data is a skills

shortage. With the accelerated adoption of deep analytical techniques, a 60 shortfall is

predictedby 2018. Youcanmitigate tliis riskby ensuringthat BigData technologies,

considerations, and decisions arc added to your IT governanceprogram. Standardizing your

approachwillallow you to manageyour costs and best leverage your resources. Another strateg}

to consider is to implementappliances tliatwould provideyouwith a jumpstartand quickertime

to value asyou grow your in-house expertise.

 3: Optimize Knowledge Transfer with a

Center

of

Excellence

Use a center of excellence (CoE) to share solutionknowledge, planningartifacts, oversight,and

managementcommunications for projects. Wlietherbigdata is a newor expandinginvestment,

tlie soft and hard costs can be an investment sharedacross the enterprise. Another benefit from

the CoE approachis diat it

will

continue to drive the bigdata and

overaU

information

arcliitecture maturit)'in a more structured and systematic way.

 4:

Top

Payoff is Aligning Unstructured with Structured Data

It is certainly valuable to analj'ze BigData on its own. However, by connectinghighdensityBig

Data to the structureddatayouare

already

collecting can bringevengreaterclarity. For example,

there is a differencein distinguisliing allsentiment from that of only your besl customers.

Whetheryouare capturingcustomer, product, equipment,or environmentalBigData, an

appropriate goalis to add more relevantdata points to yourcore master and analytical summaries

and leadyourself to better conclusions. For these reasons, many see BigData as an integral

extensionof yourexistingbusinessintelligence and datawarehousing platform.

Keep inmind that the BigData analytical processes andmodels can be humanand machine

based. Tlie BigData analytical capabilities includestatistics, spatial, semantics, interactive

discovery, and visualization. Tlieyenableyour knowledge workers and new analytical models to

correlate different tj'pes and sources of data, to make associations, and to makemeaningful

discoveries. But all in all considerBigData both a pre-processor andpost-processor of related

transactional data,and leverage your prior investments in infrastructure,platform,BI andDW.

'

McKinsey

Global Institute,

May 2011

The challenge—and opportunitj'—of 'bigdata',

https://www.mckinseyquarterly.com/The

challenge

and

opportunity

of big data

2806

2

Page 23: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 23/25

An OracleWhite Paper in EnterpriseArchitecture—InformationArchitecture: An Architect sGuide to Big Data

 5 : Plan Your

  andbox

For

Performance

Discovering meaning in your data is not

always

straightforward. Sometimes,we don't even know

whatwe are lookingfor

initially.

That's completely expected. Managementand IT needs to

support this lack

of

direction or lack of clearrequirement. So, to accommodate the

interactiveexploration

of data and the experimentationof statistical algorithmsweneed high

performancework

areas.

Be sure that 'sandbox' environmentshavethe power theyneed andare

properly governed.

 6: Align with the Cloud Operating Model

BigData processes and users require

access

to broad array of resources for both iterative

experimentation and runningproduction jobs. Data

across

thedata

realms

(transactions, master

data, reference, summarized ispart of a BigData solution. Analytical sandboxes shouldbe

created on-demandand resourcemanagementneeds to have a control of the entiredata

flow

frompre-processing, integration, in-database summarization, post-processing, and analytical

modeling.

A

well

planned private and

public

cloudprovisioning and security

strategy plays

an

integralrole in supporting tJiese changingrequirements.

Summary

BigDataishere. Analysts and research organizations have madeit

clear

thatmining machine

generated datais

essential

to future success. Embracing new technologies and

techniques

are

always

challenging but as

arcliitects

youare

expected

to provide a fast reliable pathto business

adoption.

Asyou

explore

the Vhat's new'across the spectrum of

Big

Data capabilities we

suggest

that you

thinkabout theirintegration intoyour existing infrastructure andBI investments. As

examples

align newoperational andmanagement capabilities withstandard IT,build for enterprise

scale

and

resilience

unifj'your database and developmentparadigms asyouembraceOpen Source,

and sharemetadata whereverpossiblefor both integration and

analytics.

Lastbut not least expandyour IT governance to includea BigData centerof excellence to

ensurebusinessalignment, growyour

skills

manageOpen Sourcetoolsand technolgoies, share

knowledge, establish standards, and to

manage

best practices.

Formoreinformation about

Oracle

and

Big

Data,visit

www.oracle.com/bigdata.

You can listen to Helen Sun discussthe topicsin thiswhite paper at thiswebcast. Selectsession

6,Conquering BigData. Clickhere.

 

Page 24: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 24/25

An Oracle White Paper InEnterprise Architecture—Information Architecture; An Architect s Guide to Big Data

Enterprise

Architecture

and

Oracle

Oracle hascreated a streamlined and repeatable processto

facilitate

the development of your big

data

architecture vision

Gove rnance

 us ines s

 rchitecture

  Enterprise

  rchi tec ture

Repository

Roadmap

Future

Sta t e

  rchi tecture

,

Vision

,

Current

Sta te

The

Oracle

Architecture Development Process

divides

the development of architecture into the

phaseslistedabove. OracleEnterpriseArchitects and InformationArchitects use this

methodology to propose solutions and to implementsolutions. This process leverages many

planningassetsand reference architectures to ensure ever} implementation follows Oracle s best

experiences and practices.

For

additional

white

papers

on the Oracle Architecture Development Process OADP). the

associated OracleEnterpriseArchitecture Framework OEAF). read aboutOracle s experiences

in enterprise

architecture projects,

and to participate in a

community

of enterprise

architects,

visit

the www.oracle.com/goto/EA

To understandmore about Oracle s enterprise architecture and informationarchitecture

consulting

services, please visit,

www.oracle.com/goto/EA-Services.

Watchour webcaston BigData, by clicking here.

  3

Page 25: Big Data for Oracle Architects

7/23/2019 Big Data for Oracle Architects

http://slidepdf.com/reader/full/big-data-for-oracle-architects 25/25

O

A

i

Em

O

A

A

A

O

W

9

R

WW

P

F

t

C

CT

wMwWm

pWmm

b

O

b

 XC

AO

Hw