big data & oracle technologies

BIG DATA & ORACLE TECHNOLOGIES KIEV OCT 2013 PRACTIC CONSULTING Alliance of Professional IT & Management Consultants HTTP://PRACTIC-CONSULTING.COM

Upload: aleksey-movchaniuk

Post on 01-Jul-2015


Page 1: Big Data & Oracle Technologies

BIG DATA & ORACLE

TECHNOLOGIES

KIEV

OCT 2013

PRACTIC CONSULTING

Alliance of Professional IT & Management Consultants

HTTP://PRACTIC-CONSULTING.COM

Page 2: Big Data & Oracle Technologies

Agenda

• WHAT: ABOUT BIG DATA

• WHEN: INDUSTRY EXAMPLES OF BIG DATA

• HOW: ORACLE NOSQL, ORACLE R, ORACLE ENDECA

Page 3: Big Data & Oracle Technologies

WHAT IS BIG DATA?

PART I

Page 4: Big Data & Oracle Technologies

What Is Big Data?

Big Data is data that has grown so large that it cannot be processed using conventional methods.

Big Data is the new generation of data warehousing and business analysis systems.


Page 5: Big Data & Oracle Technologies

A Wider Variety of Data

Internet Data: Clickstream, Social media, Social media stream, Web site logs

Research Data: Experiments, Observations, Surveys, Marketplace data

Healthcare Data: Treatment data, Telehealth, National Electronic Health Records, Procedures

Image Data: Image, Video, Satellite image, Surveillance

Device Data: RF Devices, Sensors, EDI, Telemetry

Page 6: Big Data & Oracle Technologies

Why Is Big Data Important?

Big Data: just another buzzword, or a powerful business & science enabler?

SQL Analytics

• Count

• Mean

• OLAP

Descriptive Analytics

• Univariate distribution

• Central tendency

• Dispersion

Data Mining

• Association rules

• Clustering

• Feature extraction

Predictive Analytics

• Classification

• Regression

• Forecasting

• Spatial

• Machine Learning

• Text Analytics

Simulation

• Monte Carlo

• Agent-based modeling

• Discrete event modeling

Optimization

• Linear Optimization

• Non-Linear Optimization

Business Intelligence Advanced Analytics
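
The "Descriptive Analytics" column above (central tendency, dispersion) can be illustrated with a few lines of Python. This is only a minimal sketch: the sample values and variable names are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical sample: daily order counts for one store (made-up numbers).
daily_orders = [12, 15, 11, 15, 20, 14, 18]

# Central tendency
mean_orders = statistics.mean(daily_orders)      # arithmetic mean
median_orders = statistics.median(daily_orders)  # middle value of the sorted sample

# Dispersion
stdev_orders = statistics.stdev(daily_orders)        # sample standard deviation
value_range = max(daily_orders) - min(daily_orders)  # spread between extremes

print("mean:", mean_orders)
print("median:", median_orders)
print("stdev:", round(stdev_orders, 2))
print("range:", value_range)
```

The same measures are what an SQL `AVG` or an OLAP roll-up would report; the later columns of the slide (data mining, prediction, simulation, optimization) go beyond summaries like these.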

Page 7: Big Data & Oracle Technologies

INDUSTRY EXAMPLES

OF BIG DATA

PART II

Page 8: Big Data & Oracle Technologies

Marketing & Sales + Big Data

TO DELIVER AN ANSWER: 100 milliseconds

COUNT OF ADS: 100,000 per SECOND

http://www.dataxu.com/

ADVERTISING PLATFORM: Clickstream, Behavior
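
To get a feel for these two figures together, Little's law (requests in flight = arrival rate × time in system) gives a rough estimate of the concurrency such a platform has to sustain. The sketch below assumes, as a simplification not stated on the slide, that every request uses the full 100 ms budget.

```python
# Figures taken from the slide.
ads_per_second = 100_000   # arrival rate
answer_budget_ms = 100     # time allowed to deliver an answer

# Little's law: requests in flight = arrival rate x time in system.
# Assumption: each request spends the whole 100 ms budget in the system.
in_flight = ads_per_second * answer_budget_ms // 1000

print(f"Requests in flight at any moment: {in_flight:,}")
```

Under that assumption the platform is evaluating roughly ten thousand ad decisions concurrently at any instant.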

Page 9: Big Data & Oracle Technologies

Retail + Big Data

CAPTURE: 1,000 tweets per SECOND

INCREASE OF DATA: +10 TB per DAY

http://www.walmart.com/

WAL-MART ONLINE MARKETING: Social Media

Page 10: Big Data & Oracle Technologies

Health Care + Big Data

INCREASE OF DATA EACH MONTH: +10 TB

PATIENTS INVOLVED: 10,000

https://cghub.ucsc.edu/index.html/

CANCER GENOMICS HUB: DNA and RNA data

Page 11: Big Data & Oracle Technologies

Science + Big Data

SEVEN TELESCOPES CAPTURE: 2 MB per SECOND

IN THE NEXT 10-15 YEARS ALL TELESCOPES WILL RECEIVE: 30 TB per SECOND

http://www.skatelescope.org/

THE CATALOG OF THE UNIVERSE: Data from Telescopes
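
Per-second rates are hard to compare with storage capacities, so it helps to convert them to daily volumes. The sketch below does that arithmetic for the two rates on the slide; decimal units (1 GB = 1,000 MB, 1 EB = 1,000,000 TB) and continuous 24-hour operation are assumptions made for the illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_day(rate_per_second):
    """Convert a per-second data rate into a per-day volume (same unit)."""
    return rate_per_second * SECONDS_PER_DAY

current_mb_per_day = per_day(2)   # slide: 2 MB per second today
future_tb_per_day = per_day(30)   # slide: 30 TB per second in 10-15 years

# Assumption: decimal units and uninterrupted capture.
print(current_mb_per_day / 1000, "GB per day today")
print(future_tb_per_day / 1_000_000, "EB per day in the future")
```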

Page 12: Big Data & Oracle Technologies

ORACLE TECHNOLOGIES

PART III

Page 13: Big Data & Oracle Technologies

Oracle NoSQL

Big Data Storage Choices

Hadoop Distributed File System (HDFS)  | Oracle NoSQL Database
File System                            | Database
Parallel scanning                      | Indexed storage
No inherent structure                  | Simple data structure
High volume writes                     | High volume random reads and writes
Batch Oriented                         | Real-Time
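
The difference between "parallel scanning" and "indexed storage" can be shown with a toy in-memory example. This is only an analogy in plain Python, with hypothetical record fields; it does not use HDFS or Oracle NoSQL Database.

```python
# Hypothetical records; field names are made up for the illustration.
records = [{"user_id": f"u{i}", "clicks": i % 7} for i in range(10_000)]

def total_clicks_by_scan(rows):
    """Batch style: touch every record once, as a scan over a file would."""
    return sum(row["clicks"] for row in rows)

# Indexed style: build a key -> record index once, then answer
# individual lookups without reading the other records.
index = {row["user_id"]: row for row in records}

def clicks_for_user(user_id):
    """Real-time style: a single keyed lookup."""
    return index[user_id]["clicks"]

print(total_clicks_by_scan(records))
print(clicks_for_user("u9"))
```

A scan suits questions about the whole data set; a keyed lookup suits questions about one entity that must be answered immediately.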

Page 14: Big Data & Oracle Technologies

Oracle NoSQL

• RDBMS

– High value, high density, complex data

– Complex data relationships

– Schema-centric

– Designed to scale up & out

– Lots of general purpose features/functionality

High overhead ($ per operation)

• NoSQL architectures

– Low value, low density, simple data

– Very simple relationships

– Schema-free, unstructured or semi-structured data

– Distributed storage and processing

– Stripped down, special purpose data store

Lower overhead ($ per operation)

Page 15: Big Data & Oracle Technologies

Oracle NoSQL

Simple Data Model

Small, distributed footprint

Highly scalable, available

Transparent load balancing

Integrates with Oracle Stack

Diagram: each Application uses a NoSQL Database Driver, which connects to Storage Nodes in Datacenter A and Datacenter B

A Distributed, Scalable Key-Value Database
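
One common way a driver can spread keys over storage nodes is to hash the key and pick a node from the hash. The sketch below shows that idea with simple hash-modulo routing; the node names are hypothetical, and the real product's partitioning and replication scheme is more involved than this.

```python
import hashlib

# Hypothetical storage nodes spread over two datacenters.
STORAGE_NODES = ["dc-a-node-1", "dc-a-node-2", "dc-b-node-1", "dc-b-node-2"]

def node_for(major_key: str) -> str:
    """Pick a storage node by hashing the key (simple hash-modulo routing)."""
    digest = hashlib.sha256(major_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(STORAGE_NODES)
    return STORAGE_NODES[bucket]

# Count how many of 1,000 made-up user keys land on each node.
placement = {}
for n in range(1000):
    node = node_for(f"user{n}")
    placement[node] = placement.get(node, 0) + 1

for node, count in sorted(placement.items()):
    print(node, count)
```

Because the same key always hashes to the same node, the application never has to know where a record lives, which is the sense in which the load balancing is transparent.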

Page 16: Big Data & Oracle Technologies

Oracle NoSQL

Key-value pairs

• Simple data model – key-value pair (major+minor-key paradigm)

• Simple operations – read/insert/update/delete, read-modify-write (RMW) support

• Scope of transaction – records within a major key, single API call

• Unordered scan of all data (non-transactional)

Major key: userid

Sub key: address, subscriptions

Value: email id, phone #, expiration date

Keys are strings; the value is a byte array
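
The major + minor key paradigm can be modelled in a few lines. The class below is a hypothetical in-memory sketch written for this transcript (`TinyKVStore` and its methods are not the Oracle NoSQL Database API, which is a Java library); it only illustrates how records that share a major key are grouped and can be fetched together.

```python
from collections import defaultdict

class TinyKVStore:
    """Toy model: major key -> {minor key path -> value bytes}."""

    def __init__(self):
        self._data = defaultdict(dict)

    def put(self, major, minor, value: bytes):
        self._data[major][minor] = value

    def get(self, major, minor):
        return self._data.get(major, {}).get(minor)

    def delete(self, major, minor):
        """Return True if a record was removed."""
        return self._data.get(major, {}).pop(minor, None) is not None

    def multi_get(self, major):
        """Fetch every record under one major key in a single call."""
        return dict(self._data.get(major, {}))

    def read_modify_write(self, major, minor, update):
        """Read the current value, apply `update`, store the result."""
        new_value = update(self.get(major, minor))
        self.put(major, minor, new_value)
        return new_value

store = TinyKVStore()
store.put("user42", ("address",), b"1 Example Street")
store.put("user42", ("subscriptions", "email id"), b"user42@example.com")
store.put("user42", ("subscriptions", "expiration date"), b"2014-01-01")

# Extend the subscription with a read-modify-write on one record.
store.read_modify_write(
    "user42", ("subscriptions", "expiration date"), lambda old: b"2015-01-01"
)
profile = store.multi_get("user42")
print(profile)
```

Grouping by major key is what makes the "records within a major key, single API call" transaction scope natural: everything for one user sits together.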

Page 17: Big Data & Oracle Technologies

Oracle NoSQL

Online Display Advertising

Page 18: Big Data & Oracle Technologies

Oracle NoSQL

Getting Started with Oracle NoSQL DB

1. Download from OTN:

www.oracle.com/technetwork/products/nosqldb/downloads/index.html

2. Review Quick Start & Getting Started Guide

3. Review Programmatic API Guide

4. Start writing Java code

Page 19: Big Data & Oracle Technologies

What is R?

• R is an open source language and environment for statistical computing and graphics: http://www.R-project.org/

• Started in 1994 as an alternative to SAS, SPSS & other proprietary statistical environments

• The R environment

– R is an integrated suite of software facilities for data manipulation, calculation and graphical display

• Around 2 million R users worldwide

– Widely taught in universities

– Many corporate analysts know and use R

• Thousands of open source R packages to enhance productivity, such as:

– Bioinformatics

– Spatial Statistics

– Financial Market Analysis

– Linear and Non Linear Modeling

Page 20: Big Data & Oracle Technologies

Why do statisticians and data analysts use R?

The R environment is:

• Powerful

• Extensible

• Graphical

• Extensive statistics

• OOTB functionality with many ‘knobs’ but smart defaults

• Ease of installation and use

• Free

Page 21: Big Data & Oracle Technologies

Limitations of R

• R is a client and server bundled together as one executable

– Single user tool, like Excel

– Single-threaded

– Cannot leverage multi-CPU capacity without use of special packages and coding

• R requires data to be loaded into memory first

– Loading data may not be a limitation given the RAM available on laptops/desktops

– R’s call-by-value semantics mean that as data flows into functions, a function that modifies its input works on a complete copy of the data

– As a result you can quickly run into memory limits

Page 22: Big Data & Oracle Technologies

Oracle R Connector for Hadoop

• Provides transparent access to a Hadoop cluster, which consists of MapReduce and HDFS-resident data

• R users are not required to learn a new language or interface to work with Hadoop

• R users can execute jobs on a Hadoop cluster without requiring knowledge of Hadoop internals, the Hadoop CLI, or IT infrastructure

• Ability to leverage open source contributed R packages to work on HDFS-resident data
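
For readers unfamiliar with the MapReduce model mentioned above, the sketch below runs a word count through the three classic phases (map, shuffle, reduce) in plain Python on a single machine. It is not the Oracle R Connector for Hadoop API and involves no Hadoop cluster; the function names and input lines are made up for the illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data and oracle", "oracle r and hadoop", "big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce functions run in parallel on many nodes over HDFS-resident data, and the framework performs the shuffle; the connector's promise is that R users write only the two functions.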

Page 23: Big Data & Oracle Technologies

Oracle R Enterprise

• Provides familiar R environment to operate on database-resident data

• Overloads base R functions for scalable execution in Oracle Database

– Automatically generates SQL from R and submits query to database

– Leverages table parallelism where applicable

• Enables embedded execution of R scripts at Oracle Database server

– Provides database-controlled data-parallel execution framework

– Enables leveraging CRAN open source R packages

• Enables integration of structured results and graphics with OBIEE dashboards and BI Publisher documents
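
The idea of overloading familiar functions so that they generate SQL instead of computing in memory can be sketched in a few lines. The classes below are hypothetical and written in Python purely to illustrate the technique (operator overloading on a table proxy); they are not how Oracle R Enterprise is implemented, and the table and column names are invented.

```python
class Condition:
    """A filter expression rendered as SQL text."""
    def __init__(self, sql):
        self.sql = sql

class Column:
    def __init__(self, table, name):
        self.table = table
        self.name = name

    def __gt__(self, value):
        # Sketch only: numeric literals are inlined; real code would use bind variables.
        if not isinstance(value, (int, float)):
            raise TypeError("this sketch supports numeric comparisons only")
        return Condition(f"{self.name} > {value}")

    def mean(self):
        """Build SQL for the average instead of pulling rows into memory."""
        sql = f"SELECT AVG({self.name}) FROM {self.table.name}"
        if self.table.where:
            sql += f" WHERE {self.table.where}"
        return sql

class TableProxy:
    """Stands in for a database table; no data is held locally."""
    def __init__(self, name, where=None):
        self.name = name
        self.where = where

    def __getitem__(self, key):
        if isinstance(key, Condition):
            where = key.sql if not self.where else f"({self.where}) AND ({key.sql})"
            return TableProxy(self.name, where)
        return Column(self, key)

orders = TableProxy("ORDERS")
big_orders = orders[orders["AMOUNT"] > 100]
query = big_orders["AMOUNT"].mean()
print(query)
```

The user writes what looks like ordinary data-frame code, while the work is expressed as a query the database can execute next to the data.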

Page 24: Big Data & Oracle Technologies

Oracle R Links

• Blog: https://blogs.oracle.com/R/

• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397

• Oracle R Distribution: http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html

• ROracle: http://cran.r-project.org/web/packages/ROracle

• Oracle R Enterprise: http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise

• Oracle R Connector for Hadoop: http://www.oracle.com/us/products/database/big-data-connectors/overview

Page 25: Big Data & Oracle Technologies

Other Oracle Big Data Products

Oracle Endeca Information Discovery: http://www.oracle.com/us/solutions/business-analytics/business-intelligence/endeca/overview/index.html

Oracle Data Integrator Application Adapter for Hadoop: http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html

Oracle Loader for Hadoop: http://www.oracle.com/technetwork/bdc/hadoop-loader/learnmore/index.html

Page 26: Big Data & Oracle Technologies

The End

The best way to predict the future is to create it!

- Peter F. Drucker