oracle, hadoop, and the big data revolution
Oracle, Hadoop, and the Big Data Revolution Guy Harrison Executive Director, R&D Information Management

Post on 15-Jan-2022

Page 1: Oracle, Hadoop, and the Big Data Revolution

Oracle, Hadoop, and the Big Data Revolution

Guy Harrison, Executive Director, R&D Information Management

Page 3: Oracle, Hadoop, and the Big Data Revolution
Page 4: Oracle, Hadoop, and the Big Data Revolution
Page 5: Oracle, Hadoop, and the Big Data Revolution
Page 6: Oracle, Hadoop, and the Big Data Revolution
Page 7: Oracle, Hadoop, and the Big Data Revolution

But Seriously

Page 8: Oracle, Hadoop, and the Big Data Revolution

Oracle OpenWorld 2013

What is Big Data?

Page 9: Oracle, Hadoop, and the Big Data Revolution

Three or Four “V”s

• Volume: Terabytes, Petabytes, Exabytes, Zettabytes

• Variety: Structured, Unstructured, Human Generated, Machine Generated

• Velocity: User populations x Transaction rates x Machine data

• Value: Competitive or Collective advantage

Page 10: Oracle, Hadoop, and the Big Data Revolution

Data volumes have always been increasing…

2006 Perspective

Page 11: Oracle, Hadoop, and the Big Data Revolution

Though the absolute volumes are boggling…

[Chart: data volumes in bytes on a log scale from 1.E+09 (Gigabyte) to 1.E+21 (Zettabyte). Items plotted: Human Brain, Google, Living Human Genomes, Digital information 2008, Total Digital capacity, Digital information created 2011. Data labels: 2.81E+15, 1.10E+17, 5.48E+18, 4.87E+18, 1.18E+21, 2.13E+21.]

Page 12: Oracle, Hadoop, and the Big Data Revolution

Velocity

Page 13: Oracle, Hadoop, and the Big Data Revolution

Variety

– or the industrial Revolution of data

Page 14: Oracle, Hadoop, and the Big Data Revolution

Page 15: Oracle, Hadoop, and the Big Data Revolution

Page 16: Oracle, Hadoop, and the Big Data Revolution

Page 17: Oracle, Hadoop, and the Big Data Revolution

Page 18: Oracle, Hadoop, and the Big Data Revolution

Page 19: Oracle, Hadoop, and the Big Data Revolution

Data: now and then

1993

• Generated internally

• Key to operational efficiency

2013

• Generated externally

• Key to competitiveness

• Source of product innovation

• Changing our world

Page 20: Oracle, Hadoop, and the Big Data Revolution

Big Data is the culmination of cloud, social and mobile

Page 21: Oracle, Hadoop, and the Big Data Revolution

Big Data can be deadly

Page 22: Oracle, Hadoop, and the Big Data Revolution

Will Big Data kill retail?

Page 23: Oracle, Hadoop, and the Big Data Revolution

Prevalence of Showrooming

[Bar chart: prevalence of showrooming in percent (axis 0 to 70), with bars for Consumer Electronics and Home Improvement.]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon, as 'Showrooming' Is Universal but Not Unbeatable", published 12 February 2013

Page 24: Oracle, Hadoop, and the Big Data Revolution
Page 25: Oracle, Hadoop, and the Big Data Revolution
Page 26: Oracle, Hadoop, and the Big Data Revolution
Page 27: Oracle, Hadoop, and the Big Data Revolution
Page 28: Oracle, Hadoop, and the Big Data Revolution

Why showrooming?

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences?

Page 29: Oracle, Hadoop, and the Big Data Revolution

Some novel defenses

Page 30: Oracle, Hadoop, and the Big Data Revolution

Web analytics for retail

Page 31: Oracle, Hadoop, and the Big Data Revolution
Page 32: Oracle, Hadoop, and the Big Data Revolution

First mover advantage

• The first vendor to offer you a product at a good price has the advantage

• It is totally insufficient to lay a bunch of products on a table in a building

• Only big data analytics can provide this first mover advantage

Page 33: Oracle, Hadoop, and the Big Data Revolution

There’s a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

Security

Finance

Government

Science

Healthcare

Insurance

Telecom

Advertising

Page 34: Oracle, Hadoop, and the Big Data Revolution

The Revolution is not over yet

Page 35: Oracle, Hadoop, and the Big Data Revolution
Page 36: Oracle, Hadoop, and the Big Data Revolution
Page 37: Oracle, Hadoop, and the Big Data Revolution
Page 38: Oracle, Hadoop, and the Big Data Revolution

Willy Bowman

Nationality: German

Don’t Mention the WAR!

Page 39: Oracle, Hadoop, and the Big Data Revolution

Buying choices:

Amazon softcover: $45.99

Oracle Performance Survival Guide

Amazon Kindle: $39.99

Say “screw you bookseller” to buy the Kindle version

Page 40: Oracle, Hadoop, and the Big Data Revolution
Page 41: Oracle, Hadoop, and the Big Data Revolution
Page 42: Oracle, Hadoop, and the Big Data Revolution

Brain Control

Page 43: Oracle, Hadoop, and the Big Data Revolution
Page 44: Oracle, Hadoop, and the Big Data Revolution
Page 45: Oracle, Hadoop, and the Big Data Revolution

Muze

Page 46: Oracle, Hadoop, and the Big Data Revolution
Page 47: Oracle, Hadoop, and the Big Data Revolution
Page 48: Oracle, Hadoop, and the Big Data Revolution

The instrumented human

• Bluetooth Personal Area Network

• 3G/WiFi Wide Area Network

• GPS

• Storage

• Pulse, temp monitor

• Silent alarms

• Pedometer, sleep monitoring

• Compass

• Camera

• Mike/earphones

• Heads up display

• Emotion/Attention monitor

Page 49: Oracle, Hadoop, and the Big Data Revolution

The instrumented world

Page 50: Oracle, Hadoop, and the Big Data Revolution

All of which accelerates what we call Big Data

Page 51: Oracle, Hadoop, and the Big Data Revolution

Big Database technologies

Page 52: Oracle, Hadoop, and the Big Data Revolution

Pioneers of Big Data

Page 53: Oracle, Hadoop, and the Big Data Revolution
Page 54: Oracle, Hadoop, and the Big Data Revolution
Page 55: Oracle, Hadoop, and the Big Data Revolution
Page 56: Oracle, Hadoop, and the Big Data Revolution
Page 57: Oracle, Hadoop, and the Big Data Revolution
Page 58: Oracle, Hadoop, and the Big Data Revolution

Google Software Architecture

• Google Applications

• Map Reduce, BigTable

• Google File System (GFS)

Page 59: Oracle, Hadoop, and the Big Data Revolution

Map Reduce

[Diagram: a Start step fans out to many parallel Map tasks, whose outputs are combined by a Reduce step.]
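The map and reduce steps in the diagram can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop or Google code; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine; here they run in turn.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big Data revolution", "data"])
print(counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the data, and the grouping step moves data across the network; the sketch keeps only the logical structure.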

Page 60: Oracle, Hadoop, and the Big Data Revolution

Multi-stage Map-Reduce

[Diagram labels: HDFS, MAPPER (many), SCAN, SORT, AGGREGATE, REDUCE, Client.]

Page 61: Oracle, Hadoop, and the Big Data Revolution

Schema on Read vs Schema on Write

Schema on Write

• Data

• Extract, Transform, Load

• Code: Cleanse, Normalize, Aggregate

• Data Warehouse

• Analyse

• Utilize

Schema on Read

• Data

• Load

• Hadoop

• Code: Cleanse, Analyse

• Utilize
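The difference between the two approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not warehouse or Hadoop code: the sample records, the field names (`user`, `site`, `hits`) and all function names are invented for the example.

```python
import json

# Raw events as they arrive; the records do not all have the same fields.
RAW_EVENTS = [
    '{"user": "dick", "site": "Ebay", "hits": "3"}',
    '{"user": "jane", "site": "Google"}',
    '{"user": "jane", "referrer": "Facebook", "hits": 5}',
]


def load_schema_on_write(raw_lines):
    """Validate and coerce each record before storing it; reject the rest."""
    table, rejected = [], []
    for line in raw_lines:
        record = json.loads(line)
        try:
            table.append((record["user"], record["site"], int(record["hits"])))
        except KeyError:
            rejected.append(line)
    return table, rejected


def load_schema_on_read(raw_lines):
    """Store the raw lines untouched; no structure is imposed at load time."""
    return list(raw_lines)


def query_hits_by_user(stored_lines):
    """Apply structure only when reading; a missing field gets a default."""
    totals = {}
    for line in stored_lines:
        record = json.loads(line)
        user = record["user"]
        totals[user] = totals.get(user, 0) + int(record.get("hits", 0))
    return totals


warehouse, rejected = load_schema_on_write(RAW_EVENTS)
raw_store = load_schema_on_read(RAW_EVENTS)
hits = query_hits_by_user(raw_store)
print(warehouse, len(rejected), hits)
```

With schema on write, two of the three records never reach the table because they do not fit the fixed layout. With schema on read, every record is kept and each query decides how to interpret it.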

Page 62: Oracle, Hadoop, and the Big Data Revolution

Hadoop: Open Source Map-Reduce Stack

Page 63: Oracle, Hadoop, and the Big Data Revolution

Hadoop at Yahoo

Yahoo! Hadoop cluster:

• 4000 nodes

• 16PB disk

• 64 TB of RAM

• 32,000 Cores
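Dividing the cluster totals by the node count shows how modest each machine is. The short calculation below assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB) and an even spread of resources across nodes; neither assumption is stated on the slide.

```python
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32_000

# Assumed decimal units: 1 PB = 1000 TB and 1 TB = 1000 GB.
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

print(f"{disk_tb_per_node:.0f} TB disk, "
      f"{ram_gb_per_node:.0f} GB RAM, "
      f"{cores_per_node:.0f} cores per node")
```

Under those assumptions each node has about 4 TB of disk, 16 GB of RAM and 8 cores, which is commodity-server territory.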

Page 64: Oracle, Hadoop, and the Big Data Revolution
Page 65: Oracle, Hadoop, and the Big Data Revolution
Page 66: Oracle, Hadoop, and the Big Data Revolution

• Hadoop Distributed File System (HDFS)

• Map Reduce / YARN

• HBase (Database)

• ZooKeeper (Locking)

• SQOOP (RDBMS loader)

• Hive (Query)

• Pig (Scripting)

• Flume (Log Loader)

• Oozie (Workflow manager)

Page 67: Oracle, Hadoop, and the Big Data Revolution

Hadoop 1.0 Architecture

[Diagram: a HADOOP CLIENT (JAVA, PIG, HIVE) sits above two layers. MAP REDUCE (DISTRIBUTED PROCESSING) has a JOB TRACKER; HDFS (DISTRIBUTED STORAGE) has a NAME NODE and a SECONDARY NAME NODE. Each of the 15 boxes shown below them pairs a DATA NODE with a TASK TRACKER.]

Page 68: Oracle, Hadoop, and the Big Data Revolution

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

[Diagram labels: HADOOP CLIENT (JAVA, PIG, HIVE); RESOURCE MANAGER; three NODE MANAGERs, each with a CONTAINER; APPLICATION MASTER.]

Page 69: Oracle, Hadoop, and the Big Data Revolution

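Tez (Hindi for “fast”)

[Diagram: a chain of MAP and REDUCE steps run as three separate jobs (Job 1, Job 2, Job 3) between HDFS reads and writes, contrasted with the steps run as a single job (Job 1).]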

Page 70: Oracle, Hadoop, and the Big Data Revolution

HBase: a real-time database built on Hadoop

[Diagram comparing two storage stacks.]

• Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks

• HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks

Page 71: Oracle, Hadoop, and the Big Data Revolution

HBase Data Model

Single table:

Name  Site             Counter
Dick  Ebay             507,018
Dick  Google           690,414
Jane  Google           716,426
Dick  Facebook         723,649
Jane  Facebook         643,261
Jane  ILoveLarry.com   856,767
Dick  MadBillFans.com  675,230

Normalized tables:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507,018
1       2       690,414
2       2       716,426
1       3       723,649
2       3       643,261
2       4       856,767
1       5       675,230

HBase rows (one row per name, one column per site):

Id  Name  Ebay     Google   Facebook  (other columns)  MadBillFans.com
1   Dick  507,018  690,414  723,649   . . .            675,230

Id  Name  Google   Facebook  (other columns)  ILoveLarry.com
2   Jane  716,426  643,261   . . .            856,767
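The wide, sparse rows on this slide can be modelled as a dictionary of dictionaries. The sketch below is plain Python, not the HBase client API; `COUNTERS` and `to_wide_rows` are hypothetical names, and the data is copied from the slide.

```python
# (name, site, counter) triples, as in the three-column table on the slide.
COUNTERS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(triples):
    """Pivot the triples into one sparse row per name, one column per site."""
    rows = {}
    for name, site, counter in triples:
        rows.setdefault(name, {})[site] = counter
    return rows


rows = to_wide_rows(COUNTERS)

# Each row stores only the columns it actually has.
print(sorted(rows["Dick"]))
print(sorted(rows["Jane"]))
print("Ebay" in rows["Jane"])
```

Jane's row simply has no Ebay column, rather than an Ebay column holding a null, which is why the two rows on the slide list different column sets.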

Page 72: Oracle, Hadoop, and the Big Data Revolution

Hive

Page 73: Oracle, Hadoop, and the Big Data Revolution
Page 74: Oracle, Hadoop, and the Big Data Revolution

[Diagram: SQL → JAVA → RESULTS]

Page 75: Oracle, Hadoop, and the Big Data Revolution

Other SQL-like Hadoop Interfaces

• Cloudera Impala

• MapR Drill

• Aster

• Greenplum (Pivotal HD)

• ParAccel

• Hadapt

• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

Page 76: Oracle, Hadoop, and the Big Data Revolution

Pig

Pig Latin

SQL or Hive QL

Page 77: Oracle, Hadoop, and the Big Data Revolution

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP connects RDBMS tables (CUSTOMERS, PRODUCTS) to HDFS.]

Page 78: Oracle, Hadoop, and the Big Data Revolution

Oracle Exadata

• Database servers: 64 cores, 576 GB RAM

• Storage Servers: 112 cores, 100 TB SAS or 336 TB SATA plus 5 TB SSD

Page 79: Oracle, Hadoop, and the Big Data Revolution

Economies

Exadata vs Hadoop $/TB (Hardware only)

• Exadata: $4,911

• Hadoop: $750

Page 80: Oracle, Hadoop, and the Big Data Revolution

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers

– 48GB RAM per node (864GB total)

– 2x6 Core CPU per node (216 cores total)

– 12x2TB HDD per node (216 spindles, 432 TB)

– 40Gb/s Infiniband between nodes

– 10Gb/s Ethernet to datacentre

• Competitive Pricing

www.oracle.com/us/bigdata/index.html
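The rack totals can be cross-checked by multiplying out the per-node figures. The calculation below uses only the numbers on the slide and counts raw disk capacity, with no allowance for HDFS replication or formatting overhead.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CPUS_PER_NODE, CORES_PER_CPU = 2, 6
DISKS_PER_NODE, TB_PER_DISK = 12, 2

ram_gb_total = NODES * RAM_GB_PER_NODE
cores_total = NODES * CPUS_PER_NODE * CORES_PER_CPU
spindles_total = NODES * DISKS_PER_NODE
raw_disk_tb_total = spindles_total * TB_PER_DISK  # raw capacity, unreplicated

print(ram_gb_total, cores_total, spindles_total, raw_disk_tb_total)
```

The result is 864 GB of RAM, 216 cores, 216 spindles and 432 TB of raw disk for the full rack.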

Page 81: Oracle, Hadoop, and the Big Data Revolution

Big Data Appliance Software

• Cloudera Enterprise

• Oracle Enterprise R

• Oracle NoSQL

• Oracle Big Data Connectors

Page 82: Oracle, Hadoop, and the Big Data Revolution

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”

• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent

• Predictive Analytics: programs that extrapolate from existing data into the future

• Big Data Analytics: AKA Data Science

Page 83: Oracle, Hadoop, and the Big Data Revolution

Collective Intelligence

Page 84: Oracle, Hadoop, and the Big Data Revolution
Page 85: Oracle, Hadoop, and the Big Data Revolution
Page 86: Oracle, Hadoop, and the Big Data Revolution
Page 87: Oracle, Hadoop, and the Big Data Revolution
Page 88: Oracle, Hadoop, and the Big Data Revolution
Page 89: Oracle, Hadoop, and the Big Data Revolution
Page 90: Oracle, Hadoop, and the Big Data Revolution
Page 91: Oracle, Hadoop, and the Big Data Revolution
Page 92: Oracle, Hadoop, and the Big Data Revolution

Google Flu Trends

Page 93: Oracle, Hadoop, and the Big Data Revolution
Page 94: Oracle, Hadoop, and the Big Data Revolution

Collective Intelligence outsmarts Artificial Intelligence?

Page 95: Oracle, Hadoop, and the Big Data Revolution
Page 96: Oracle, Hadoop, and the Big Data Revolution
Page 97: Oracle, Hadoop, and the Big Data Revolution
Page 98: Oracle, Hadoop, and the Big Data Revolution
Page 99: Oracle, Hadoop, and the Big Data Revolution

Artificial Intelligence Strikes back

Page 100: Oracle, Hadoop, and the Big Data Revolution
Page 101: Oracle, Hadoop, and the Big Data Revolution
Page 102: Oracle, Hadoop, and the Big Data Revolution
Page 103: Oracle, Hadoop, and the Big Data Revolution
Page 104: Oracle, Hadoop, and the Big Data Revolution

Watson is big data AI

Page 105: Oracle, Hadoop, and the Big Data Revolution

Predictive Analytics

[Scatter plot with fitted trend line y = 0.9715x + 0.7191; the x axis runs from 0 to 120 and the y axis from -20 to 120.]
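A trend line like the one on this slide is typically produced by ordinary least squares. The sketch below fits a line in plain Python and then uses the slide's coefficients to extrapolate; the sample points are generated from the slide's equation because the original data is not available, and `fit_line` and `predict` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope=0.9715, intercept=0.7191):
    """Evaluate the trend line; defaults are the coefficients on the slide."""
    return slope * x + intercept


# Stand-in data lying exactly on the slide's line (the real points are unknown).
xs = [0, 20, 40, 60, 80, 100]
ys = [predict(x) for x in xs]

slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))

# Extrapolate beyond the plotted range, which is the "predictive" part.
print(round(predict(150), 4))
```

Fitting recovers the slope and intercept that generated the points, and the same line then gives a value for an x that was never observed.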

Page 106: Oracle, Hadoop, and the Big Data Revolution

Supervised Machine Learning

[Diagram labels: Existing Business, Raw Data, Clean, Training Set, Validation Set, Candidate Model, Validate Model, Production Model, New Business, New Data, Prediction.]
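The training set, validation set and prediction steps named in the diagram can be sketched end to end. This is an illustration only: the data is synthetic (a noisy line invented for the example), the model is a simple least-squares line, and all function names are hypothetical.

```python
import random


def make_data(n, seed=0):
    """Synthetic 'existing business' data: y = 2x + 1 plus bounded noise."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + 1.0 + rng.uniform(-0.5, 0.5)) for x in range(n)]


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then cut into training and validation sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def train(rows):
    """Fit a candidate model (slope, intercept) by least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Validate the model on rows it has not seen during training."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


training_set, validation_set = split(make_data(40))
candidate = train(training_set)
error = mean_abs_error(candidate, validation_set)

# Once validated, the model is applied to new data to produce a prediction.
prediction = candidate[0] * 50 + candidate[1]
print(len(training_set), len(validation_set), round(error, 3), round(prediction, 1))
```

Holding back the validation set is what separates a candidate model from a production one: the error is measured on data the fit never saw.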

Page 107: Oracle, Hadoop, and the Big Data Revolution

Inmaps.linkedin.com

Unsupervised learning

Page 108: Oracle, Hadoop, and the Big Data Revolution
Page 109: Oracle, Hadoop, and the Big Data Revolution

Big Data Analytics / Data Science

• Search Optimization

• Recommendation Systems

• Security: Vulnerability, Penetration Detection

• Fraud Detection

• CRM: Churn, Defaults

• Medical: Risk analysis, Diagnosis, Prognosis

• Game optimization

• Advertising: Targeting, Tailoring

Page 110: Oracle, Hadoop, and the Big Data Revolution

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small-medium businesses need help to compete

• Data scientists to the rescue?

Page 111: Oracle, Hadoop, and the Big Data Revolution

Data Scientists to the rescue?

Page 112: Oracle, Hadoop, and the Big Data Revolution

Kitenga Analytics Suite

Page 113: Oracle, Hadoop, and the Big Data Revolution

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

Page 114: Oracle, Hadoop, and the Big Data Revolution

SharePlex® for Hadoop

[Diagram: Redo-logs → Change Data Capture → JMS Queue → Hadoop Poster. Outputs: batched HDFS file copy (audit / change data) and HBase real-time replication.]

Page 115: Oracle, Hadoop, and the Big Data Revolution

Toad BI Suite

Page 116: Oracle, Hadoop, and the Big Data Revolution
Page 117: Oracle, Hadoop, and the Big Data Revolution

Recommendations for your business

How could data and algorithms transform your business?

What are the technologies that will be most important?

• Mobility

• Cloud

• Hadoop

• Big Data Analytics

Where is the data?

• Start collecting now!

Page 118: Oracle, Hadoop, and the Big Data Revolution

Recommendations for your career

Hadoop and NoSQL create strong career opportunities for DBAs and developers

• Demand will exceed supply for the foreseeable future

Lots of opportunities for those with Math & Statistics

• Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R?)

Easy to get started with Hadoop

• SQOOP

• Hive

• Pig

Page 119: Oracle, Hadoop, and the Big Data Revolution

Dell Toad Party!

Dell/TOAD Party, Tue 6:30-9:30p, Tonga Room, Fairmont Hotel

Page 120: Oracle, Hadoop, and the Big Data Revolution

Web: guyharrison.net Email: [email protected]

Twitter: @guyharrison

www.toadworld.com