Oracle, Hadoop, and the Big Data Revolution
Guy Harrison
Executive Director, R&D Information Management
Introductions
Web: guyharrison.net
Email: [email protected]
Twitter: @guyharrison
But Seriously
Oracle OpenWorld 2013
What is Big Data?
Three or Four “V”s
Volume: Terabytes, Petabytes, Exabytes, Zettabytes
Variety: Structured, Unstructured, Human Generated, Machine Generated
Velocity: User populations x Transaction rates x Machine data
Value: Competitive or Collective advantage
Data volumes have always been increasing….
2006 Perspective
Though the absolute volumes are boggling…
Chart: data volumes on a logarithmic scale from 1.E+09 to 1.E+21 (Gigabyte, Terabyte, Petabyte, Exabyte, Zettabyte).
Bar labels: Human Brain, Living Human Genomes, Digital information 2008, Total Digital capacity, Digital information created 2011.
Bar values: 2.81E+15, 1.10E+17, 5.48E+18, 4.87E+18, 1.18E+21, 2.13E+21.
Velocity
Variety
– or the industrial Revolution of data
Data: now and then
1993
• Generated internally
• Key to operational efficiency
2013
• Generated externally
• Key to competitiveness
• Source of product innovation
• Changing our world
Big Data is the culmination of cloud, social and mobile
Big Data can be deadly
Will Big Data kill retail?
Prevalence of Showrooming
Chart: prevalence of showrooming (Pct, axis 0 to 70) for Consumer Electronics and Home Improvement.
Gartner Research G00249458
Survey Analysis: Focus on Customer Basics to Challenge Amazon, as 'Showrooming' Is Universal but Not Unbeatable
Published: 12 February 2013
Why showrooming?
Selection
Stock
Faster
Cheaper
Dynamic Pricing
Predictive ordering
Assortment optimization
Predictive recommendations
Personalization
Defences?
Some novel defenses
Web analytics for retail
First mover advantage
• The first vendor to offer you a product at a good price has the advantage
• It is totally insufficient to lay a bunch of products on a table in a building
• Only big data analytics can provide this first mover advantage
There’s a similar story in every industry
Web
Transport
Power Grid
Dating
Retail
Security
Finance
Government
Science
Healthcare
Insurance
Telecom
Advertising
The Revolution is not over yet
Willy Bowman
Nationality: German
Don’t Mention the WAR!
Buying choices:
Amazon softcover: $45.99
Oracle Performance Survival Guide
Amazon Kindle: $39.99
Say “screw you bookseller” to buy kindle version
Brain Control
Muze
The instrumented human
• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads up display
• Emotion/Attention monitor
The instrumented world
All of which accelerates what we call Big Data
Big Database technologies
Pioneers of Big Data
Google Software Architecture
• Google Applications
• Map Reduce
• BigTable
• Google File System (GFS)
Diagram: Start, many parallel Map tasks, Reduce.
Map Reduce
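The Map Reduce pattern on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea (many independent map tasks, a grouping step, then a reduce), not Hadoop's or Google's API; the function names and the sample input splits are assumptions made up for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different node; here they run in turn.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Hypothetical input splits, standing in for blocks of a file in HDFS.
splits = ["big data big", "data revolution"]
counts = word_count(splits)
print(counts)
```

Because each map call only sees its own split, the map work can be spread across as many machines as there are splits, which is the property the diagram's wall of Map boxes is illustrating.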
Diagram elements: HDFS, multiple MAPPER tasks, SCAN, SORT, AGGREGATE, REDUCE, Client.
Multi-stage Map-Reduce
Schema on Read vs Schema on Write
Schema on Write (diagram elements): Data; Analyse, Aggregate, Normalize, Cleanse, Code; Extract, Load, Transform; Data Warehouse; Utilize
Schema on Read (diagram elements): Data; Load; Hadoop; Analyse, Cleanse, Code; Utilize
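The difference between the two approaches can be shown with a small Python sketch. It is a toy model under stated assumptions: the JSON event format, the field names and the helper functions are all hypothetical, and no real warehouse or Hadoop API is involved.

```python
import json

# Hypothetical raw events; the second one has a malformed amount.
raw_lines = [
    '{"user": "dick", "amount": "12.50"}',
    '{"user": "jane", "amount": "n/a"}',
]

def parse(line):
    """Apply the schema: user is text, amount is a float."""
    record = json.loads(line)
    return {"user": str(record["user"]), "amount": float(record["amount"])}

def load_schema_on_write(lines):
    """Schema on write: cleanse and type every record before it is stored."""
    table, rejected = [], []
    for line in lines:
        try:
            table.append(parse(line))
        except (KeyError, ValueError):
            rejected.append(line)
    return table, rejected

def load_schema_on_read(lines):
    """Schema on read: store the raw lines untouched."""
    return list(lines)

def query_total(raw_store):
    """The schema is applied only now, by the code that reads the data."""
    total = 0.0
    for line in raw_store:
        try:
            total += parse(line)["amount"]
        except (KeyError, ValueError):
            continue  # the reader decides how to treat bad records
    return total

warehouse, rejected = load_schema_on_write(raw_lines)
raw_store = load_schema_on_read(raw_lines)
print(len(warehouse), len(rejected), len(raw_store), query_total(raw_store))
```

In the schema-on-write path the malformed record never reaches the table; in the schema-on-read path it is kept, and each query chooses how to interpret it.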
Hadoop: Open Source Map-Reduce Stack
Hadoop at Yahoo
Yahoo! Hadoop cluster:
• 4000 nodes
• 16PB disk
• 64 TB of RAM
• 32,000 Cores
Hadoop File System (HDFS)
Map Reduce / YARN
HBase (Database)
ZooKeeper (Locking)
SQOOP (RDBMS loader)
Hive (Query)
Pig (Scripting)
Flume (Log Loader)
Oozie (Workflow manager)
Hadoop 1.0 Architecture
• HADOOP CLIENT (JAVA, PIG, HIVE)
• MAP REDUCE (DISTRIBUTED PROCESSING): JOB TRACKER
• HDFS (DISTRIBUTED STORAGE): NAME NODE, SECONDARY NAME NODE
• Many worker nodes, each running a DATA NODE and a TASK TRACKER
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
• HADOOP CLIENT (JAVA, PIG, HIVE)
• RESOURCE MANAGER
• APPLICATION MASTER
• Several NODE MANAGERs, each with a CONTAINER
Tez (Hindi for “fast”)
Diagram: a chain of MAP and REDUCE stages run as three jobs (Job 1, Job 2, Job 3) over HDFS, compared with the same stages run as a single job (Job 1) over HDFS.
HBase: a real-time database built on Hadoop
Oracle: Tables, Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks
HBase: Tables, MemStore, WA Log, HFiles, HDFS, Disks
Name Site Counter
Dick Ebay 507,018
Dick Google 690,414
Jane Google 716,426
Dick Facebook 723,649
Jane Facebook 643,261
Jane ILoveLarry.com 856,767
Dick MadBillFans.com 675,230
NameId Name
1 Dick
2 Jane
SiteId SiteName
1 Ebay
2 Google
3 Facebook
4 ILoveLarry.com
5 MadBillFans.com
NameId SiteId Counter
1 1 507,018
1 2 690,414
2 2 716,426
1 3 723,649
2 3 643,261
2 4 856,767
1 5 675,230
Id Name Ebay Google Facebook (other columns) MadBillFans.com
1 Dick 507,018 690,414 723,649 . . . . . . . . . . . . . . 675,230
Id Name Google Facebook (other columns) ILoveLarry.com
2 Jane 716,426 643,261 . . . . . . . . . . . . . . 856,767
HBase Data Model
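The wide, sparse row layout shown in the last two tables can be modelled with nested dictionaries. This is a sketch of the data shape only, using the counters from the slide; it does not use the HBase client API, and the function name is an assumption for the example.

```python
# Rows from the slide's denormalized table: (name, site, counter).
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) into one sparse row per name,
    with a column only for each site that the name has a value for."""
    wide = {}
    for name, site, counter in rows:
        wide.setdefault(name, {})[site] = counter
    return wide

wide = to_wide_rows(visits)
print(sorted(wide["Dick"]))      # columns present in Dick's row
print(sorted(wide["Jane"]))      # Jane's row has a different column set
print(wide["Jane"].get("Ebay"))  # a missing column is simply absent
```

Each row carries its own set of columns, so no join is needed to read everything about one name, and no space is spent on sites a person never visited.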
Hive
Diagram labels: SQL, JAVA, RESULTS
Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• Paraccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)
Pig
Pig Latin
SQL or Hive QL
Flume and SQOOP
Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.
Oracle Exadata
• Database servers: 64 cores, 576 GB RAM
• Storage Servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD
Economies
Exadata vs Hadoop $/TB (Hardware only)
• Exadata: $4,911
• Hadoop: $750
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
– 48GB RAM per node (864GB total)
– 2x6 Core CPU per node (216 total)
– 12x2TB HDD per node (216 spindles, 864 TB)
– 40Gb/s Infiniband between nodes
– 10Gb/s Ethernet to datacentre
• Competitive Pricing
www.oracle.com/us/bigdata/index.html
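The rack totals follow from the per-node figures, which a few lines of Python can check. The variable names are assumptions for the example; the inputs are the per-node numbers listed on the slide. With 2 TB disks the raw capacity works out to 432 TB rather than the 864 TB quoted, so one of those two figures may be a slip on the slide.

```python
# Per-node figures as listed on the slide.
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6   # two 6-core CPUs
disks_per_node = 12
tb_per_disk = 2

total_ram_gb = nodes * ram_gb_per_node          # matches the slide's 864GB
total_cores = nodes * cores_per_node            # matches the slide's 216
total_spindles = nodes * disks_per_node         # matches the slide's 216 spindles
total_raw_tb = total_spindles * tb_per_disk     # raw capacity implied by 2TB disks

print(total_ram_gb, total_cores, total_spindles, total_raw_tb)
```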
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors
Generating competitive advantage through “Big Data analytics”
Machine Learning: Programs that evolve with “experience”
Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent
Predictive Analytics: Programs that extrapolate from existing data into the future
Big Data Analytics, AKA Data Science
Collective Intelligence
Google Flu Trends
Collective Intelligence outsmarts Artificial Intelligence?
Artificial Intelligence Strikes back
Watson is big data AI
Predictive Analytics
Chart: data points with a fitted linear trend line, y = 0.9715x + 0.7191 (x axis 0 to 120, y axis -20 to 120).
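A trend line like the one on this chart is typically produced by an ordinary least-squares fit, and prediction then means evaluating the line beyond the observed data. The sketch below shows that calculation in plain Python; the sample points are hypothetical, since the slide's underlying data is not available, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate: evaluate the fitted line at a new x."""
    return slope * x + intercept

# Hypothetical observations (not the data behind the slide).
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 38, 62, 79, 101]
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))

# Using the coefficients printed on the slide to predict at x = 100.
slide_prediction = predict(0.9715, 0.7191, 100)
print(slide_prediction)
```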
Supervised Machine Learning
Diagram elements: Existing Business, Raw Data, Clean, Training Set, Validation Set, Candidate Model, Validate Model, Production Model, New Business, New Data, Prediction.
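The workflow in this diagram (split cleaned data into a training set and a validation set, build a candidate model, validate it, then use it on new data) can be sketched as follows. Everything here is an assumption for illustration: the customer records are made up, the model is a deliberately simple nearest-centroid classifier, and the 0.9 acceptance threshold is arbitrary.

```python
import random

def split(records, validation_fraction=0.25, seed=0):
    """Shuffle the cleaned records and hold some back for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def train(training_set):
    """Candidate model: the mean feature value per label (nearest centroid)."""
    sums, counts = {}, {}
    for x, label in training_set:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Pick the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

def accuracy(model, validation_set):
    hits = sum(predict(model, x) == label for x, label in validation_set)
    return hits / len(validation_set)

# Hypothetical cleaned data from existing business: (monthly_spend, outcome).
existing = [(x, "churn") for x in (5, 8, 10, 12, 15, 9, 7, 11)]
existing += [(x, "stay") for x in (60, 75, 80, 90, 66, 71, 85, 95)]

training_set, validation_set = split(existing)
candidate = train(training_set)
score = accuracy(candidate, validation_set)

# Promote the candidate to production only if it validates well enough.
production = candidate if score >= 0.9 else None
if production is not None:
    print(predict(production, 70))  # prediction for a new customer
```

The validation set is never shown to `train`, so the score is an estimate of how the model behaves on data it has not seen, which is the point of the separate Validate Model step.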
Inmaps.linkedin.com
Unsupervised learning
Big Data Analytics
Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring
Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout, are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue?
Data Scientists to the rescue?
Kitenga Analytics Suite
Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx
SharePlex® for Hadoop
Diagram elements: Redo-logs, Change Data Capture, JMS Queue, Hadoop Poster, Batched HDFS File Copy, Audit / Change Data, HBase RealTime replication.
Toad BI Suite
Recommendations For your business
How could data and algorithms transform your business?
What are the technologies that will be most important?
• Mobility
• Cloud
• Hadoop
• Big Data Analytics
Where is the data?
• Start collecting now!
Hadoop and NoSQL create strong career opportunities for DBAs and developers
• Demand will exceed supply for the foreseeable future
Lots of opportunities for those with Math & Statistics
• Good time to dust off that statistics textbook and play with R (maybe Oracle Enterprise R?)
Easy to get started with Hadoop
• SQOOP
• Hive
• Pig
Recommendations For your career
Dell Toad Party!
Dell/TOAD Party, Tue 6:30-9:30p, Tonga Room, Fairmont Hotel