Oracle Big Data and Database Analytics - Andrés Araújo
DESCRIPTION
Oracle provides a complete, open solution that is simple to deploy and combines hardware and software to bring Big Data environments and architectures into enterprise IT environments that require high levels of reliability, security, and productivity. With Oracle Big Data SQL, it is possible to maintain multiple data repositories (Hadoop, NoSQL, and relational) and access them in a unified way through SQL, with maximum performance and minimal data movement.
TRANSCRIPT
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Big Data and Database Analytics in the Enterprise
J. Andrés Araújo
Principal Sales Consultant
Technology Sales Consulting
Seville, 10 December 2014
If you take a snapshot of a minute on the global internet, all of these activities are happening ...
Big Data is the result of a Data Explosion
What Makes it Big Data?
VOLUME VELOCITY VARIETY VALUE
BLOG BLOG
Smart Metering
Social Social
Why Is Big Data Important? Value Creation
“In a big data world, a competitor that fails to sufficiently develop its capabilities will be left behind.” (McKinsey Global Institute)
HEALTH CARE: Reduce Prescription Fraud
MANUFACTURING: Accelerate Test Cycles to Reduce Backlog
COMMUNICATIONS: Offer New Services Based on Location Data
RETAIL: Better Predict Product Success
PUBLIC SECTOR: Improve Student Outcomes
Big Data Technology Overview
Extending Data Management… Big Data = Hadoop + NoSQL + Relational
• Relational: Run the Business
– Integrate existing systems
– Support mission-critical tasks
– Protect existing expenditures
– Ensure skills relevance
• Hadoop: Change the Business
– Disrupt competitors
– Disintermediate supply chains
– Leverage new paradigms
– Exploit new analyses
• NoSQL: Scale the Business
– Meet mobile challenges
– Accelerate developer agility
– Scale out economically
– Serve data faster
Big Data Technology Today: Hadoop & MapReduce
Software framework
Distributed processing across large sets of computers with redundant storage
Highly scalable data processing
Cost-effective model for high-volume, low-density data
Open source
Batch operation
Components: Management/Monitoring, Hadoop Distributed File System (HDFS), MapReduce
Scanning All The Data Using Map/Reduce
[Diagram: INPUT 1 and INPUT 2 pass through chained MAP, SHUFFLE/SORT, and REDUCE stages to produce OUTPUT 1 and OUTPUT 2]
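The chained MAP, SHUFFLE/SORT, and REDUCE stages in the diagram can be sketched in a few lines of plain Python. This is only an in-memory illustration of the programming model, not Hadoop code: the function names (`run_job`, `tokenize`, `total`, and so on) and the sample input are assumptions made for the example. Two jobs are chained, the second one consuming the output of the first, which is what a Map/Reduce pipeline amounts to.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, mapper):
    """Apply the mapper to every record; each call yields (key, value) pairs."""
    return list(chain.from_iterable(mapper(record) for record in records))


def shuffle_sort(pairs):
    """Group values by key and order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(groups, reducer):
    """Call the reducer once per key with all of that key's values."""
    return [reducer(key, values) for key, values in groups]


def run_job(records, mapper, reducer):
    """One full MAP -> SHUFFLE/SORT -> REDUCE pass over the input."""
    return reduce_phase(shuffle_sort(map_phase(records, mapper)), reducer)


# Job 1: count words (hypothetical sample input).
def tokenize(line):
    return [(word.lower(), 1) for word in line.split()]


def total(word, ones):
    return (word, sum(ones))


# Job 2: regroup the output of job 1 by count.
def invert(word_count):
    word, count = word_count
    return [(count, word)]


def collect(count, words):
    return (count, sorted(words))


lines = ["big data is big", "data about data"]
counts = run_job(lines, tokenize, total)
by_count = run_job(counts, invert, collect)
print(counts)
print(by_count)
```

Every stage scans its whole input, which is why this model suits batch work rather than interactive queries.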
Big Data Technology Today: NoSQL Databases
• Not-only-SQL (2009)
• Broad class of non-relational DBMS systems that typically
– Provide horizontal/distributed scalability
– Avoid joins
– Have relaxed consistency guarantees
– Don’t require a structured schema
– Are application/developer-centric
• No standards
– Rapidly evolving set of solutions (150+ on nosql-database.org)
– Highly variable feature set
– UnQL launched in July
• Majority are open source
Oracle NoSQL Database
Key-value pair database
Dynamic data model
Highly scalable, available
Transparent load balancing
Commercial software and support
Easy management
Built using Berkeley DB
[Diagram: applications use the NoSQL Driver to send Read, Update, and Delete operations to groups of nodes (East, West, Central)]
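To make the key-value model and the driver's role concrete, here is a minimal sketch in Python. It is not the Oracle NoSQL Database API: the class `TinyKVStore`, its methods, and the modulo-hash placement rule are assumptions chosen only to show how a client-side driver can route each key to one node so that the application never needs to know where the data lives.

```python
import hashlib


class TinyKVStore:
    """Toy key-value store that spreads keys over named nodes by hashing."""

    def __init__(self, node_names):
        self.names = sorted(node_names)
        # Each "node" is just an in-memory dictionary in this sketch.
        self.nodes = {name: {} for name in self.names}

    def _node_for(self, key):
        # Hash the key and map the hash onto one node (assumed placement rule).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.names)
        return self.names[index]

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key, default=None):
        return self.nodes[self._node_for(key)].get(key, default)

    def delete(self, key):
        return self.nodes[self._node_for(key)].pop(key, None) is not None


store = TinyKVStore(["east", "west", "central"])
store.put("user/12345", {"name": "Ana"})
store.put("user/8901", {"name": "Luis"})
print(store.get("user/12345"))
print(store.delete("user/8901"), store.get("user/8901"))
```

A production store would also replicate each key and rebalance when nodes are added; the sketch leaves both out.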
Open source language and environment
Used for statistical computing and graphics
Strength in easily producing publication-quality graphs
Highly extensible
Created by Robert Gentleman and Ross Ihaka.
Big Data Technology Today: R Statistical Programming Language
Big Data: The Oracle Proposal
Oracle Big Data Management System
SOURCES
DATA RESERVOIR DATA WAREHOUSE
Oracle Database
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Oracle Spatial & Graph
Big Data Appliance
Apache Flume
Oracle GoldenGate
Oracle Event Processing
Cloudera Hadoop
Oracle NoSQL
Oracle R Advanced Analytics for Hadoop
Oracle R Distribution
Oracle Database
In-Memory, Multi-tenant
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Exadata
Oracle GoldenGate
Oracle Event Processing
Oracle Data Integrator
Oracle Big Data Connectors
Oracle Data Integrator
ORACLE BIG DATA SQL
Big Data Hardware
Effort by role:
Physical Installation (10 racks): 286 hours
Electricians: 236 hours, 616 cables
Network Engineers: 264 hours, 864 cables
Storage Engineers: 320 hours, 576 cables
System Administrators: 232 hours
Totals: 1338 people hours, 677 elapsed hours, 2344 cables
Oracle Big Data Appliance Hardware: X4-2 Full Rack
• 18 nodes, fully cabled
• 288 Intel® Xeon® E5-2650 V2 processor cores
• 1152 GB total memory (expandable to 9216 GB)
• 864 TB total raw storage capacity
• 40Gb/sec InfiniBand Network
• 10Gb/sec Data Center Connectivity
Oracle Big Data Appliance Installation
Effort by role, Custom vs. Oracle:
Physical Installation (10 racks): 286 hours vs. 16 hours
Electricians: 236 hours, 616 cables vs. 16 hours, 32 cables
Network Engineers: 264 hours, 864 cables vs. 6 hours, 14 cables
Storage Engineers: 320 hours, 576 cables vs. n/a
System Administrators: 232 hours vs. n/a
Totals, Oracle vs. Custom: 38 vs. 1306 hours, 19 vs. 677 elapsed hours, 46 vs. 2344 cables
Oracle Big Data Appliance Hardware: X4-2 Starter Rack / In-Rack Expansion
• 6 nodes, fully cabled
• 96 Intel® Xeon® E5 processor cores (SandyBridge)
• 384 GB total memory
• 288 TB total raw storage capacity
• 40Gb/sec InfiniBand Network
• 10Gb/sec Data Center Connectivity
• All required switches for growth and Exadata Connectivity
Big Data Software: Cloudera Distribution Including Apache Hadoop
Enterprise-ready Big Data platform.
• 100% pure Apache Hadoop
• All components for Hadoop deployment
• Cloudera Manager and all Cloudera subscription products included
Tested by Cloudera
Supported by Oracle
Components:
Coordination: Apache ZooKeeper
Data Integration: Apache Flume, Apache Sqoop
Fast Read/Write Access: Apache HBase
Languages / Compilers: Apache Pig, Apache Hive, Apache Mahout
Workflow: Apache Oozie
Scheduling: Apache Oozie
Metadata: Apache Hive
File System Mount: FUSE-DFS
UI Framework: Hue
SDK: Hue SDK
HDFS, MapReduce
Oracle Big Data Appliance Software (I): Pre-installed, Pre-integrated
• Oracle Linux 6.4 with UEK 2 (v2.6.39)
• Oracle Java – JDK 7
• Cloudera CDH 4.4
– including Impala, HBase, Accumulo and Search
• Cloudera Manager 4.7
– including Backup and Disaster Recovery (BDR) and Navigator
• Big Data Appliance Enterprise Manager Plug-In
• NoSQL DB CE 12cR1
• Oracle R Distribution (Open Source)
Oracle Big Data Appliance Software (II): Pre-installed, Pre-integrated
• Oracle Big Data Connectors 2.3*
– Oracle SQL Connector for Hadoop
– Oracle Loader for Hadoop
– Oracle XQuery for Hadoop
– Oracle R Advanced Analytics for Hadoop
– Oracle Data Integrator Application Adapter for Hadoop
• Oracle Audit Vault and Database Firewall for Hadoop Auditing*
• Oracle Data Integrator*
• Oracle NoSQL Database Enterprise Edition*
* Separately licensed software, can be pre-installed and configured on BDA
Required Skills for MapReduce Development
Java
Hadoop Framework
Parallel Algorithms
A Map/Reduce Pipeline
[Diagram: INPUT 1 and INPUT 2 pass through chained MAP, SHUFFLE/SORT, and REDUCE stages to produce OUTPUT 1 and OUTPUT 2]
Oracle Data Integrator
Reduces Hadoop complexities through graphical tooling
Oracle Data Integrator: NoETL Approach
One Logical Design, Many Engine Alternatives:
Data Engine: SQL / OLTP Database
– Examples: Oracle DBMS, any OLTP DBMS, DW appliances
– Engine I/O: SSD / disk based
– Best Use: High volumes of transformations on relational data
Data Engine: MapReduce
– Examples: Hive / MR2, Pig / Oozie / MR2
– Engine I/O: SSD / disk based
– Best Use: Huge batch-like transformations on any data types
Data Engine: In Memory (SQL / Big Data)
– Examples: Oracle InMemory, Hive / Tez / YARN, Spark / YARN, Cloudera Impala
– Engine I/O: D/RAM, with various built-in spill-to-disk approaches
– Best Use: Highly interactive data transformation patterns
Data Engine: Streaming Big Data
– Examples: Storm / YARN, Oracle Event Processor (OEP)
– Engine I/O: D/RAM, “always on” data pipeline
– Best Use: Very low latency transformations
Modern design studio for simple map development
Team-based GUI Tooling for work on Enterprise projects
Integrated lifecycle and metadata management
Automated support for Changed Data Capture
SEPARATE ETL ENGINE NOT REQUIRED!
Oracle OpenWorld 2014
Data Integrator
Oracle Data Integration – Powerful Big Data Solutions
Commodity Data Reservoir
• Leverage Oracle Data Integration with a wide array of databases or data warehouse appliances
• Support Hadoop distributions on commodity hardware
Oracle Engineered Systems
• Deeply integrated with Oracle Big Data Appliance and Exadata
• Take advantage of InfiniBand performance, Oracle Big Data SQL, Columnar Compression, and all integrated Loader technologies
Streaming Big Data
• Integrate real-time transactional databases with streaming analytics
• Filter, join and transform data while it is in motion; make business decisions while data is in memory
Most Heterogeneous Solution
Hadoop HBase Hadoop Hive/Flume HP Enscribe HP NonStop HP Neoview Hypersonic SQL IBM DB2 i Series IBM DB2 UDB IBM DB2 z Series IBM Informix IBM Netezza JMS / MQ Microsoft Access Microsoft SQLServer MySQL Pivotal Greenplum PostgreSQL Salesforce.com SAP BW / BI SAP ERP / ECC SAS SQL/MP SQL/MX Sybase ASE Sybase IQ Teradata
Adaptive Altova Apache Hcatalog Apache Hive/HQL Borland CA ERwin Cloudera Impala COBOL Copybook DataStax Embarcadero EMC ProActivity GentleWare Google BigQuery Grandite Hadapt Hive Hortonworks Hive IBM Cognos IBM DB2 IBM DataStage IBM Discovery IBM Federation Server IBM Lotus Notes IBM Netezza IBM Rational Rose IBM Rational Architect Informatica Metadata Mgr. Informatica PowerCenter
CoSORT ISO SQL Standard (DDL) MapR Hadoop Hive MicroFocus Microsoft Access Microsoft Office Excel Microsoft Visio Microsoft SQL Server Microsoft SSIS Microsoft Visual Studio Microstrategy Magic Draw OMG CWM Standard OMG UML Standard Oracle BI Answers Oracle BI Enterprise Edition Oracle BI Server Oracle DAC Oracle Data Integrator Oracle Data Modeler Oracle Database Oracle Designer Oracle Hyperion Applications Oracle Hyperion Essbase Oracle Warehouse Builder Pivotal Greenplum PostgreSQL
QlikView SAP BO Crystal Reports SAP BO Designer SAP BO Desktop Intelligence SAP BO Repository SAP BO Data Integrator SAP BO Data Steward SAP Master Data Management SAP Sybase PowerDesigner SAP Sybase ASE Database SAS Data Integration Studio SAS BI Server SAS Information Map SAS Metadata Management SAS OLAP Server Select Sparx Architect Syncsort Tableau Talend Teradata Tigris Visible W3C DTD & XSD Schema
Operational Integration (Movement / Transformation) Metadata Harvesting (Glossary, Lineage & Impact Analysis) Oracle Database Oracle Exadata Oracle Big Data Appliance Oracle TimesTen Oracle OLAP Oracle Business Intelligence Oracle BI Applications Oracle E-Business Suite Oracle JD Edwards Enterprise One Oracle JD Edwards World Oracle Fusion Applications Oracle Governance Risk and Compliance Oracle Fusion AIA Oracle Retail Applications Oracle Agile BI / DW Oracle Agile PLM for Process Oracle iFlex FlexCUBE Oracle iFlex Mantas Oracle Hyperion Applications Oracle PeopleSoft Oracle Siebel CRM / OnDemand Oracle Communications Oracle WebLogic Server Oracle Coherence Data Grid Oracle SOA Suite Oracle Enterprise Service Bus
+ open APIs and standards based meta-model
Oracle Big Data Connectors
Oracle Big Data Connectors
Data Load: Oracle Loader for Hadoop
Data Access: Oracle SQL Connector for HDFS
R Analytics: Oracle R Advanced Analytics on Hadoop
Oracle Data Integrator Knowledge Modules
XML/XQuery: Oracle XQuery on Hadoop
Clients: XQuery, R Client
Optimized for Hadoop: maximise parallelism, fast performance
Analyze data on Hadoop using familiar client tools
Oracle XQuery for Hadoop
• OXH is a transformation engine for Big Data
• XQuery language executed on the Map/Reduce framework
Acquire – Organize – Analyze
Oracle Big Data Connectors
Oracle Data Integrator Oracle
Loader for Hadoop
XQuery
for $ln in text:collection()
let $f := tokenize($ln)
where $f[1] = 'x'
return text:put($f[2])
[Diagram: the OXH Engine turns the XQuery into a Map/Reduce execution plan (several M/R jobs) that runs on the Map/Reduce worker nodes over HDFS]
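For readers who do not know XQuery, the query on this slide reads each input line, splits it into fields, keeps the lines whose first field is 'x', and writes out the second field. The same logic as a plain Python sketch follows. The function name `select_second_field` and the sample lines are assumptions, as are splitting on whitespace and skipping lines with fewer than two fields; none of this is the OXH API, and OXH would run the logic as Map/Reduce jobs rather than in one process.

```python
def select_second_field(lines, wanted="x"):
    """Return the second field of every line whose first field equals `wanted`."""
    selected = []
    for line in lines:
        fields = line.split()  # assumption: fields are whitespace-separated
        if len(fields) >= 2 and fields[0] == wanted:
            selected.append(fields[1])
    return selected


# Hypothetical input lines standing in for text:collection().
sample = ["x alpha", "y beta", "x gamma", "x"]
result = select_second_field(sample)
print(result)
```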
Linearly Scale a Robust Set of R Algorithms
Leverage MapReduce for R Calculations
Compute Intensive Parallelism for Simulations
R Analytics leveraging Hadoop and HDFS Oracle R Connector for Hadoop
HDFS
Hadoop
Oracle R Client
MAP MAP MAP MAP
REDUCE REDUCE
Integrated R environment
• Native R MapReduce
• Native R HDFS Access
Improved productivity
Running R on Hadoop Oracle R Connector for Hadoop
ORE
Client Host
R Engine
Hadoop Cluster
Software
R Engine
MapReduce Nodes
HDFS
Oracle Big Data Appliance
Oracle Exadata
R Engine ORE
ORHC ORHC
Oracle Loader for Hadoop
• Parallel load, optimized for Hadoop
• Automatic load balancing
• Convert to Oracle format on Hadoop
– Save database CPU
• Load specific Hive partitions
• Kerberos authentication
• Load directly into In-Memory table
Input formats: JSON, log files, Hive, text, Parquet, Avro, sequence files, compressed files, and more …
Oracle SQL Connector for HDFS
[Diagram: parallel OSCH streams read Hive and text data into an external table]
create table customer_address
( ca_customer_id number(10,0)
, ca_street_number char(10)
, ca_state char(2)
, ca_zip char(10))
organization external (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS
(…
-- the hdfs_stream preprocessor streams the HDFS content to the external table
PREPROCESSOR HDFS_BIN_PATH:'hdfs_stream')
LOCATION ('addr1', 'addr2', 'addr3'))
• Parallel query and load
• Load into database or query in place
• Access text or Hive over text
• Access compressed data
• Access specific Hive partitions
• Kerberos authentication
Oracle SQL Connector for HDFS
• Includes tool to generate external table
• Performance on Engineered Systems
– 15 TB/hour load time
• Query and load Oracle Data Pump files
– Binary file in Oracle format
– Uses fewer database CPU cycles during query/load
Oracle In-Database Unified Analytics Platform
XML Relational OLAP Spatial
Data Layer RDF Media
Parallel Processing Engine
Oracle R Enterprise
Oracle Data Mining
Text and Search
Spatial Analytics
SQL Analytics
Oracle MapReduce
In-Database Map/Reduce Oracle Database
[Diagram: Table → Map → Reduce → Table, with Map emitting key (K) / value (V) pairs]
Input table (clicks):
timestamp userid pageid
10:00:00 12345 A73_2
10:00:02 8901 A74_3
10:00:03 12345 A73_3
10:01:12 12345 A74_4
Output table:
session userid pageid duration
0 12345 A73_2 3
0 12345 A73_3 70
0 12345 A74_4 12
1 8901 A74_3 89
MapReduce within the Oracle Database:
select session, userid, pageid, duration
from table(oracle_map_reduce.reducer(cursor(
select * from table(oracle_map_reduce.mapper(cursor(
select * from clicks))) map_result)));
=> Works on internal and external data sources
=> Leverage PL/SQL skills for big data analytics
=> High efficiency through parallel pipelined infrastructure
=> In-database execution allows for fast query performance
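The mapper and reducer behind this query are not shown on the slide, so the following Python sketch is only one plausible reading of them. It assumes the mapper keys each click by userid, that the reducer sorts a user's clicks by time, and that a page's duration is the number of seconds until that user's next click. All function names are hypothetical. The last click of each user has no following click in the four sample rows, so the sketch leaves its duration as None. For that reason the figures do not match the slide's output table exactly, which presumably draws on clicks not shown.

```python
from datetime import datetime


def to_seconds(hhmmss):
    """Convert an HH:MM:SS string into seconds since midnight."""
    t = datetime.strptime(hhmmss, "%H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def mapper(click):
    """Emit (key, value) = (userid, (seconds, pageid)) for one click row."""
    timestamp, userid, pageid = click
    return userid, (to_seconds(timestamp), pageid)


def reducer(session, userid, visits):
    """Order one user's visits by time and compute seconds until the next click."""
    visits = sorted(visits)
    rows = []
    for i, (start, pageid) in enumerate(visits):
        has_next = i + 1 < len(visits)
        duration = visits[i + 1][0] - start if has_next else None
        rows.append((session, userid, pageid, duration))
    return rows


def sessionize(clicks):
    """Run the mapper, group by key, then run the reducer once per user."""
    grouped = {}
    for click in clicks:
        userid, visit = mapper(click)
        grouped.setdefault(userid, []).append(visit)
    rows = []
    for session, (userid, visits) in enumerate(grouped.items()):
        rows.extend(reducer(session, userid, visits))
    return rows


clicks = [
    ("10:00:00", 12345, "A73_2"),
    ("10:00:02", 8901, "A74_3"),
    ("10:00:03", 12345, "A73_3"),
    ("10:01:12", 12345, "A74_4"),
]
rows = sessionize(clicks)
for row in rows:
    print(row)
```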
R code and/or SQL
Models run in-database
Avoid Data Movement
Processes large data sets
Uses the power of Oracle Database 11g, 12c and Exadata
Same code, much faster
Oracle Advanced Analytics: Oracle R Enterprise Approach
More Powerful Together
Business Intelligence and Information Discovery
Optimized for Exalytics In-Memory Machine
Analysis Problems: Measure, Analyze, Report
Discovery Problems: Investigate, Explore, Understand
Unstructured Data: Diverse, textual, uncertain quality
Structured Data: Modeled and conforming
Oracle Business Intelligence: Proven Answers to Known Questions
Oracle Endeca Information Discovery: Fast Answers to New Questions
Insights yield new metrics to monitor, data to integrate
New questions require exploration, new information; leverage existing investments
Extend Business Analytics with Unstructured Data
Oracle Endeca Information Discovery
Unstructured Data: Social Media; Content Systems, Files, Email; Websites; Big Data
Oracle Endeca Information Discovery: Best platform for Unstructured Analytics
Endeca Server: Hybrid Search/Analytical Database, Flexible Data Model
Oracle Business Intelligence: Best platform for integrated ROLAP and MOLAP
BI Server + OLAP: Common Enterprise Information Model
Structured Data: OLTP & ODS Systems; Enterprise Applications (Oracle, SAP, Others); Data Warehouse & Data Marts
Evolution of Analytical SQL
• Introduction of “window” functions
• Enhanced window functions (percentile, etc)
• Rollup, grouping sets, cube
• Statistical functions
• SQL model clause
• Partition Outer Join
• In-database Data Mining
• SQL Pivot
• Recursive WITH
• ListAgg, Nth value window
• Pattern matching
• Top N clause
• Approx Count distinct
• JSON support
8i 9i 10g 11g 12c
Barriers to Big Data Adoption Complexity
• Skills
– Lack of tools and training to exploit Big Data
– IT Operations' ability to administer and manage Big Data
• Integration
– Adding Big Data to existing architecture is complex
– Too much effort required in data preparation
• Security
– No clear route to governance or enforcement
Data Warehouses
Business Analytics
Evolution of Big Data Analytics in the Enterprise
Transactional Applications
Operational Reporting
Social Media
Internet of Things
Big Data Platform
Big Data Analytics Challenge Separate silos of information to analyze
Data Analytics Challenge Separate data access interfaces
Data Analytics Challenge No comprehensive SQL interface across Oracle, Hadoop and NoSQL
What customers want Rich, comprehensive SQL access to all enterprise data
NoSQL
What gives Exadata extreme performance?
Oracle Database 12c
SQL
Offload Query to Exadata Storage Servers
Small data subset quickly returned
Hadoop & NoSQL
Introducing Oracle Big Data SQL Massively Parallel SQL Query across Oracle, Hadoop and NoSQL
Oracle Database 12c
Offload Query to Exadata Storage Servers
Small data subset quickly returned
Offload Query to Data Nodes
SQL
data subset
SQL
Hadoop & NoSQL
Oracle Big Data SQL: A New Hadoop Processing Engine
Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)
Resource Management (YARN, cgroups)
Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
Big Data SQL
SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;
Relevant SQL runs on BDA nodes
Tens of gigabytes of data
Only columns and rows needed to answer query are returned
[Diagram: Hadoop Cluster, Big Data SQL, and Oracle Database, with tables CUSTOMERS and WEB_LOGS]
SQL Push Down in Big Data SQL
• Hadoop Scans on Unstructured Data
• WHERE Clause Evaluation
• Column Projection
• Bloom Filters for Better Join Performance
• JSON Parsing, Data Mining Model Evaluation
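Three of these push-down techniques (WHERE clause evaluation, column projection, and a Bloom filter on the join key) can be illustrated with a small Python sketch built around the web_logs / customers query. This is not how Big Data SQL is implemented; `BloomFilter`, `scan_on_data_node`, the sample rows, and the filter sizes are all assumptions for illustration. The point is that filtering and projection happen where the data lives, so only a small subset travels to the database, which then completes the join.

```python
import hashlib


class BloomFilter:
    """Compact set summary: may give false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def scan_on_data_node(rows, predicate, columns, bloom, join_key):
    """Work done next to the data: filter rows, drop non-joining keys, project."""
    subset = []
    for row in rows:
        if not predicate(row):
            continue  # WHERE clause evaluation
        if not bloom.might_contain(row[join_key]):
            continue  # Bloom filter: this key cannot match any customer
        subset.append({col: row[col] for col in columns})  # column projection
    return subset


# Hypothetical sample data.
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Luis"}]
web_logs = [
    {"sess_id": "s1", "cust_id": 1, "source_country": "Brazil", "agent": "a"},
    {"sess_id": "s2", "cust_id": 2, "source_country": "Spain", "agent": "b"},
    {"sess_id": "s3", "cust_id": 9, "source_country": "Brazil", "agent": "c"},
    {"sess_id": "s4", "cust_id": 2, "source_country": "Brazil", "agent": "d"},
]

# Database side: summarize the join keys and ship the filter to the data nodes.
bloom = BloomFilter()
for customer in customers:
    bloom.add(customer["customer_id"])

subset = scan_on_data_node(
    web_logs,
    predicate=lambda row: row["source_country"] == "Brazil",
    columns=["sess_id", "cust_id"],
    bloom=bloom,
    join_key="cust_id",
)

# Database side: finish the join exactly, which also discards any false positives.
names = {c["customer_id"]: c["name"] for c in customers}
joined = [(r["sess_id"], names[r["cust_id"]]) for r in subset if r["cust_id"] in names]
print(joined)
```

Because a Bloom filter can report false positives, the exact join on the database side is still required; the filter only reduces how many rows are sent.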
Why Make Big Data a Divided World?
VS
Unified Big Data Environment
VS &
Securing Big Data
• Increasingly, Big Data solutions are capturing sensitive information that must be protected and audited
• This is no different from critical data stored in an RDBMS
Enhanced Big Data Security
Authenticate users with secure Kerberos protocol
Authorize access to data with fine grained controls
Audit activity and access with Oracle Audit Vault and Database Firewall
Encrypt data as it flows through the system*
*Planned for v2.3.2
Oracle Big Data Platform Big Data Management System
BIG DATA APPLICATIONS: BY INDUSTRY & LINE OF BUSINESS
BUSINESS ANALYTICS: DISCOVERY, BUSINESS ANALYTICS
BIG DATA MANAGEMENT: DATA RESERVOIR, DATA WAREHOUSE, ORACLE BIG DATA SQL
SOURCES
Big Data: Why Oracle?
Unified Data Platform
Advanced Query & Analysis: Full Power of SQL and Advanced Analytics
Transparent to Applications: No Changes to Application Code
Single View of All Data: Unified Metadata Across RDBMS & Hadoop
Fastest Performance: Utilize SQL Processing Across the Platform
Leverage Existing Skills: Lower Cost & Complexity of Big Data Adoption
Questions & Answers