Post on 08-Mar-2018
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENEVA HAMBURG COPENHAGEN LAUSANNE MUNICH STUTTGART VIENNA ZURICH
Big Data Infrastructure – The Oracle Way
Daniel Steiger
About ...
Big Data Infrastructure - The Oracle Way, 17.11.2016
Daniel Steiger, Principal Consultant @ Trivadis
Oracle DBA and IT Infrastructure Architect
Program Manager IT Infrastructure Optimization
Co-Author "Der Oracle DBA", Hanser Verlag
Speaker and Teacher
Our company.
Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on [...] and [...] technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
OPERATION
With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
Agenda
1. Introduction
2. Oracle Big Data Infrastructure
3. Oracle Big Data Software
4. BDA Setup
5. Use Case
6. Summary
Introduction
2006: Hadoop is born from Apache Nutch
2008: Foundation of Cloudera
2011: Oracle Rolls Out 'Big Data Appliance'
2012: Oracle Makes Big Data Appliance Move With Cloudera
About the Current State of Big Data Technology
"Cloudera is eight; Apache Hadoop is ten. Big data has gone from zero to how-did-that-happen huge. The bestiary is bigger than ever, too: new projects like Apache Kudu, Apache Impala (incubating), Apache Kafka and Apache Spark define the future of big data and analytics, extending the core Hadoop platform to handle streaming, real-time and advanced analytics."
Mike Olson, Cloudera CSO and Co-Founder, Aug. 25, 2016
Data Lakes and Reservoirs
Since the data doesn’t just sit there until it evaporates but eventually flows to various applications, we should think of this as a “data reservoir” rather than a “data lake.”
Source: http://blogs.informatica.com
Data Reservoir Functions
Source: Architecting Data Lakes, 2016 O’Reilly Media, Inc.
Ingestion, Storage/Retention, Processing, Access
Oracle Big Data Management System Architecture
Data reservoir (Hadoop): Schema-on-read, Raw data, Complex processing, Huge volume at low cost
Data warehouse (Oracle Database): Schema-on-write, Cleansed data, Complex integration, Large volume at moderate cost
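The two sides differ mainly in when the schema is applied. The minimal Python sketch below uses hypothetical class names (`Reservoir`, `Warehouse`) and made-up sample records; it is an illustration of the concept, not part of any Oracle product. The reservoir stores raw records unchanged and interprets them only when they are read. The warehouse validates each record on write and rejects records that do not conform.

```python
import json
from dataclasses import dataclass
from typing import Iterator, List

@dataclass(frozen=True)
class Event:
    player: str
    distance_km: float

def parse(line: str) -> Event:
    """Apply the schema; raises ValueError or KeyError for non-conforming input."""
    doc = json.loads(line)
    return Event(player=str(doc["player"]), distance_km=float(doc["distance_km"]))

class Reservoir:
    """Schema-on-read: keep raw text, interpret it only when it is queried."""
    def __init__(self) -> None:
        self.lines: List[str] = []

    def write(self, line: str) -> None:
        self.lines.append(line)                # no validation at ingest time

    def read(self) -> Iterator[Event]:
        for line in self.lines:
            try:
                yield parse(line)              # schema applied at read time
            except (ValueError, KeyError):
                continue                       # this reader skips rows it cannot interpret

class Warehouse:
    """Schema-on-write: validate and cleanse before storing."""
    def __init__(self) -> None:
        self.rows: List[Event] = []
        self.rejected = 0

    def write(self, line: str) -> None:
        try:
            self.rows.append(parse(line))      # schema applied at write time
        except (ValueError, KeyError):
            self.rejected += 1

# Hypothetical sample records; the second one has a dirty value.
RAW = [
    '{"player": "A", "distance_km": 11.2}',
    '{"player": "B", "distance_km": "n/a"}',
    '{"player": "C", "distance_km": 9.8}',
]
lake, dwh = Reservoir(), Warehouse()
for line in RAW:
    lake.write(line)
    dwh.write(line)
print(len(lake.lines), [e.player for e in lake.read()])  # 3 ['A', 'C']
print(len(dwh.rows), dwh.rejected)                        # 2 1
```

The dirty record remains in the reservoir. A later reader with a more tolerant schema can still use it, which is the main argument for schema-on-read.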
Oracle's Big Data Solution
A complete and optimized solution for big data
Tight integration with Exadata, Exalogic, Exalytics and SPARC SuperCluster using the InfiniBand network
Single-vendor support for both hardware and software
Oracle Big Data Infrastructure
The Big Data Appliance X6-2 Hardware
Per Node (X6-2):
2 x 22-Core (2.2 GHz) Intel® Xeon® E5-2699 v4
8 x 32GB DDR4-2400 Memory (max. 768GB)
12 x 8TB 7,200 RPM High Capacity SAS Drives
2 x QDR 40Gb/sec InfiniBand Ports
4 x 10 Gb Ethernet Ports, 1 x ILOM Ethernet Port
RAM to CPU Ratio:
ODA X6-2M: 38 GB per Core
MiniCluster S7-2: 32 GB per Core
BDA: 17.5 GB per Core*
Starter Rack: 6 x nodes
Full Rack: 18 x nodes
Up to 18 racks
* Cloudera recommendation for "Compute Intensive Workloads": 16 GB per core
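The per-node figures above are enough to derive some rack-level numbers. The Python sketch below shows the arithmetic. Two assumptions apply: usable capacity assumes the HDFS default replication factor of 3, and it ignores OS and filesystem overhead. The sketch also suggests that the 17.5 GB per core figure corresponds to the maximum memory configuration (768 GB / 44 cores). With the 8 x 32 GB configuration as listed, the ratio is about 5.8 GB per core.

```python
# Per-node figures from the X6-2 specification above.
CORES_PER_NODE = 2 * 22      # 2 sockets x 22 cores
RAM_LISTED_GB = 8 * 32       # 8 x 32 GB DIMMs as listed
RAM_MAX_GB = 768             # maximum memory per node
DISKS_PER_NODE = 12
DISK_TB = 8
HDFS_REPLICATION = 3         # assumption: HDFS default replication factor

def gb_per_core(ram_gb: float, cores: int = CORES_PER_NODE) -> float:
    return round(ram_gb / cores, 1)

def raw_tb(nodes: int) -> int:
    return nodes * DISKS_PER_NODE * DISK_TB

def usable_tb(nodes: int, replication: int = HDFS_REPLICATION) -> float:
    # Upper bound only: ignores OS partitions, filesystem overhead and temp space.
    return raw_tb(nodes) / replication

for name, nodes in [("Starter Rack", 6), ("Full Rack", 18)]:
    print(f"{name}: {nodes * CORES_PER_NODE} cores, "
          f"{raw_tb(nodes)} TB raw, ~{usable_tb(nodes):.0f} TB with 3x replication")
print("GB/core as listed:", gb_per_core(RAM_LISTED_GB))  # 5.8
print("GB/core at max RAM:", gb_per_core(RAM_MAX_GB))    # 17.5
```

Compared against the Cloudera guideline of 16 GB per core, only the maximum memory configuration meets the recommendation for compute-intensive workloads.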
Big Data Appliance Network Connectivity
Source: Oracle Big Data Appliance: Datacenter Network Integration, Oracle White Paper, 2012
Oracle Big Data Appliance Software Stack (Release 4.6.0)
Oracle Linux, Oracle Java JDK
MySQL Database Enterprise Server - Advanced Edition
Oracle SQL Connector for HDFS
Oracle XQuery for Hadoop
Oracle R Advanced Analytics for Hadoop
Oracle NoSQL Database (key-value) Community Edition (CE)
Enterprise Manager Plug-In
Cloudera Enterprise Data Hub Edition
– Apache Hadoop (CDH)
– Cloudera Impala
– Cloudera Search (Apache Solr)
– Apache HBase and Apache Accumulo
– Apache Spark
– Apache Kafka
– Cloudera Manager
– Cloudera Navigator
– Cloudera Backup and Disaster Recovery (BDR)
Oracle Big Data Connectors
Facilitate access to data stored in an Apache Hadoop cluster.
Available on either Oracle Big Data Appliance or a Hadoop cluster running on commodity hardware
– Oracle SQL Connector for HDFS
– Oracle Loader for Hadoop
– Oracle XQuery for Hadoop
– Oracle R Advanced Analytics for Hadoop
– Oracle Data Integrator
– Oracle DataSource for Hadoop (OD4H)
Note: The connectors are licensed separately from Oracle Big Data Appliance
Source: Oracle®
Security for Data at Rest and Data in Motion
Authentication through Kerberos
Authorization through Apache Sentry
Auditing through Oracle Audit Vault
Encryption for Data-at-Rest
Network Encryption
Big Data SQL adds
– Advanced Security on Hadoop & NoSQL: Masking and Redaction
– Virtual Private Database: Fine-grained Access Control
Administration with EM Cloud Control
Plug-In for EM Cloud Control 12.1.0.4 and later
– Discover the components of a Big Data Appliance network as managed targets
– Manage the HW and SW components
– Collect metrics to analyze the performance of the network and each BDA component
– Trigger alerts based on availability and system health
– Respond to warnings and incidents
Always (!) check My Oracle Support Doc ID 1570523.1, "Enterprise Manager for Oracle Big Data Appliance Frequently Asked Questions"
Oracle Big Data Software
Oracle Big Data SQL
Oracle Big Data Discovery
Oracle Data Integrator for Big Data
Oracle GoldenGate for Big Data
Oracle Big Data SQL
Query Data in RDBMS, Hadoop and NoSQL
The same query covers all sources, and intelligent optimizations push the query down to the source
Tables in Hadoop or NoSQL databases are defined as external tables in Oracle (leveraging the Hive metastore to determine both parallelism and read semantics)
Query optimizations are applied to the data (Storage Indexes, local filtering and caching)
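Storage indexes rest on a general idea: keep a small summary per block, such as the minimum and maximum of a column. Blocks that cannot contain matching rows are then never read. The Python sketch below shows only that idea. It is a simplified illustration, not Oracle's implementation, and `Block` and `scan_eq` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Block:
    values: List[int]
    lo: int = field(init=False)   # storage-index entry: minimum value in the block
    hi: int = field(init=False)   # storage-index entry: maximum value in the block

    def __post_init__(self) -> None:
        self.lo, self.hi = min(self.values), max(self.values)

def scan_eq(blocks: List[Block], key: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of blocks actually read."""
    hits: List[int] = []
    blocks_read = 0
    for b in blocks:
        if key < b.lo or key > b.hi:
            continue              # the index proves no match: block is skipped
        blocks_read += 1
        hits.extend(v for v in b.values if v == key)
    return hits, blocks_read

blocks = [Block([1, 5, 9]), Block([10, 12, 19]), Block([20, 25, 29])]
print(scan_eq(blocks, 12))   # ([12], 1)
```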
Oracle DataSource for Hadoop (OD4H)
Oracle Big Data SQL (cont.)
Oracle Big Data SQL extends Smart Scan capabilities (such as filter-predicate offloads) to Oracle external tables with the installation of the Big Data SQL processing agent on the DataNodes of the Hadoop cluster. This technology enables the Hadoop cluster to discard a huge portion of irrelevant data – up to 99 percent of the total – and return much smaller result sets to the Oracle Database server.
Oracle Big Data SQL 3.0 can connect Oracle Database to the Hadoop environment on Oracle Big Data Appliance, other systems based on CDH (Cloudera's Distribution including Apache Hadoop), HDP (Hortonworks Data Platform), and potentially other non-CDH Hadoop systems
Oracle Big Data Discovery
The Visual Face of Big Data
Uses the power of Apache Spark to process massive amounts of information
Uses Oracle Big Data SQL to query the data in HDFS without moving it at all
Oracle Data Integrator (ODI) for Big Data
ODI for Big Data is used to transform and enrich data within the big data reservoir
ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents
Enables users to build business and data mappings without having to learn HiveQL, Pig Latin and MapReduce
ODI separates the design interface, where the logic is built, from the physical implementation layer that runs the code
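The separation between logical design and physical implementation can be sketched in a few lines of Python. This is not ODI's API or the code ODI generates. `Mapping`, `to_hiveql` and `to_spark_sql` are hypothetical names, and the table names are made up. The sketch shows one declarative mapping rendered into two different native query languages.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Mapping:
    """Logical design: what to do, independent of the execution engine."""
    source: str
    target: str
    columns: Tuple[str, ...]
    filter_expr: str

def to_hiveql(m: Mapping) -> str:
    """One physical implementation: a HiveQL INSERT OVERWRITE ... SELECT."""
    cols = ", ".join(m.columns)
    return (f"INSERT OVERWRITE TABLE {m.target} "
            f"SELECT {cols} FROM {m.source} WHERE {m.filter_expr}")

def to_spark_sql(m: Mapping) -> str:
    """Another physical implementation of the same logic (Spark SQL)."""
    cols = ", ".join(m.columns)
    return (f"INSERT INTO {m.target} "
            f"SELECT {cols} FROM {m.source} WHERE {m.filter_expr}")

m = Mapping(source="raw_bets", target="clean_bets",
            columns=("bet_id", "amount"), filter_expr="amount > 0")
print(to_hiveql(m))
print(to_spark_sql(m))
```

The mapping object never mentions an engine. Changing the physical layer means adding another generator, not redesigning the mapping.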
Oracle GoldenGate for Big Data
Data Delivery to Big Data Targets
– Less invasive than ETL processes
– Real-time data for streaming analytics
Release 12.2 (Dec. 2015)
Native Java Replication
Pluggable Formatting Architecture
– JSON, AVRO, XML, Delimited Text
Native Kerberos Support
Kafka Targets
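In a pluggable formatting architecture, the captured change record stays the same and only the formatter that serializes it is swapped. The Python sketch below shows that pattern. The formatter names and the record layout are hypothetical and do not reflect GoldenGate's configuration properties or its actual output. AVRO and XML formatters would plug into the same dictionary.

```python
import csv
import io
import json
from typing import Callable, Dict

Record = Dict[str, object]

def fmt_json(rec: Record) -> str:
    return json.dumps(rec, sort_keys=True)

def fmt_delimited(rec: Record, sep: str = ";") -> str:
    buf = io.StringIO()
    csv.writer(buf, delimiter=sep, lineterminator="").writerow(
        [rec[k] for k in sorted(rec)])
    return buf.getvalue()

# Registry of interchangeable formatters (hypothetical names).
FORMATTERS: Dict[str, Callable[[Record], str]] = {
    "json": fmt_json,
    "delimitedtext": fmt_delimited,
}

def deliver(change: Record, fmt: str) -> str:
    """Serialize one captured change record with the configured formatter."""
    return FORMATTERS[fmt](change)

change = {"op": "I", "table": "BETS", "id": 42}
print(deliver(change, "json"))           # {"id": 42, "op": "I", "table": "BETS"}
print(deliver(change, "delimitedtext"))  # 42;I;BETS
```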
Big Data Appliance Setup
Well, first you have to move the box ...
Source: kerryosborne
Safety and Compliance Guide
Site Checklist
Setup
Step 1 = PreinstallChecks
Step 2 = SetupPuppet
Step 3 = PatchFactoryImage
Step 4 = CopyLicenseFiles
Step 5 = CopySoftwareSource
Step 6 = CreateUsers
Step 7 = SetupMountPoints
Step 8 = SetupMySQL
Step 9 = InstallHadoop
Step 10 = StartHadoopServices
Step 11 = InstallBDASoftware
Step 12 = SetupKerberos
Step 13 = HDFSTransparentEncryption
Step 14 = SetupEMAgent
Step 15 = SetupASR
Step 16 = CleanupInstall
Step 17 = CleanupSSHroot (Optional)
cd /opt/oracle/BDAMammoth
./mammoth -s 1 cdh
Mammoth is the utility that deploys software on Oracle's Big Data Appliance
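When the installation is run step by step, a small wrapper can help stop at the first failure so that the log files can be checked after every step. The Python sketch below is a hypothetical helper, not an Oracle tool. It assumes, as in the command above, that `-s N` runs a single Mammoth step and that `cdh` is the cluster name passed to Mammoth. It defaults to a dry run that only lists the commands.

```python
import subprocess
from typing import List

MAMMOTH_DIR = "/opt/oracle/BDAMammoth"   # directory used on the slide

def step_commands(cluster: str, first: int = 1, last: int = 16) -> List[List[str]]:
    """Build one 'mammoth -s <step> <cluster>' call per step (step 17 is optional)."""
    return [["./mammoth", "-s", str(step), cluster] for step in range(first, last + 1)]

def run_steps(cluster: str, first: int = 1, last: int = 16,
              dry_run: bool = True) -> List[str]:
    """Run the steps in order; stop at the first failure so its log can be checked."""
    done: List[str] = []
    for cmd in step_commands(cluster, first, last):
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(cmd, cwd=MAMMOTH_DIR, check=True)
        done.append(" ".join(cmd))
    return done

print(run_steps("cdh", 1, 3))   # dry run: only lists the commands
```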
Install Big Data Discovery
On one node only
Takes a couple of minutes as RAID 6 is built locally
Some hints ...
– Cannot connect to the MySQL database => edit the temporary password file
– Needs an email address during the setup dialog
– Installation reports “finished successfully” ... but was not
bdacli enable bdd
Patching
Patching means: applying software that raises a pre-existing software release number
– E.g. CDH 5.5.1 to CDH 5.5.2
Example: Re-image to 4.2.0 with Patch 22118555 (3.6 GB)
– JSON specs must exist on the server to be re-imaged
– Re-imaging writes the image to the internal USB drive and boots from USB
BDA Configurator v4.4.0-1
BDA Patch 4.4.0
– P22537238_440_Linux-x86-64_1of3.zip Mammoth
– P22537238_440_Linux-x86-64_2of3.zip BDABaseImage-ol6-4.4.0_RELEASE.iso
– P22537238_440_Linux-x86-64_3of3.zip BDAExtras-ol6-4.4.0
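Before a patch is staged, two checks are useful: are all parts of a multi-part download present, and does the target release actually raise the installed one? The Python sketch below does both. The file-name pattern is inferred from the three BDA 4.4.0 patch files listed above. The helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Pattern inferred from the BDA 4.4.0 file names listed above.
PART = re.compile(r"^p(\d+)_(\d+)_Linux-x86-64_(\d+)of(\d+)\.zip$", re.IGNORECASE)

def missing_parts(files: Iterable[str]) -> Dict[str, List[int]]:
    """Return {patch_number: [missing part numbers]} for incomplete downloads."""
    seen: Dict[str, Set[int]] = {}
    total: Dict[str, int] = {}
    for name in files:
        m = PART.match(name)
        if m is None:
            continue
        patch, part, of = m.group(1), int(m.group(3)), int(m.group(4))
        seen.setdefault(patch, set()).add(part)
        total[patch] = of
    result: Dict[str, List[int]] = {}
    for patch, n in total.items():
        gap = sorted(set(range(1, n + 1)) - seen[patch])
        if gap:
            result[patch] = gap
    return result

def version(s: str) -> Tuple[int, ...]:
    return tuple(int(x) for x in s.split("."))

def raises_release(current: str, target: str) -> bool:
    """Patching in the sense above: the target release number is higher."""
    return version(target) > version(current)

files = ["P22537238_440_Linux-x86-64_1of3.zip", "P22537238_440_Linux-x86-64_2of3.zip"]
print(missing_parts(files))                 # {'22537238': [3]}
print(raises_release("5.5.1", "5.5.2"))     # True
```

Comparing integer tuples rather than strings keeps "4.10.0" correctly above "4.4.0".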
From our experience ...
A Big Data Appliance Admin needs a broad skill set
– Unix admin skills are mandatory (ssh, X-server, scp, networking, ...)
– Oracle Engineered System expertise helps a lot (Exadata, ODA, Infiniband, ...)
– Cloudera administration skills are useful
Setup and patching:
– Always check for known issues on My Oracle Support (see references for Doc IDs)
– Check log files after every step
Pay attention to the InfiniBand firmware release on the IB switches when connecting Exadata and BDA (they require exactly the same version)
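A quick pre-check before connecting the two systems is to compare the firmware versions reported by all InfiniBand switches. The Python sketch below assumes the versions have already been collected into a dictionary; how they are gathered is not covered here. The switch names and version strings are hypothetical.

```python
from typing import Dict, List

def group_by_version(switches: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each firmware version to the switches that report it."""
    groups: Dict[str, List[str]] = {}
    for name, fw in sorted(switches.items()):
        groups.setdefault(fw, []).append(name)
    return groups

def firmware_consistent(switches: Dict[str, str]) -> bool:
    """True if every switch reports exactly the same firmware version."""
    return len(group_by_version(switches)) <= 1

# Hypothetical switch names and version strings, collected beforehand.
switches = {
    "exa-sw-ib2": "2.2.7-1",
    "exa-sw-ib3": "2.2.7-1",
    "bda-sw-ib2": "2.1.8-1",
}
if not firmware_consistent(switches):
    for fw, names in group_by_version(switches).items():
        print(fw, names)
```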
Use Case
Use Case "Fraud Detection"
Company:
Business Case: Fraud Detection
Motivation Statement: "With the BDA, we want to refine our fraud-detection analyses with additional dimensions. For example, the BDA also captures sports performance metrics such as the running distance of individual players, which can then be compared with each player's average running distance. Extremely atypical performance values can indicate prearranged agreements, which we then investigate."
Reference: Oracle Open World 2016
Reference: Computer World
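The customer statement describes comparing a player's running distance with that player's own average and investigating strongly atypical values. One simple way to express such a rule is a per-player z-score, sketched below in Python. The threshold of 3 standard deviations and the sample data are assumptions. The source does not describe the customer's actual models.

```python
from statistics import mean, pstdev
from typing import Dict, List

def atypical(history: Dict[str, List[float]], current: Dict[str, float],
             threshold: float = 3.0) -> Dict[str, float]:
    """Return {player: z} for players whose current distance deviates by >= threshold SDs."""
    flagged: Dict[str, float] = {}
    for player, km in current.items():
        past = history.get(player, [])
        if len(past) < 2:
            continue                    # too little history for a baseline
        sd = pstdev(past)
        if sd == 0:
            continue                    # no variation: z-score undefined
        z = (km - mean(past)) / sd
        if abs(z) >= threshold:
            flagged[player] = round(z, 1)
    return flagged

# Hypothetical running distances in km from previous matches.
history = {
    "P1": [10.5, 11.0, 10.8, 11.2, 10.9],
    "P2": [9.0, 9.4, 9.2, 9.1, 9.3],
}
current = {"P1": 10.9, "P2": 6.0}
print(atypical(history, current))   # only P2 is flagged
```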
Use Case Solution
Big Data Appliance as "Data Reservoir"
Key arguments from customer perspective
– "Exadata and the BDA in tandem give us integration advantages that we cannot achieve as easily with competing systems." *
– Fast start to "Big Data"
– Comprehensive software stack for data analytics
– Start small, grow on demand
– Ready for future (yet unknown) demands
*Reference: Computer World
Summary
The main technical advantage when deploying Big Data SQL on the Oracle Big Data Appliance is InfiniBand’s high bandwidth to other Oracle Engineered Systems
Another BDA-exclusive feature: Perfect Balance for reduce tasks
The Big Data Appliance provides a solid enterprise-class infrastructure (HW & SW)
Installation and patching procedures are not yet as mature on the BDA as on other engineered systems like Exadata
Leveraging the full potential of a BDA requires both Engineered System expertise and data analytics know-how
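Perfect Balance addresses skew in MapReduce jobs, where some reduce tasks receive far more data than others. The Python sketch below only illustrates that problem and a generic remedy: greedy largest-first assignment of keys to the least-loaded reducer. It is not Perfect Balance's algorithm, which the slide does not describe. The function names and key sizes are hypothetical.

```python
import heapq
import zlib
from typing import Dict, List

def hash_partition(key_sizes: Dict[str, int], reducers: int) -> List[int]:
    """Hash-style partitioning: reducer = hash(key) mod n; returns load per reducer."""
    load = [0] * reducers
    for key, size in key_sizes.items():
        load[zlib.crc32(key.encode()) % reducers] += size
    return load

def balanced_partition(key_sizes: Dict[str, int], reducers: int) -> List[int]:
    """Greedy: assign the largest keys first to the currently least-loaded reducer."""
    heap = [(0, r) for r in range(reducers)]
    load = [0] * reducers
    for _key, size in sorted(key_sizes.items(), key=lambda kv: -kv[1]):
        current, r = heapq.heappop(heap)
        load[r] = current + size
        heapq.heappush(heap, (load[r], r))
    return load

sizes = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10, "f": 10}  # records per key
print("hash:    ", hash_partition(sizes, 2))
print("balanced:", balanced_partition(sizes, 2))   # [70, 70]
```

The job finishes only when the slowest reducer finishes, so the maximum load per reducer is the figure that matters.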
Is the Big Data Appliance the Right Choice for You?
Yes, if ...
you need a fast start to production-ready data analytics
you already run other Oracle engineered systems with InfiniBand technology
your use case involves data in RDBMS, Hadoop and NoSQL and you have high query performance demands
you have an important business case with unpredictable growth :-)
you would like to stay with Cloudera
Questions and responses ...
Daniel Steiger, Principal Consultant
daniel.steiger@trivadis.com
Trivadis @ DOAG 2016
Booth: 3rd Floor – next to the escalator
Know-how, T-shirts, contest and Trivadis Power to go
We look forward to your visit
Because with Trivadis you always win!
Links & References
Links and References (1)
An Enterprise Architect’s Guide to Big Data – Reference Architecture Overview
– http://www.oracle.com/technetwork/topics/entarch/oracle-wp-big-data-refarch-2019930.pdf
Oracle Big Data Management System – Statement of Direction
– http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/2516729.pdf
Oracle Big Data Appliance Documentation
– https://docs.oracle.com/bigdata/bda46/
Oracle Big Data Lite Virtual Machine
– http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html#wp
Links and References (2)
Information Center: Oracle Big Data Appliance (My Oracle Support Doc ID 1445762.2)
http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/
Oracle Big Data SQL: One Fast Query, All Your Data
– https://blogs.oracle.com/datawarehousing/entry/oracle_big_data_sql_one
Links and References (3)
Oracle Big Data Appliance Owner's Guide, Release 4 (4.4), E65664-03, January 2016
Oracle Big Data Appliance Patch Set Master Note, Doc ID 1485745.1
Information Center: Install/Upgrade/Configure Oracle BDA, Doc ID 1445745.2
Oracle BDA Base Image Version 4.2.0 for New Installations on OL6, Doc ID 2077858.1 (Base for BMR to finally reach 4.4.0)
Oracle Big Data Appliance Installation Frequently Asked Questions, Doc ID 1518939.1
Upgrading CDH, Doc ID 2109175.1
How to Enable/Disable Oracle Big Data Discovery on Oracle Big Data Appliance V4.3/OL6 with bdacli, Doc ID 2083079.1 (is also (not) valid for 4.4)
"bdacli enable bdd" Fails with "ERROR: Error getting mysql database status" on BDA 4.4.0 / BDD 1.1, Doc ID 2109175.1
Backup Slides
Oracle Big Data SQL Licensing
All nodes within the Hadoop cluster that runs Oracle Big Data SQL must be licensed.
A separate license must be procured per disk per Hadoop cluster.
All disks within every node that is part of a cluster running Oracle Big Data SQL must be licensed.
Partial licensing within a node is not available. All nodes in the cluster are included.
Only the Hadoop cluster side (Oracle Big Data Appliance, or other) of an Oracle Big Data SQL installation is licensed and no additional license is required for the database server side.
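Licensing is per disk and covers every disk of every node in the cluster, so the license count follows directly from the hardware configuration. The Python sketch below assumes the 12 drives per X6-2 node from the hardware specification. The cluster names are hypothetical, and prices are not modeled.

```python
from typing import Dict, Tuple

DISKS_PER_NODE = 12   # X6-2 node: 12 x 8 TB drives

def disk_licenses(clusters: Dict[str, Tuple[int, int]]) -> int:
    """Total disk licenses for Hadoop clusters that run Big Data SQL.

    clusters maps a cluster name to (nodes, disks_per_node). Every disk of every
    node counts (no partial licensing); database servers add nothing.
    """
    return sum(nodes * disks for nodes, disks in clusters.values())

# Hypothetical estate: one Starter Rack and one Full Rack, both running Big Data SQL.
estate = {"bda-test": (6, DISKS_PER_NODE), "bda-prod": (18, DISKS_PER_NODE)}
print(disk_licenses(estate))   # 72 + 216 = 288
```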
BDA Price List
Big Data in the Cloud – Offering & Pricing
Reference: https://cloud.oracle.com
BDA Specific Software Features
Oracle NoSQL Database
– Oracle NoSQL Database is a distributed key-value database built on storage technology of Berkeley DB Java Edition.
– An intelligent driver on top of Berkeley DB keeps track of the underlying storage topology, shards the data and knows where data can be placed with the lowest latency
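The routing idea behind such a driver can be sketched in Python. A key is hashed to one of a fixed number of partitions, and partitions are assigned to shards. A client that caches this topology can then send each request straight to the right shard. The partition count, shard names, MD5 hash and round-robin assignment are illustrative assumptions, not Oracle NoSQL Database internals.

```python
import hashlib
from typing import Dict, List

N_PARTITIONS = 30                 # assumption: fixed when the store is created
SHARDS = ["rg1", "rg2", "rg3"]    # hypothetical shard names

def partition_of(major_key: str, n_partitions: int = N_PARTITIONS) -> int:
    """Hash the key to a stable partition number."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions

def build_topology(n_partitions: int, shards: List[str]) -> Dict[int, str]:
    """Assign partitions round-robin to shards; a client would cache this map."""
    return {p: shards[p % len(shards)] for p in range(n_partitions)}

TOPOLOGY = build_topology(N_PARTITIONS, SHARDS)

def route(major_key: str) -> str:
    """Client-side routing: key -> partition -> shard, without a central lookup."""
    return TOPOLOGY[partition_of(major_key)]

print(route("/player/42"))
```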
Oracle R Support for Big Data
– The standard R distribution is installed on all nodes of Oracle Big Data Appliance
– Oracle R Connector for Hadoop provides R users with high-performance, native access to HDFS and the MapReduce programming framework
– Oracle R Enterprise is a separate package that provides real-time access to Oracle Database.
Big Data Preparation (Cloud Service)
Self-service data preparation for domain experts
Ingest, prepare, enrich, and publish data with a unified cloud-based data wrangling solution
Unique combination of Natural Language Processing (NLP) with Machine Learning (ML)
Leverage Linked Open Data graph of domain knowledge
Powered by Apache Spark
See https://cloud.oracle.com/en_US/big-data-preparation