T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : [email protected] W : www.rittmanmead.com
End-to-End Hadoop Development using OBIEE, ODI and Oracle Big Data
Mark Rittman, CTO, Rittman Mead
BIWA Summit, Jan 2015, San Francisco
About the Speaker
•Mark Rittman, Co-Founder of Rittman Mead •Oracle ACE Director, specialising in Oracle BI&DW •14 years’ experience with Oracle technology •Regular columnist for Oracle Magazine •Author of two Oracle Press Oracle BI books •Oracle Business Intelligence Developers Guide •Oracle Exalytics Revealed •Writer for Rittman Mead Blog : http://www.rittmanmead.com/blog
•Email : [email protected] •Twitter : @markrittman
About Rittman Mead
•Oracle BI and DW Gold partner •Winner of five UKOUG Partner of the Year awards in 2013 - including BI •World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI
•Approximately 80 consultants worldwide •All expert in Oracle BI and DW •Offices in US (Atlanta), Europe, Australia and India •Skills in broad range of supporting Oracle tools: ‣OBIEE, OBIA ‣ODIEE ‣Essbase, Oracle OLAP ‣GoldenGate ‣Endeca
The Oracle Reference DW Architecture c. 2013
•Centred on the Oracle RDBMS and structured data •All incoming data is modelled at load point, schemas assigned, stored in RDBMS layers •BI metadata layer and ability to federate at BI query stage •Data storage capacity limited by RDBMS
and now …this happened
Today’s Oracle Information Management Architecture
[Diagram labels: Input Events; Events & Data; Structured Enterprise Data; Other Data; Event Engine; Data Reservoir; Data Factory; Enterprise Information Store; Reporting; Discovery Lab; Discovery Output; Actionable Events; Actionable Information; Actionable Insights; Execution; Innovation]
Today’s Layered Data Warehouse Architecture
Virtualization & Query Federation
Enterprise Performance Management
Pre-built & Ad-hoc BI Assets
Information Services
Data Ingestion
Information Interpretation
Access & Performance Layer
Foundation Data Layer
Raw Data Reservoir
Data Science
Data Engines & Poly-structured sources
Content
Docs Web & Social Media
SMS
Structured Data Sources
•Operational Data •COTS Data •Master & Ref. Data •Streaming & BAM
Immutable raw data reservoir Raw data at rest is not interpreted
Immutable modelled data. Business Process Neutral form. Abstracted from business process changes
Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
Discovery Lab Sandboxes Rapid Development Sandboxes
Project based data stores to support specific discovery objectives
Project based data stored to facilitate rapid content / presentation delivery
Data Sources
The Oracle Data Warehousing Platform - 2014
Combining Oracle RDBMS with Hadoop + NoSQL
•High-value, high-density data goes into Oracle RDBMS •Better support for fast queries, summaries, referential integrity etc
•Lower-value, lower-density data goes into Hadoop + NoSQL ‣Also provides flexible schema, more agile development
•Successful next-generation BI+DW projects combine both - neither on their own is sufficient
Oracle’s Big Data Products
•Oracle Big Data Appliance ‣Optimized hardware for Hadoop processing ‣Cloudera Distribution incl. Hadoop ‣Oracle Big Data Connectors, ODI etc
•Oracle Big Data Connectors •Oracle Big Data SQL •Oracle NoSQL Database •Oracle Data Integrator •Oracle R Distribution •OBIEE, BI Publisher and Endeca Info Discovery
Oracle Big Data SQL
•Part of Oracle Big Data 4.0 (BDA-only) ‣Also requires Oracle Database 12c, Oracle Exadata Database Machine
•Extends Oracle Data Dictionary to cover Hive •Extends Oracle SQL and SmartScan to Hadoop •Extends Oracle Security Model over Hadoop ‣Fine-grained access control ‣Data redaction, data masking ‣Uses fast C-based readers where possible (vs. Hive MapReduce generation) ‣Map Hadoop parallelism to Oracle PQ ‣Big Data SQL engine works on top of YARN ‣Like Spark, Tez, MR2
Exadata Storage Servers
Hadoop Cluster
Exadata Database Server
Oracle Big Data SQL
SQL Queries
SmartScan SmartScan
Productising the Next-Generation IM Architecture
Still a Key Role for Data Integration, and BI Tools
•Fast, scalable, low-cost / flexible-schema data capture using Hadoop + NoSQL (BDA) •Long-term storage of the most important downstream data - Oracle RDBMS (Exadata) •Fast analysis + business-friendly interface : OBIEE, Endeca (Exalytics), RTD etc
OBIEE for Enterprise Analysis Across all Data Sources
•Dashboards, analyses, OLAP analytics, scorecards, published reporting, mobile
•Presented as an integrated business semantic model •Optional mid-tier query acceleration using Oracle Exalytics In-Memory Machine
•Access data from RDBMS, applications, Hadoop, OLAP, ADF BCs etc
Enterprise Semantic Business Model
Business Presentation Layer (Reports, Dashboards)
In-Memory Caching Layer
Application Sources
Hadoop / NoSQL Sources
DW / OLAP Sources
Bringing it All Together : Oracle Data Integrator 12c
•ODI provides an excellent framework for running Hadoop ETL jobs ‣ELT approach pushes transformations down to Hadoop - leveraging power of cluster
•Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation ‣Whilst still preserving RDBMS push-down ‣Extensible to cover Pig, Spark etc
•Process orchestration •Data quality / error handling •Metadata and model-driven
Typical RM Project BDA Topology
•Starter BDA rack, or full rack •Kerberos-secured using included KDC server
•Integration with corporate LDAP for Cloudera Manager, Hue etc
•Developer access through Hue, Beeline, R Studio
•End-user access through OBIEE, Endeca and other tools ‣With final datasets usually exported to Exadata or Exalytics
Working with Oracle Big Data Appliance
•Don’t underestimate the value of “pre-integrated” - massive time-saver for client ‣No need to integrate Big Data Connectors, ODI Agent etc with HDFS, Hive etc etc
•Single support route - raise SR with Oracle, they will route to Cloudera if needed •Single patch process for whole cluster - OS, CDH etc etc •Full access to Cloudera Enterprise features •Otherwise … just another CDH cluster in terms of SSH access etc •We like it ;-)
Oracle Big Data Connectors
•Oracle-licensed utilities to connect Hadoop to Oracle RDBMS ‣Bulk-extract data from Hadoop to Oracle, or expose HDFS / Hive data as external tables ‣Run R analysis and processing on Hadoop ‣Leverage Hadoop compute resources to offload ETL and other work from Oracle RDBMS ‣Enable Oracle SQL to access and load Hadoop data
Working with the Oracle Big Data Connectors
•Oracle Loader for Hadoop, Oracle SQL Connector for HDFS - rarely used ‣Sqoop works both ways (Oracle>Hadoop, Hadoop>Oracle) and is “good enough” ‣OSCH replaced by Oracle Big Data SQL for direct Oracle>Hive access
•Oracle R Advanced Analytics for Hadoop has been very useful though ‣Run MapReduce jobs from R ‣Run R functions across Hive tables
Oracle R Advanced Analytics for Hadoop Key Features
•Run R functions on Hive Dataframes •Write MapReduce functions in R
Data Loading into Hadoop
•Default load type is real-time, streaming loads, using Apache Flume, Apache Kafka etc ‣Batch / bulk loads only typically used to seed system
•Variety of sources including web log activity, event streams •Target is typically HDFS (Hive) or HBase •Data typically lands in “raw state” ‣Lots of files and events, need to be filtered/aggregated ‣Typically semi-structured (JSON, logs etc) ‣High volume, high velocity
-Which is why we use Hadoop rather than RDBMS (speed vs. ACID trade-off)
‣Economics of Hadoop means it’s often possible to archive all incoming data at detail level
Loading Stage
Real-Time Logs / Events
File / Unstructured Imports
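The point that raw events "need to be filtered/aggregated" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event fields (`ts`, `page`, `status`) and the sample records are hypothetical, and on a real cluster this step would more likely run as a Hive or ODI job than as a standalone script.

```python
import json
from collections import Counter

# Hypothetical raw weblog events, one JSON document per line,
# in the kind of "raw state" a streaming feed might land them in.
RAW_EVENTS = [
    '{"ts": "2015-01-27T10:00:01", "page": "/blog", "status": 200}',
    '{"ts": "2015-01-27T10:00:02", "page": "/blog", "status": 404}',
    '{"ts": "2015-01-27T10:00:03", "page": "/training", "status": 200}',
    'not valid json',
    '{"ts": "2015-01-27T10:00:04", "page": "/blog", "status": 200}',
]


def page_hits(lines):
    """Count successful page views per page, skipping unusable events."""
    hits = Counter()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # filter: drop lines that are not valid JSON
        if not isinstance(event, dict):
            continue  # filter: drop JSON values that are not objects
        if event.get("status") == 200:
            hits[event.get("page")] += 1  # aggregate: one hit per good request
    return dict(hits)


if __name__ == "__main__":
    print(page_hits(RAW_EVENTS))
```

The raw lines are left untouched and the summary is derived from them, which matches the idea of archiving all incoming detail and interpreting it later.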
GoldenGate for Continuous Streaming to Hadoop
•Oracle GoldenGate is also an option, for streaming RDBMS transactions to Hadoop •Leverages GoldenGate & HDFS / Hive Java APIs •Sample Implementations on MOS Doc.ID 1586210.1 (HDFS) and 1586188.1 (Hive) •Likely to be formal part of GoldenGate in future release - but usable now •Can also integrate with Flume for delivery to HDFS - see MOS Doc.ID 1926867.1
ODI on Hadoop - Big Data Projects Discover ETL Tools
•ODI provides an excellent framework for running Hadoop ETL jobs ‣ELT approach pushes transformations down to Hadoop - leveraging power of cluster
•Hive, HBase, Sqoop and OLH/ODCH KMs provide native Hadoop loading / transformation ‣Whilst still preserving RDBMS push-down ‣Extensible to cover Pig, Spark etc
•Process orchestration •Data quality / error handling •Metadata and model-driven
ODI on Hadoop - How Well Does It Work?
•Very good for set-based processing of Hadoop data (HiveQL) ‣Can run Python, R etc scripts as procedures
•Brings metadata and team-based ETL development to Hadoop •Process orchestration, error-handling etc •Rapid innovation from the ODI Product Dev team - Spark KMs etc coming soon •But requires Hadoop devs to learn ODI, or add ODI developer to the project
Oracle Big Data SQL and ODI12c
•Hive, and MapReduce, are well suited to batch-type ETL jobs, but … •Not all join types are available in Hive - joins must be equality joins •Any data from external Oracle RDBMS sources has to be staged in Hadoop before joining •Limited set of HiveQL functions vs. Oracle SQL •Oracle-based mappings have to import Hive data into DB before accessing it
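To make the equality-join limitation concrete, here is a small sketch of a range (non-equality) join, the kind of join condition the slide says Hive cannot express directly. SQLite is used purely as a runnable stand-in for a full SQL engine such as Oracle; the `weblog` and `ip_ranges` tables, their columns and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL engine with full join support.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weblog (hit_id INTEGER, ip_num INTEGER);
    CREATE TABLE ip_ranges (range_start INTEGER, range_end INTEGER, country TEXT);
    INSERT INTO weblog VALUES (1, 105), (2, 250), (3, 999);
    INSERT INTO ip_ranges VALUES (100, 199, 'UK'), (200, 299, 'US');
""")

# The join condition is a BETWEEN range test, not an equality test.
RANGE_JOIN = """
    SELECT w.hit_id, r.country
    FROM weblog w
    JOIN ip_ranges r
      ON w.ip_num BETWEEN r.range_start AND r.range_end
    ORDER BY w.hit_id
"""


def geolocate_hits(connection):
    """Return (hit_id, country) pairs for hits that fall inside a known range."""
    return connection.execute(RANGE_JOIN).fetchall()


if __name__ == "__main__":
    print(geolocate_hits(conn))
```

Hit 3 falls outside every range and so drops out of the inner join. With only equality joins available, a lookup like this needs a workaround such as pre-expanding the ranges.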
Adding Big Data SQL to ODI Hadoop Mappings
•Makes it easy to map Hadoop (Hive) data into Oracle-based mappings •Gives Hive mappings the ability to use Oracle SQL including range of joins
Oracle Big Data SQL and Data Integration
•Gives us the ability to easily bring in Hadoop (Hive) data into Oracle-based mappings •Allows us to create Hive-based mappings that use Oracle SQL for transforms, joins •Faster access to Hive data for real-time ETL scenarios •Through Hive, bring NoSQL and semi-structured data access to Oracle ETL projects •For our scenario - join weblog + customer data in Oracle RDBMS, no need to stage in Hive
Custom KM : LKM Hive to Oracle (Big Data SQL)
•ODI12c Big Data SQL example on BigDataLite VM uses a custom KM for Big Data SQL ‣LKM Hive to Oracle (Big Data SQL) - KM code downloadable from java.net ‣Allows Hive+Oracle joins by auto-creating ORACLE_HIVE external table definition to enable Big Data SQL Hive table access
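As a rough idea of what "auto-creating an ORACLE_HIVE external table definition" involves, the sketch below builds that kind of DDL as a string. It is not the KM's actual code, and the DDL the real KM generates may differ. The function name, table name, columns and the `DEFAULT_DIR` directory object are assumptions made for illustration; `TYPE ORACLE_HIVE` and the `com.oracle.bigdata.tablename` access parameter are the standard Big Data SQL way of pointing an external table at a Hive table.

```python
def oracle_hive_ddl(ext_table, columns, hive_table, directory="DEFAULT_DIR"):
    """Build CREATE TABLE DDL for an ORACLE_HIVE external table.

    ext_table  -- name of the Oracle external table to create (hypothetical)
    columns    -- list of (column_name, oracle_type) pairs
    hive_table -- Hive table to expose, as database.table
    directory  -- Oracle directory object (assumed to exist already)
    """
    col_list = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE {ext_table} (\n{col_list}\n)\n"
        "ORGANIZATION EXTERNAL (\n"
        "  TYPE ORACLE_HIVE\n"
        f"  DEFAULT DIRECTORY {directory}\n"
        "  ACCESS PARAMETERS (\n"
        f"    com.oracle.bigdata.tablename: {hive_table}\n"
        "  )\n"
        ")\n"
        "REJECT LIMIT UNLIMITED"
    )


if __name__ == "__main__":
    # Hypothetical weblog table; names are for illustration only.
    print(oracle_hive_ddl(
        "WEBLOG_EXT",
        [("HOST", "VARCHAR2(100)"), ("REQUEST_DATE", "VARCHAR2(50)")],
        "default.access_logs",
    ))
```

Once a table like this exists, an Oracle-side mapping can join it to ordinary Oracle tables without first copying the Hive data into the database.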
Options for Sharing Hadoop Output with Wider Audience
•During the discovery phase of a Hadoop project, audience are likely technical ‣Most comfortable with data analyst tools, command-line, low-level access to the data
•During the exploitation phase, audience will be less technical ‣Emphasis on graphical tools, and integration with wider reporting toolset + metadata
•Three main options for visualising and sharing Hadoop data
1. Coming Soon - Oracle Big Data Discovery (Endeca on Hadoop)
2. OBIEE reporting against Hadoop direct using Hive/Impala, or Oracle Big Data SQL
3. OBIEE reporting against an export of the Hadoop data, on Exalytics / RDBMS
Coming Soon - Oracle Big Data Discovery
Coming Soon : Oracle Data Enrichment Cloud Service
•Cloud-based service for loading, enriching, cleansing and supplementing Hadoop data •Part of the Oracle Data Integration product family •Used up-stream from Big Data Discovery •Aims to solve the “data quality problem” for Hadoop
New in OBIEE 11.1.1.7 : Hadoop Connectivity through Hive
•MapReduce jobs are typically written in Java, but Hive can make this simpler •Hive is a query environment over Hadoop/MapReduce to support SQL-like queries •Hive server accepts HiveQL queries via HiveODBC or HiveJDBC, automatically creates MapReduce jobs against data previously loaded into the Hive HDFS tables
•Approach used by ODI and OBIEE to gain access to Hadoop data •Allows Hadoop data to be accessed just like any other data source
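For readers new to MapReduce, the sketch below shows conceptually what a query such as `SELECT page, COUNT(*) FROM weblog GROUP BY page` turns into: a map step that emits key/value pairs, a shuffle that groups them by key, and a reduce step that aggregates each group. It is a single-process illustration rather than how Hive actually plans or executes jobs, and the `weblog` rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a weblog table.
ROWS = [
    {"page": "/blog", "bytes": 512},
    {"page": "/training", "bytes": 2048},
    {"page": "/blog", "bytes": 128},
]


def map_phase(rows):
    """Map: emit (page, 1) for every input row."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values for each key, giving COUNT(*) per page."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    """Run the three phases in sequence, like one simple MapReduce job."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))
```

Writing the one-line query instead of the three functions is the simplification Hive offers, and it is why tools that already speak SQL can use it as their route into Hadoop.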
OBIEE 11.1.1.7 / HiveServer2 ODBC Driver Issue
•Most customers using BDAs are using CDH4 or CDH5 - which uses HiveServer2 •OBIEE 11.1.1.7 only ships/supports HiveServer1 ODBC drivers •But … OBIEE 11.1.1.7 on Windows can use the Cloudera HiveServer2 ODBC drivers ‣which isn’t supported by Oracle ‣but works!
Using Big Data SQL with OBIEE11g
•Preferred Hadoop access path for customers with Oracle Big Data Appliance is Big Data SQL •Oracle SQL Access to both relational, and Hive/NoSQL data sources •Exadata-type SmartScan against Hadoop datasets
•Response-time equivalent to Impala or Hive on Tez •No issues around HiveQL limitations •Insulates end-users from differences between Oracle and Hive datasets
Finally … What Keeps the CIO Awake at Night
•Security and Privacy Regulations ‣Are we analysing and sharing data in compliance with privacy regulations?
-And if we are - would customers think our use of it is ethical? ‣Do I know if the data in my Hadoop cluster is *really* secure?
Hadoop Security “By Default”
•Connections between Hadoop services, and by users to services, aren’t authenticated •Security is fragmented : HDFS, Hive, OS user accounts, Hue, CM all separate models •No single place to define security policies, groups, access rights •No single tool to audit access and permissions •By default, everything is open and trusted - reflects roots in academia, R&D, marketing depts
Oracle Big Data SQL : Single RDBMS/Hadoop Security Model
•Potential to extend Oracle security model over Hadoop (Hive) data ‣Masking / Redaction ‣VPD ‣FGAC
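The two ideas behind these features can be illustrated without any Oracle API: row-level filtering (what VPD / fine-grained access control does by adding a predicate to each query) and column redaction (masking a sensitive value on the way out). The sketch below is a conceptual Python model only. The function names, the `region` and `email` fields and the policy rules are hypothetical, and in Oracle these policies are declared in the database rather than written in application code.

```python
def redact_email(value):
    """Partially mask an email address, keeping its first character and domain."""
    local, _, domain = value.partition("@")
    if not domain:
        return "***"  # not a recognisable address: mask it completely
    return local[:1] + "***@" + domain


def apply_policy(rows, user_region=None, can_see_pii=False):
    """Return the rows a user may see, with sensitive columns masked.

    user_region -- if set, only rows for that region are visible (row filter)
    can_see_pii -- if False, the email column is redacted (column mask)
    """
    visible = [r for r in rows if user_region is None or r["region"] == user_region]
    if can_see_pii:
        return [dict(r) for r in visible]
    return [{**r, "email": redact_email(r["email"])} for r in visible]


if __name__ == "__main__":
    # Hypothetical customer rows, as they might be exposed from a Hive table.
    customers = [
        {"id": 1, "region": "UK", "email": "jane@example.com"},
        {"id": 2, "region": "US", "email": "bob@example.org"},
    ]
    print(apply_policy(customers, user_region="UK"))
```

The appeal of a single security model is that rules like these would be defined once and applied the same way whether the rows come from Oracle or from Hive.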
Summary
•Hadoop and Oracle Big Data Appliance are increasingly appearing in BI+DW Projects •Gives DW projects the ability to store more data, cheaper and more flexibly than before •Enables non-relational (SQL) query tools and analysis techniques (R, Spark etc) •Extends BI’s capability to report and analyze across wider data sources •Maturity varies widely, both of the tools themselves and of Oracle’s integration with Hadoop •Trend is for Oracle to “productize” big data, creating tools + products around Oracle BDA •We are probably at early stages - but very interesting times to be an Oracle BI+DW dev!
Thank You for Attending!
•Thank you for attending this presentation, and more information can be found at http://www.rittmanmead.com
•Contact us at [email protected] or [email protected] •Look out for our book, “Oracle Business Intelligence Developers Guide” out now! •Follow us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)