Oracle Big Data Governance Webcast Charts


Upload: jeffrey-t-pollock

Post on 07-Jul-2015


Category:

Software



DESCRIPTION

Data governance for Hadoop and big data

TRANSCRIPT

Page 1: Oracle Big Data Governance Webcast Charts

Copyright © 2014 Oracle and/or its affiliates. All rights reserved.

Oracle Data Integration and Governance For Big Data

Jeff Pollock, Vice President, Oracle Data Integration & Governance

Madhu Raviendran Nair, Marketing Director, Oracle Data Integration & Governance

Data Governance for the Big Data Reservoir

Page 2: Oracle Big Data Governance Webcast Charts

Big Data Reservoir Drives Big Results

Business Drivers for Big Data Initiatives

Get Fast Answers to New Questions

Create a Data Reservoir

Predict More, More Accurately

Accelerate Data-Driven Action

Page 3: Oracle Big Data Governance Webcast Charts

Oracle For Big Data Reservoir

Oracle Data Integration Provides the Architectural Components

Architecture diagram labels:

• Sources

• Data Reservoir: Staging, Detail

• Fast load

• Data Replication

• Data Synchronization

• Hadoop Data Transformation: HiveQL – Pig/Oozie – Spark

• Oracle Data Integrator

• Oracle GoldenGate: GG to Flume, GG to Kafka, GG to Hive

Page 4: Oracle Big Data Governance Webcast Charts

But What About Data Governance?

https://blogs.oracle.com/bigdata/entry/big_data_and_analytic_top

Page 5: Oracle Big Data Governance Webcast Charts

The Data Governance Opportunity with Big Data

Solving business and IT data challenges

…to manage Risk/Compliance

Records retention

Rediscovery

Litigation support

Data access management

Information security and protection

Minimize corporate liability through proper governance of data

…to drive Business Value

Metadata discovery

Metadata & glossary cataloging

Data profiling

Data cleansing lifecycle

Data remediation

Maximize opportunity by ensuring trusted data is easily available for data-driven business processes

Page 6: Oracle Big Data Governance Webcast Charts

Big Data Governance Myths

Do the same principles apply for Big Data and Traditional Data Governance?

Perception

1. Data Governance has reduced significance in Big Data

2. Data Reservoirs should always contain only raw data in full fidelity

3. Big Data and Hadoop architectures are black boxes

Reality

1. Big Data without governance and quality is just Big Bad Data

2. Data Reservoirs contain all data: raw, formatted and enriched.

3. If you use the data (you will!), you need to govern its lifecycle.

Page 7: Oracle Big Data Governance Webcast Charts

Data Governance and the Data Reservoir

Page 8: Oracle Big Data Governance Webcast Charts


The Big Data Governance Problem

1 – How do we clean up the data lake?

2 – How do we keep the data reservoir clean?

Page 9: Oracle Big Data Governance Webcast Charts


Data Governance is Not Easy, there is No Silver Bullet!


Data Governance

Metadata Management

Business Glossary

Data Profiling

Data Cleansing

Data Archiving

Data Privacy

PEOPLE

PROCESS

TECHNOLOGY

…people and process first, …tools and capabilities next, …and, there is no magic!

“…the overall impact of poor-quality data on the whole dataset remains the same. In addition, much of the data that organizations use in a big data context comes from outside, or is of unknown structure and origin. This means that the likelihood of data quality issues is even higher than before. So data quality is actually more important in the world of big data.”

- Ted Friedman, Gartner, http://www.gartner.com/newsroom/id/2854917

Page 10: Oracle Big Data Governance Webcast Charts


Oracle Data Governance for the Data Reservoir Right Now


Data Governance

Metadata Management

Business Glossary

Data Profiling

Data Cleansing

Data Archiving

Data Privacy

Oracle Enterprise Metadata Management


• Metadata Management – horizontal and semantic data lineage for all big data sources

• Business Glossary – simple tools to catalog, link and collaborate on business terms

Oracle Enterprise Data Quality

Oracle Enterprise Data Quality delivers a complete, best-of-breed and business friendly approach to data cleansing resulting in trustworthy data for applications and to improve business reliability.

• Profiling – simple to use data health check that can work with sample sets of all data

• Cleansing – validate, match and de-duplicate data records from any business application

Oracle Big Data SQL

Extends Oracle SQL to Hadoop and NoSQL and the security of Oracle Database to all your data. It also includes a unique Smart Scan service that minimizes data movement and maximizes performance.

• Data Privacy – leverage the Oracle DB security model on data that physically resides in Hadoop

• Archiving – Seamlessly locate aged data in queryable data tables physically located in Hadoop

Page 11: Oracle Big Data Governance Webcast Charts


Oracle Enterprise Metadata Management (OEMM)


• Metadata Management – horizontal and semantic data lineage for all big data sources

• Business Glossary – simple tools to catalog, link and collaborate on business terms

Business Data Catalog

Report to Source Lineage

Impact Analysis

Audit, Versioning & Diff Reports

Social/Collaboration Features

Annotations and Tagging

Comprehensive Harvesting

3rd Party BI Metadata

3rd Party ETL Metadata

3rd Party DB Metadata

3rd Party Modeling Tools

Big Data Metadata

Metadata Standards

Page 12: Oracle Big Data Governance Webcast Charts

Value of Enterprise Metadata Management

Solves significant pain points for a wide variety of business consumers and technical staff

Questions shown in the diagram:

• How was the sales figure calculated?

• What will happen if I change this table?

• What reports use the mainframe data?

• Where did this data come from?

• Which reports use this customer data?

• Can I trust the sources of this customer data?

• I want to design an experiment to measure the success of a signup page. What data do I have?

Personas shown in the diagram: Sys Admin, Executive, BI Developer, Application User, Data Steward, ETL Developer, Data Scientist

Data flow components shown in the diagram: App, ETL, CDC, GG, Data Reservoir, BI Dashboards

Page 13: Oracle Big Data Governance Webcast Charts

Metadata Management Use Cases

• My dashboard does not match this report…why?

• Where did this data come from?

• Where can I find the data I need for analytics?

• Which ETL mappings or BI Reports will be affected by my column change?

• What systems does the data flow through?

Page 14: Oracle Big Data Governance Webcast Charts

Simple Screens for both Business and IT User Profiles

Comprehensive Data Lineage for IT

Simple to Navigate All Metadata

Business / IT Collaboration

Search Driven Business Access

Page 15: Oracle Big Data Governance Webcast Charts

Two Crucial Styles of Metadata Management

Vertical Lineage

• Links a business-friendly set of terms to the IT metadata and operational assets

• Capture Business Glossary, Taxonomy, Ontology, Conceptual Models

Horizontal Column Level

• Links the data fields from Business Intelligence Dashboards or Reports back to the Source Columns

• Schemas, BI View Layers, ETL Transformations, Calculations, etc.

Vertical Lineage: Biz Terms to IT

Horizontal Lineage: BI Fields to Source Columns

Example labels in the diagram: “NE_SALES”, “SALES”, “NAME”, “ACCT_NAME”, “NORTH”, “AGG_TOTAL”, “FNAME|LNAME”, “Customer”
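
The two lineage styles on this slide can be pictured as a small graph plus a glossary. The following is only a minimal sketch, not how Oracle Enterprise Metadata Management is implemented: the `LineageGraph` class, the qualified field names and every edge are hypothetical, loosely borrowing the labels from the diagram. Horizontal lineage is modelled as directed edges from source columns to BI fields; vertical lineage is modelled as a dictionary from a business term to technical fields.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy column-level lineage; edges point from an upstream field to a downstream field."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_flow(self, source, target):
        # Record that data flows from `source` into `target`.
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        # Breadth-first traversal collecting every field reachable from `start`.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impact_of(self, field):
        """Impact analysis: everything downstream of `field`."""
        return self._walk(field, self.downstream)

    def sources_of(self, field):
        """Report-to-source lineage: everything upstream of `field`."""
        return self._walk(field, self.upstream)


# Hypothetical horizontal lineage (source columns -> BI fields).
graph = LineageGraph()
graph.add_flow("crm.customers.FNAME", "staging.customer.NAME")
graph.add_flow("crm.customers.LNAME", "staging.customer.NAME")
graph.add_flow("staging.customer.NAME", "report.accounts.ACCT_NAME")
graph.add_flow("erp.orders.SALES", "dw.sales.AGG_TOTAL")
graph.add_flow("dw.sales.AGG_TOTAL", "dashboard.NE_SALES")

# Hypothetical vertical lineage (business term -> technical fields).
glossary = {"Customer": {"staging.customer.NAME", "report.accounts.ACCT_NAME"}}

print(sorted(graph.impact_of("crm.customers.FNAME")))
print(sorted(graph.sources_of("dashboard.NE_SALES")))
```

With a structure like this, "Which reports will be affected by my column change?" becomes a downstream walk and "Where did this data come from?" becomes an upstream walk.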

Page 16: Oracle Big Data Governance Webcast Charts

Data Flow View – Data Factory and Metadata Management

Diagram labels: Data Streams, Structured Enterprise Data, Other Data, Events & Data, Event Engine, Data Reservoir, Data Factory, Enterprise Information Store, Reporting, Discovery Lab, Discovery Output, Actionable Events, Actionable Information, Actionable Insights, Execution, Innovation

Metadata Management and Business Glossary

Page 17: Oracle Big Data Governance Webcast Charts

Comprehensive Data Integration & Governance Capabilities

Dynamic Data Movement
– Low impact capture, stage in Hadoop
– Continuous data availability

Data Transformation
– Bulk data movement
– Pushdown data processing

Data Federation
– Virtualized Data Services

Data Quality & Verification
– Fix quality at the source
– Verify data consistency

Metadata Management
– Lineage and Impact Analysis
– Business Glossary Semantics

Data Governance Foundation

Oracle GoldenGate (Movement)

Oracle Data Integrator (Transformation)

Data Service Integrator (Federation)

Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)

GoldenGate Veridata (Online Data Verification)

Enterprise Metadata Management & Business Glossary (Business Glossary, Data Lineage, Impact Analysis and Data Provenance)

Fast Load

ELT Processing on Hadoop or SQL

Continuous Availability

Page 18: Oracle Big Data Governance Webcast Charts

Oracle Enterprise Data Quality

• Profiling – simple to use data health check that can work with sample sets of all data

• Cleansing – validate, match and de-duplicate data records from any business application

Profile

Standardize

Match

Govern

Unified Workbench

Market-leading business usability for all types of data

Unparalleled time-to-value, rapid deployments

High performance engine operates in real-time or batch

Out-of-the-box global knowledge-base for world-wide coverage

Foundation for comprehensive data governance program
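
The Profile, Standardize and Match stages named on this slide can be illustrated with a few lines of plain Python. This is a rough sketch under stated assumptions, not Oracle Enterprise Data Quality's engine or API: the function names, the `name`/`email` fields and the sample records are all hypothetical, and the match rule (exact match on the standardized email, falling back to the name) is far simpler than a real matching engine.

```python
def profile(records, column):
    """Data health check for one column: row count, missing values, distinct values."""
    values = [r.get(column) for r in records]
    present = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


def standardize(record):
    """Normalize whitespace and case so that equivalent records compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": (record.get("email") or "").strip().lower(),
    }


def deduplicate(records):
    """Keep the first record per match key (standardized email, else name)."""
    seen = {}
    for rec in map(standardize, records):
        key = rec["email"] or rec["name"]
        seen.setdefault(key, rec)
    return list(seen.values())


# Hypothetical sample data.
records = [
    {"name": "ada  lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Grace Hopper", "email": None},
]

print(profile(records, "email"))
print(deduplicate(records))
```

Profiling the raw column before standardizing shows why the order of the stages matters: the two spellings of the same email count as two distinct values until they are normalized.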

Page 19: Oracle Big Data Governance Webcast Charts

Big Data Governance Lifecycle Tools

Operational Data Flows

Business Sources

Quality KPIs

Case Management

Governance Cockpit for Data Stewards & Stakeholders

Exception Review

Metadata Management

Business Glossary

Design Time

Page 20: Oracle Big Data Governance Webcast Charts

Top US Payroll Provider

Oracle Enterprise Data Quality for Governance on 100m records per month

Enterprise-Wide Governance Board

Page 21: Oracle Big Data Governance Webcast Charts


Privacy and Deep Data Access with Oracle Big Data SQL


SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;

Relevant SQL runs on BDA nodes

10s of Gigabytes of Data

Only columns and rows needed to answer query are returned

Diagram labels: Hadoop Cluster, Big Data SQL, Oracle Database, CUSTOMERS, WEB_LOGS

• Data Privacy – leverage the Oracle DB security model on data that physically resides in Hadoop

• Archiving – Seamlessly locate aged data in queryable data tables physically located in Hadoop
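
To see what the query on this slide returns, it can be run against a small in-memory database. This is only an illustration of the query's result shape: SQLite stands in for Oracle Database and Oracle Big Data SQL, nothing here reproduces Smart Scan or the Oracle security model, and the table definitions, the extra `payload` column and the sample rows are assumptions. An `ORDER BY` is added so that the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database as a stand-in; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE web_logs (sess_id TEXT, cust_id INTEGER, source_country TEXT, payload TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo');
    INSERT INTO web_logs VALUES
        ('s1', 1, 'Brazil', 'page=/home'),
        ('s2', 2, 'Canada', 'page=/cart'),
        ('s3', 2, 'Brazil', 'page=/signup');
    """
)

# Same join and filter as on the slide; only two columns and the Brazil rows come back.
rows = conn.execute(
    """
    SELECT w.sess_id, c.name
    FROM web_logs w, customers c
    WHERE w.source_country = 'Brazil'
    AND w.cust_id = c.customer_id
    ORDER BY w.sess_id
    """
).fetchall()

print(rows)
conn.close()
```

The `payload` column and the non-Brazil row never appear in the result, which is the point the slide makes about returning only the columns and rows needed to answer the query.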

Page 22: Oracle Big Data Governance Webcast Charts

Oracle Does Big Data Integration & Governance Better

Oracle Data Integration Governance vs. “Other Guys”

Oracle Data Integration Governance:

• Dynamic Data Movement

• No ETL Engine

• Most Heterogeneous

• Business Friendly Governance Tools

• Wide & Current 3rd Party Support

• Comprehensive Platform

“Other Guys”:

• Batch Data Movement & Weak CDC Tools

• ETL Engine H/W Alongside Hadoop

• Proprietary Vendor Lock-in, Incomplete Metadata

• Mix and Match of 6+ Legacy Tools

• Inflexible Metadata Models & Frameworks

• Incomplete Governance Features

Data Governance

Metadata Management

Business Glossary

Data Profiling

Data Cleansing

Data Archiving

Data Privacy

Page 23: Oracle Big Data Governance Webcast Charts

Most Heterogeneous, Deep 3rd Party Support

Column headings: Operational Integration (Movement / Transformation); Metadata Harvesting (Glossary, Lineage & Impact Analysis)

Hadoop HBase, Hadoop Hive/Flume, HP Enscribe, HP NonStop, HP Neoview, Hypersonic SQL, IBM DB2 i Series, IBM DB2 UDB, IBM DB2 z Series, IBM Informix, IBM Netezza, JMS / MQ, Microsoft Access, Microsoft SQLServer, MySQL, Pivotal Greenplum, PostgreSQL, Salesforce.com, SAP BW / BI, SAP ERP / ECC, SAS, SQL/MP, SQL/MX, Sybase ASE, Sybase IQ, Teradata

Adaptive, Altova, Apache Hcatalog, Apache Hive/HQL, Borland, CA ERwin, Cloudera Impala, COBOL Copybook, DataStax, Embarcadero, EMC ProActivity, GentleWare, Google BigQuery, Grandite, Hadapt Hive, Hortonworks Hive, IBM Cognos, IBM DB2, IBM DataStage, IBM Discovery, IBM Federation Server, IBM Lotus Notes, IBM Netezza, IBM Rational Rose, IBM Rational Architect, Informatica Metadata Mgr., Informatica PowerCenter

CoSORT, ISO SQL Standard (DDL), MapR Hadoop Hive, MicroFocus, Microsoft Access, Microsoft Office Excel, Microsoft Visio, Microsoft SQL Server, Microsoft SSIS, Microsoft Visual Studio, Microstrategy, Magic Draw, OMG CWM Standard, OMG UML Standard, Oracle BI Answers, Oracle BI Enterprise Edition, Oracle BI Server, Oracle DAC, Oracle Data Integrator, Oracle Data Modeler, Oracle Database, Oracle Designer, Oracle Hyperion Applications, Oracle Hyperion Essbase, Oracle Warehouse Builder, Pivotal Greenplum, PostgreSQL

QlikView, SAP BO Crystal Reports, SAP BO Designer, SAP BO Desktop Intelligence, SAP BO Repository, SAP BO Data Integrator, SAP BO Data Steward, SAP Master Data Management, SAP Sybase PowerDesigner, SAP Sybase ASE Database, SAS Data Integration Studio, SAS BI Server, SAS Information Map, SAS Metadata Management, SAS OLAP Server, Select, Sparx Architect, Syncsort, Tableau, Talend, Teradata, Tigris, Visible, W3C DTD & XSD Schema

Oracle Database, Oracle Exadata, Oracle Big Data Appliance, Oracle TimesTen, Oracle OLAP, Oracle Business Intelligence, Oracle BI Applications, Oracle E-Business Suite, Oracle JD Edwards Enterprise One, Oracle JD Edwards World, Oracle Fusion Applications, Oracle Governance Risk and Compliance, Oracle Fusion AIA, Oracle Retail Applications, Oracle Agile BI / DW, Oracle Agile PLM for Process, Oracle iFlex FlexCUBE, Oracle iFlex Mantas, Oracle Hyperion Applications, Oracle PeopleSoft, Oracle Siebel CRM / OnDemand, Oracle Communications, Oracle WebLogic Server, Oracle Coherence Data Grid, Oracle SOA Suite, Oracle Enterprise Service Bus

+ open APIs and standards based meta-model

Page 24: Oracle Big Data Governance Webcast Charts

The Data Governance Opportunity with Big Data

Solving business and IT data challenges

…to manage Risk/Compliance

Records retention

Rediscovery

Litigation support

Data access management

Information security and protection

Minimize corporate liability through proper governance of data

…to drive Business Value

Metadata discovery

Metadata & glossary cataloging

Data profiling

Data cleansing lifecycle

Data remediation

Maximize opportunity by ensuring trusted data is easily available for data-driven business processes

Page 25: Oracle Big Data Governance Webcast Charts


Oracle Simplifies Big Data Integration & Governance

Comprehensive Big Data Integration and Data Governance Platform

Appliance w/Hadoop Cluster

Analytic Tools

DI Tools and Connectors

Heterogeneous & Best of Breed

Differentiated and powerful DI capabilities for Teradata, Netezza, Microsoft, DB2, Sybase..

Faster Time to Value

Flexible configurations

OOTB performance with DI

Unified Mgmt - EM Plug-ins for Appliance and DI Tools

Single Support Contact – Hardware/Software/Networking and ASR

Page 26: Oracle Big Data Governance Webcast Charts


Join the Community

#ODI12c #GoldenGate12c #OEDQ #OEMM

Connect with Oracle on Social Media

OR connect via the web

Oracle Data Integration blog

blogs.oracle.com/dataintegration

Oracle Data Integration Home Page

oracle.com/goto/dataintegration
