Swimming Across the Data Lake, Lessons Learned and Keys to Success

Upload: dataworks-summithadoop-summit

Post on 16-Apr-2017

Page 1: Swimming Across the Data Lake, Lessons learned and keys to success

© 2016 Impetus Technologies - Confidential

Swimming Across the Data Lake

Lessons Learned and Keys to Success

Impetus Technologies Inc.

Page 2: Swimming Across the Data Lake, Lessons learned and keys to success

Our 40 Minutes Today

• Critical Trends – Lessons Learned
• Solving the Big Data “DILEMMA”
  – Data Democratization
  – Enterprise Metadata Management
  – Data Access
  – Self Service BI
• Migrating Workloads
  – Lift and Shift Automation

Page 3: Swimming Across the Data Lake, Lessons learned and keys to success

State of Play – Big Data

[Figure: Hadoop in Gartner’s Hype Cycle, Year 2013 and Year 2014]

Source Credit: http://www.cioinsight.com/it-strategy/big-data/slideshows/big-problems-for-big-data

Page 4: Swimming Across the Data Lake, Lessons learned and keys to success

State of Play – Big Data – 2015

2015: Gartner moves Big Data out of the Hype Cycle - it is REAL

[Figure: Hype Cycle, Year 2013, with Hadoop marked]

Page 5: Swimming Across the Data Lake, Lessons learned and keys to success

“Through 2018, 90% of Hadoop installations will be useless as they are overwhelmed with information assets captured for uncertain use cases”

“Enterprises today are realizing about 15% of potential ROI on BI investments”

Page 6: Swimming Across the Data Lake, Lessons learned and keys to success

Blueprint for a Modern Data Architecture

[Architecture diagram; components as labeled]

• Data sources: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Landing and ingestion
• Provisioning, Workflow, Monitoring and Security
• Enterprise Data Lake
• Consumers: Predictive applications, Exploration & discovery, Enterprise applications, Real-Time applications
• Traditional data repositories: RDBMS, MPP
• Governance, Information Lifecycle, Enterprise Meta Data Management

Page 7: Swimming Across the Data Lake, Lessons learned and keys to success

Don’t throw your users into the Data Lake!

Creating a data lake is only the beginning…

Page 8: Swimming Across the Data Lake, Lessons learned and keys to success

The Data Lake “DILEMMA”

D – Data
• Ingestion and Storage
• Governance
• Security & Compliance

IL – Information Lifecycle Management
• Lineage

EMM – Enterprise Metadata Management
• Meta data discovery
• Ontology

A – Access
• Query Performance
• Search data

Effective use of the Data Lake as a true enterprise data reservoir introduces new challenges. We call these the Data Lake “DILEMMA”. Addressing them helps avoid turning the lake into a “data swamp”, which would inhibit or slow enterprise adoption.

Page 9: Swimming Across the Data Lake, Lessons learned and keys to success

Critical Trends – Planning for Success

“Through 2018, 90% of deployed data lakes will be useless as they are overwhelmed with information assets captured for uncertain use cases”

Page 10: Swimming Across the Data Lake, Lessons learned and keys to success

Critical Trends – Planning for Success

Making insights and data in the lake readily discoverable, accessible and usable

“Visual data-discovery, an important enabler of end user self-service, will grow 2.5x faster than the rest of the market, becoming by 2018 a requirement for all enterprises.”

Page 11: Swimming Across the Data Lake, Lessons learned and keys to success

The Path to Democracy: Data Discovery

Don’t tell them what they need. Help your stakeholders find what works best for them

Page 12: Swimming Across the Data Lake, Lessons learned and keys to success

The Path to Democracy: Data Discovery

• Identification of unknown data
• Consolidation of enterprise data dictionary
• Metadata capture throughout the data lifecycle
• Search tools to help users find what they need
• Tools to browse/sample available data sets
• Collaboration tools for users to share their data insights

Page 13: Swimming Across the Data Lake, Lessons learned and keys to success

The Path to Democracy: Data Accessibility

Lower adoption barriers for your stakeholders. Getting the data they want should be fast and easy.

Page 14: Swimming Across the Data Lake, Lessons learned and keys to success

The Path to Democracy: Data Accessibility

• Easy-to-use access request mechanisms
• Monitored access approval workflows
• Business rules for decisioning automation
• Fast data provisioning, integrated with approval workflow
• Automated data provisioning mechanisms
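The combination of access requests, business rules for decisioning, and an audited approval workflow can be sketched in a few lines. This is only an illustrative sketch and not something from the deck: the role names, sensitivity levels, and the `decide` and `provision` functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    user_role: str    # hypothetical role label, e.g. "analyst"
    dataset: str      # hypothetical dataset name
    sensitivity: str  # assumed levels: "public", "internal", "restricted"


def decide(request: AccessRequest) -> str:
    """Apply simple business rules; returns "approve", "review", or "deny"."""
    if request.sensitivity == "public":
        return "approve"  # anyone may read public data
    if request.sensitivity == "internal" and request.user_role in {"analyst", "engineer"}:
        return "approve"  # known internal roles are auto-approved
    if request.sensitivity == "restricted":
        return "review"   # route to a human approver
    return "deny"


def provision(request: AccessRequest, audit_log: list) -> bool:
    """Record the decision for monitoring and report whether access is granted."""
    decision = decide(request)
    audit_log.append((request.user_role, request.dataset, decision))
    return decision == "approve"
```

Keeping the rules in one pure function makes the automated path easy to test, while anything the rules cannot settle falls through to manual review.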

Page 15: Swimming Across the Data Lake, Lessons learned and keys to success

The Path to Democracy: Data Usability

Get the most out of your data lake. Make it simple to use.

Page 16: Swimming Across the Data Lake, Lessons learned and keys to success

The Path to Democracy: Data Usability

• Virtualize data views to hide storage platform complexity
• Provide business user friendly façades to technical tools
• Help users visualize data
• Link data views to business entities
• Help users access the data lake through familiar paradigms
• Wrap analytics algorithms into easy to use tools

Page 17: Swimming Across the Data Lake, Lessons learned and keys to success

Metadata Repository

Metadata Management – Unified Single View

• Automatic schema ingestion from heterogeneous data sources
• Data dictionary and catalog entity tagging
• Search for anything in the data catalog (e.g., free text search)
• Incremental update and automatic metadata synchronization
• Define and manage the lifecycle of data catalog entities
• Apply machine learning algorithms for attribute identification, mappings, and entity consolidation
• Domain experts analyse, approve and customize the results, and provide suggestions to the model
• Domain based dictionary for attribute matching, classification and building composite
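To make the catalog ideas above concrete, here is a minimal in-memory sketch of schema registration, incremental update, entity tagging, and free text search. It is an assumption-laden illustration, not the product described on the slide: `DataCatalog`, `CatalogEntry`, and every method name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    columns: dict                      # column name -> type string
    tags: set = field(default_factory=set)
    description: str = ""


class DataCatalog:
    """Hypothetical in-memory catalog; a real one would persist its metadata."""

    def __init__(self):
        self._entries = {}

    def register(self, name, columns, description=""):
        """Add a new entity, or merge new columns into an existing one."""
        entry = self._entries.get(name)
        if entry is None:
            self._entries[name] = CatalogEntry(name, dict(columns), set(), description)
        else:
            entry.columns.update(columns)  # incremental update
            if description:
                entry.description = description
        return self._entries[name]

    def tag(self, name, *tags):
        """Attach dictionary or business tags to an entity."""
        self._entries[name].tags.update(tags)

    def search(self, text):
        """Case-insensitive substring search over names, columns, tags, descriptions."""
        needle = text.lower()
        hits = []
        for entry in self._entries.values():
            haystack = " ".join(
                [entry.name, entry.description, *entry.columns, *entry.tags]
            ).lower()
            if needle in haystack:
                hits.append(entry.name)
        return sorted(hits)
```

For example, after registering an `orders` entity with a `customer_id` column and tagging it `finance`, both `search("customer")` and `search("finance")` return it.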

Page 18: Swimming Across the Data Lake, Lessons learned and keys to success

Access Big Data

• SQL – Specific Query & Reporting
• OLAP – Cross Dimensional Fast Slice, Dice and Drill Down
• Data Virtualization – Data from MPP, Relational and Hadoop
• Search – Finding the “Needle in a Haystack”
• Self Service Data Discovery – “Don’t Know What You Don’t Know”

Page 19: Swimming Across the Data Lake, Lessons learned and keys to success

Critical Trends – Planning for Success

Self Service BI over Data Lake

“By 2017, most business users and analysts in organizations will have access to self-service tools to prepare data for analysis”

“Managed BI Self-Service Will Continue to Close the Business and Technology Gap.”

Page 20: Swimming Across the Data Lake, Lessons learned and keys to success

Steps to Effective Self Service BI

• Provision Cluster
• Discover and Blend New Sources
• Data Access and Exploration
• Ingest and Transform data
• Security and Governance
• BI, Analytics and Models

Page 21: Swimming Across the Data Lake, Lessons learned and keys to success

Blueprint for a Modern Data Architecture

[Architecture diagram; components as labeled]

• Data sources: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Landing and Ingestion
• Provisioning, Workflow, Monitoring and Governance
• Enterprise Data Lake
• Data Federation/Virtualization
• Exploration & Discovery
• Data Wrangling
• Real-time Applications
• Traditional Data Repositories: RDBMS, MPP
• Enterprise Meta Data Management
• Accelerators

Page 22: Swimming Across the Data Lake, Lessons learned and keys to success

Why are customers Migrating Workloads to Hadoop?

• Free up capacity to contain costs
  – Immediate ROI on Hadoop
  – Contain expenditure on relational warehouse
• Create a multi-platform data warehouse environment
  – One of the strongest tactics in data architecture today
  – Create an “Adjunct” to the relational warehouse platform
• Get a platform better suited to advanced analytics
• Set up for success

Page 23: Swimming Across the Data Lake, Lessons learned and keys to success

Manual migrations often come with million-dollar price tags, and years of business logic must be recoded, debugged, and vetted in Hadoop (RISK)

Page 24: Swimming Across the Data Lake, Lessons learned and keys to success

Challenges to Migrate Workloads to Hadoop

• Extremely Complicated Process
  – Manual Identification of Workloads to Migrate
    • What Tables, Data, Queries, Dependent Queries
  – Figuring out the Data Model on Hadoop
    • Where to store on Hadoop?
    • Hadoop best practices are not known/missing
  – Manual Migration
    • Time-consuming and risky
    • Offload/Migration Validation/QA problem
• Technology Readiness
  – ANSI standard SQL and other complex relational technologies are not fully supported on Hadoop
  – Support for DW specific keywords and data types

Page 25: Swimming Across the Data Lake, Lessons learned and keys to success

“4 click” Paradigm

• Connect to supported Data Warehouses
  – Teradata, SQL Server, Oracle, Netezza, DB2
  – SAP Hana and more on roadmap
• Full Intelligent Assessment & Identification of “Offload-able” Entities
  – Analyze and recommend offloadable queries and tables
  – Recommend Query Engine to meet SLAs
  – Recommend Data Store
• Tool Sets for Automated Migration
  – Data Migration
    • Recommendation for data partitioning, clustering and buckets
    • Migrate role based security
    • Data Validation
  – Workload Transformation
    • Impetus UDF Library to support source specific keywords
    • Automatic conversion of SQL and PL/SQL scripts
  – Workload Execution
    • Support for multiple query engines: HiveQL and SparkSQL
    • Schedule execution for migrated code
• Validate Migrated Workloads
  – Establish functional equivalence
  – Meet or exceed SLAs
    • Support for HiveQL and SparkSQL
    • Support for Hive on Tez and Hive on Spark engines
    • Built-in recommendation for partitioning, clustering and number of buckets based on dataset
    • Optimized parallelism (number of mappers) based on data source size
    • Scale out on commodity hardware
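One common way to check the functional equivalence of a migrated workload is to run the same query on the source warehouse and on Hadoop, then compare the two result sets without regard to row order. The sketch below illustrates that idea only; it is not the tool described on the slide, and `row_fingerprint` and `compare_results` are hypothetical names. It assumes both result sets have already been fetched as lists of tuples.

```python
import hashlib
from collections import Counter


def row_fingerprint(row):
    """Hash one row after rendering each value as text (None becomes NULL)."""
    canonical = "\x1f".join("NULL" if v is None else str(v) for v in row)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_results(source_rows, migrated_rows):
    """Order-independent comparison that still respects duplicate rows."""
    source = Counter(row_fingerprint(r) for r in source_rows)
    migrated = Counter(row_fingerprint(r) for r in migrated_rows)
    missing = sum((source - migrated).values())  # rows lost in migration
    extra = sum((migrated - source).values())    # rows that should not exist
    return {
        "source_count": sum(source.values()),
        "migrated_count": sum(migrated.values()),
        "missing_in_migrated": missing,
        "unexpected_in_migrated": extra,
        "equivalent": missing == 0 and extra == 0,
    }
```

In practice, values would need normalizing before hashing, since two engines may render the same number, timestamp, or decimal differently. On large tables the fingerprints would normally be computed inside each engine rather than in client memory.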

Page 26: Swimming Across the Data Lake, Lessons learned and keys to success

Impetus Enabling the Modern Analytical Platform

[Architecture diagram; components as labeled]

• Data sources: Structured, Unstructured, External, Social, Machine, Geospatial, Time Series, Streaming
• Landing and Ingestion
• Provisioning, Workflow, Monitoring and Governance
• Enterprise Data Lake
• Data Federation/Virtualization
• Exploration & Discovery
• Data Wrangling
• Real-time applications
• Traditional Data Repositories: RDBMS, MPP
• Enterprise Meta Data Management
• Accelerators

Labels overlaid on the architecture:

• KYVOS INSIGHTS
• DATA BLENDING
• WORKLOAD MIGRATION
• METADATA & DISCOVERY
• DATA GOVERNANCE (for Hadoop)
• DATA ACCESS
• STREAM ANALYTIX

Page 27: Swimming Across the Data Lake, Lessons learned and keys to success

Thank you. Questions?