BDD Data Lake Demo


Uploaded by jeremy-searls on 22-Feb-2017


Page 1: BDD Data Lake Demo

NASDAQ: EDGW

Business Analytics Solutions Start Here
Integrated EPM, BI, and Big Data Solutions

Page 2: BDD Data Lake Demo


Why is Microsoft Excel the most commonly used BI tool in the world?

Page 3: BDD Data Lake Demo


Everyone's an “expert”

• Industry standard for spreadsheets
• 750 million users worldwide
• Over 30 years old

How many Excel “experts” does your organization have?

Excel is Familiar

Page 4: BDD Data Lake Demo


Ultimately, Excel puts Analysts in Control

“Show me the data and I’ll know it when I see it”

...It's not just about data consumption, but about consumption and contribution

Analysts need to develop their own “personal” data modification techniques and mashups

Business Analysts don’t know how to provide reporting requirements until they get their hands on the data

Danny Brock
We could use a few bullet points here to elaborate, e.g.: Analysts don't know how to provide reporting requirements until they work with the data. Analysts need to develop their own "personal" data modifications and their own mashups. (It isn't just data consumption; it is data consumption and transformation.)
Jeremy Searls
Excel is copying the data, not accessing the live data
Jeremy Searls
Emphasize control and how analysts contribute their own data (lookup tables, etc.); small transformations are used, so control is necessary
Jeremy Searls
We think it's an issue of control
Page 5: BDD Data Lake Demo


Despite Excel’s utility for analysts, three primary issues exist...

But Problems Arise

No Data Variety...

No Data Volume...

No Data Governance...

Johnathon Francis
Thoughts on this? Still need more changes?
Danny Brock
This probably needs a different visual treatment
Jeremy Searls
The way this slide is introduced is loose; it's not a good transition from the previous slide. Three fundamental issues with Excel as the primary tool of choice. JUST the transition is the concern; everything after is good
Page 6: BDD Data Lake Demo


By the time you count to 60...

This data will be structured, semi-structured, and completely unstructured

Excel Doesn’t Accommodate Variety

More than 204 million emails will be sent

Billions of new sensor data points will be detected

Over 2 million Google search queries will be performed

684,000 pieces of content will be shared on Facebook

More than 100,000 tweets will be sent

Page 7: BDD Data Lake Demo


Most companies in the US have at least 100,000 GB (100 TB) of data stored

Excel Doesn’t Accommodate Volume

...Meanwhile, an Excel worksheet is limited to just over 1 million rows (1,048,576)…

43 trillion GB of data will be created by 2020

Enterprise data will grow 650% in the next five years

The world’s info now doubles every year and a half...
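To make the volume point concrete, here is a back-of-the-envelope sketch. An Excel worksheet (2007 and later) holds at most 1,048,576 rows. Taking the earlier figure of roughly 100,000 tweets per minute, and assuming for illustration that each tweet becomes one spreadsheet row, a single worksheet would be full in about ten and a half minutes.

```python
# Back-of-the-envelope check: how quickly would one stream of records
# fill a single Excel worksheet?
EXCEL_MAX_ROWS = 1_048_576      # row limit of an .xlsx worksheet (Excel 2007+)
TWEETS_PER_MINUTE = 100_000     # figure quoted on the "variety" slide
# Assumption (illustrative only): one tweet = one spreadsheet row.


def minutes_to_fill(rows_per_minute: int, max_rows: int = EXCEL_MAX_ROWS) -> float:
    """Return the number of minutes until a worksheet reaches its row limit."""
    return max_rows / rows_per_minute


def rows_per_day(rows_per_minute: int) -> int:
    """Return how many rows the stream produces in 24 hours."""
    return rows_per_minute * 60 * 24


if __name__ == "__main__":
    print(f"Sheet full after {minutes_to_fill(TWEETS_PER_MINUTE):.1f} minutes")
    print(f"One day of tweets: {rows_per_day(TWEETS_PER_MINUTE):,} rows")
```

At that rate, one day of tweets (144 million rows) would need well over a hundred worksheets.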

Page 8: BDD Data Lake Demo

Excel Doesn’t Allow for Governance

Spreadsheets: Give analysts control of the data, but security and integrity are lost as multiple “versions” of data are created

Data Warehouse: Designed to provide a single version of truth for analysts and facilitate governance

IT wants governance … Business wants control

IT Analysts

Page 9: BDD Data Lake Demo

While a traditional warehouse may be able to handle expected volumes, it can’t...

Is Your Current Warehouse the Solution?

[Diagram: CRM, ERP, etc. → ETL → Data Warehouse → Reporting]

• Support rapid data development and ad hoc analysis
• Answer unknown questions
• Quickly integrate new or unstructured data sources

Page 10: BDD Data Lake Demo

A New Approach Is Required...

To give analysts control and access to data

To accommodate increased data variety

To scale your analytical capabilities

To complement the existing solutions

To create a centralized, governed repository

Johnathon Francis
Is scalable storage/compute a requirement or an "additional benefit"?
Jeremy Searls
Confusing graphic; is the light blue answering the dark blue?
Joseph Hendele
I understand Calder's grievance but I'm not sure if it's universal... there are no questions here and it's really not all that ambiguous once you read the text. That being said, I'm not opposed to redesigning this slide if just about everyone experienced the same confusion, etc. :)
Page 11: BDD Data Lake Demo

Enable Ad-hoc Analysis for the Business

Questions You’re Not Asking

Questions You’re Asking

Things you don’t know

Things you know


Ad-Hoc Analysis

● Heterogeneous Data
● Massive Compute
● Ad-Hoc Analysis
● Centralized Repository
● Advanced Transform

...What your business needs

Traditional Reporting

● Trusted KPIs
● Historic Data
● Scheduled Reports
● Homogeneous Data
● Pixel Perfect

What your business has...

Jeremy Searls
Revisit script; call out what they're doing today and what they need to be doing. Bring back the idea of control: they often don't know the data until they see it. Be the flashlight/magnifying glass: test a hypothesis, follow up on a hunch
Page 12: BDD Data Lake Demo

Enable Discovery Before Reporting

[Diagram: New Data Sources and Existing Data Sources (CRM, ERP) → Copy/Ingest → Data Lake (Ad-Hoc Analysis) → Conform → Data Warehouse (Reporting), with Archive flowing from the warehouse back into the lake]

Danny Brock
Can we break up the sources on the left into existing sources and new sources?
Jeremy Searls
Raw data goes here; we're not making copies of warehouse data
Jeremy Searls
Need a compelling reason that they complement each other
Jeremy Searls
Quintessential slide of the entire presentation; emerging best practice. Putting the data lake "in front" of the data warehouse: understand the data before we do something with it. "This is an emerging best practice that we're communicating to our customers"
Page 13: BDD Data Lake Demo


Load all types of existing data into the lake “as is”

Step 1 - Fill the Lake

Centralized Repository

One Centralized Repository

• Eliminates Data Silos
• Improves Data Integration
• Promotes Data Governance

Data Variety

Incorporate New Data Sources

• Social Media
• Transactions
• Unstructured
• Sensor Data
• “As-is” Data

Jeremy Searls
Loading more than transactional data
Jeremy Searls
"write as is, read as necessary"
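The “write as is, read as necessary” idea (often called schema on read) can be sketched in a few lines of Python. In this sketch a local folder stands in for the lake (in practice it would be distributed storage, for example HDFS), and the `raw/<source>/ingest_date=<date>/` layout and function names are hypothetical conventions, not something prescribed by the slides. Files are stored byte-for-byte; structure is imposed only when a file is read for analysis.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root: Path, source: str, filename: str,
             payload: bytes, ingest_date: date) -> Path:
    """Store a file in the lake byte-for-byte: no parsing, no cleanup.

    Hypothetical layout: <root>/raw/<source>/ingest_date=<YYYY-MM-DD>/<filename>
    """
    target = (lake_root / "raw" / source
              / f"ingest_date={ingest_date.isoformat()}" / filename)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def read_json_lines(path: Path) -> list[dict]:
    """Impose structure only at read time (schema on read)."""
    text = path.read_text(encoding="utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    lake = Path(tempfile.mkdtemp())  # throwaway local "lake"
    tweets = b'{"id": 1, "text": "Sunny day!"}\n{"id": 2, "text": "Rain again"}\n'
    weather = b"date,high_f,precip_in\n2017-02-01,84,0.0\n"
    tweet_file = land_raw(lake, "twitter", "tweets.jsonl", tweets, date(2017, 2, 1))
    land_raw(lake, "weather", "daily.csv", weather, date(2017, 2, 1))  # CSV kept as-is too
    print(tweet_file.relative_to(lake))
    print(read_json_lines(tweet_file))
```

Because nothing is transformed on the way in, the same raw files can later be read with different parsers as analysts' questions change.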
Page 14: BDD Data Lake Demo


Step 2 - Add a Discovery Layer

Give analysts control and access to the data

Select a Data Discovery tool that is right for your business

Analyst Control

• Total autonomy
• Ad-hoc analysis
• Personalized mash-ups
• Single version of the “truth”

Software Agnostic

• Oracle Big Data Discovery
• Datameer
• Platfora
• Open Source

Read the fine print: Be wary of tools that promise ad-hoc analysis, but only enable data consumption or visualization

Danny Brock
Might be helpful to talk about the "as-is" data as a raw material that analysts need access to
Jeremy Searls
Large print?
Jeremy Searls
Options other than hadoop?
Page 15: BDD Data Lake Demo

Step 3 - Graduate to the Warehouse

Augment Existing Solutions

Lake + Warehouse: quicker time-to-value, more data, more capability

Migrate crucial insights to the warehouse

Leverage existing reports/create new ones

Archive back into the data lake

Identify data quality issues quickly

Build transforms at massive scale

Jeremy Searls
Revisit the point: we want to understand the data before we graduate it
Jeremy Searls
lake gives the opportunity to "fail fast" without significant investment
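Step 3 might look something like the sketch below, assuming the lake has already produced a small curated table of daily precipitation and average sentiment. Python's built-in `sqlite3` stands in for the real warehouse, and the table name, columns, sample rows, and quality rules are all hypothetical. Rows that fail a basic check are held back and reported instead of loaded, which is one simple way to surface data quality issues before anything reaches production reporting.

```python
import sqlite3

# Hypothetical curated output from the lake: (day, precip_in, avg_sentiment)
curated = [
    ("2017-02-01", 0.0, 1.5),
    ("2017-02-02", 1.2, -2.0),
    ("2017-02-03", None, 0.0),   # missing weather reading
    ("2017-02-04", -3.0, 0.5),   # impossible negative precipitation
]


def quality_issues(row):
    """Return a list of data quality problems found in one row."""
    _day, precip, sentiment = row
    issues = []
    if precip is None:
        issues.append("missing precipitation")
    elif precip < 0:
        issues.append("negative precipitation")
    if sentiment is None:
        issues.append("missing sentiment")
    return issues


def graduate(conn, rows):
    """Load only clean rows into the warehouse table; return rejected rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_weather_sentiment (
               day TEXT PRIMARY KEY,
               precip_in REAL NOT NULL,
               avg_sentiment REAL NOT NULL)"""
    )
    checked = [(row, quality_issues(row)) for row in rows]
    clean = [row for row, problems in checked if not problems]
    rejected = [(row, problems) for row, problems in checked if problems]
    with conn:  # commit the clean batch as one transaction
        conn.executemany(
            "INSERT INTO daily_weather_sentiment VALUES (?, ?, ?)", clean
        )
    return rejected


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    for row, problems in graduate(warehouse, curated):
        print("rejected", row[0], problems)
    count = warehouse.execute(
        "SELECT COUNT(*) FROM daily_weather_sentiment").fetchone()[0]
    print("loaded", count, "rows")
```

Rejected rows stay in the lake, where they can be investigated cheaply, which fits the "fail fast" idea above.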
Page 16: BDD Data Lake Demo

The Bigger Picture

Scalable Storage and Compute

Tech Replacement

Massive Transform Capabilities

New Advanced Analytics

Introduce a repository that can house all your organization’s data, at scale, with no risk of data loss

Lay the foundation for new “untapped” analytical capabilities like predictive analytics, machine learning, search, and real-time alerting

Over time, reduce the size and cost of your warehouse by re-platforming some reporting onto the data lake

Deliver powerful, performant transforms leveraging the massive compute power of the data lake

Jeremy Searls
More explicit about tech replacement: "In time you can reduce the size of your warehouse by doing some of the reporting in the lake"
Page 17: BDD Data Lake Demo


Scenario:

Flipflops Resort is located in the heart of the Caribbean and is a popular tourist destination

Their marketing team would like to better understand the impact of social sentiment on sales

How might this play out in the “real world”?


Page 18: BDD Data Lake Demo

Currently, Flipflops uses database file dumps in Excel format to gather any insights...

This can be very time-consuming and does not promote the inclusion of new data sources

Current Strategy

Page 19: BDD Data Lake Demo

“Does our resort’s weather impact social media sentiment?”

Discovery Starts With a Question

Page 20: BDD Data Lake Demo

Need to ingest data from sources and formats that may not be structured in a spreadsheet-friendly way

Limitations of Current Practices

Obtaining this data can be a labor-intensive process

Page 21: BDD Data Lake Demo

Semi-Structured Data Example
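The original slide showed a screenshot here. As a hypothetical stand-in, the sketch below takes a tweet-like JSON record (field names loosely modeled on social media API output, simplified and invented for illustration) and flattens its nested objects into dotted column names. Nested objects map to columns easily enough, but lists such as hashtags have no natural single-cell representation, which is a large part of why this kind of data is awkward in a spreadsheet.

```python
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys; keep lists as JSON strings."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # A list has no single-cell equivalent; serialize it as text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Hypothetical tweet-like record (not real data).
raw = '''{
  "id": 42,
  "text": "Perfect beach weather today!",
  "user": {"screen_name": "guest123", "location": {"country": "BS"}},
  "entities": {"hashtags": ["sunny", "vacation"]}
}'''

row = flatten(json.loads(raw))
for column, value in row.items():
    print(f"{column}: {value}")
```

Every new nesting level or optional field adds columns, so the spreadsheet layout changes from record to record; a discovery tool reading from the lake can absorb that variation instead.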

Page 22: BDD Data Lake Demo

It's clear that Excel does not handle semi-structured data well, and doesn’t support unstructured data at all

Attempting to draw insights from this data, or joining additional data sources to draw any correlations would be difficult at best

This is where we can utilize discovery and the data lake to answer our question

Outgrowing Excel

Page 23: BDD Data Lake Demo

Piping Outside Data to the Lake

We’re focused on social media sentiment, so let’s grab some tweets and weather data and put them into our lake

[Diagram: New Data Sources → Data Lake (Ad-Hoc Analysis) → Data Warehouse (Reporting)]

Page 24: BDD Data Lake Demo

Piping More Data to the Lake

Additionally, let’s leverage existing marketing and booking data to help answer our question

[Diagram: Existing Data Sources → Data Lake (Ad-Hoc Analysis) → Data Warehouse (Reporting)]

Page 25: BDD Data Lake Demo

View of our Data Lake through a web interface called Hue

Note the variety of file types that can be stored

Hue Lake View

Data Lake

Page 26: BDD Data Lake Demo

Analysis on Top of the Lake

We are now ready to start our discovery phase and will use an analytical tool on top of our lake to visualize any insights

[Diagram: Existing Data Sources and New Data Sources → Data Lake (Ad-Hoc Analysis) → Data Warehouse (Reporting)]

Page 27: BDD Data Lake Demo

Diving into the Lake

With a variety of both open source and proprietary tools available, we can quickly view our data and gather potential insights

Page 28: BDD Data Lake Demo


Options for Discovery


Ad-Hoc Analysis

There are many different ways to analyze the data in the lake


Page 29: BDD Data Lake Demo

Data Lake Demo

Page 30: BDD Data Lake Demo

Incorporating New Insights

Any insights we discover could be included in a traditional data warehouse and integrated into regular reporting

[Diagram: Data Lake → New Data Fields/Sources → Data Warehouse Reporting]

Page 31: BDD Data Lake Demo

Demo Recap

A comparison of what we accomplished using a data lake:

Data Lake
• Centralized access to heterogeneous data
• Powerful data transformations
• Easily join data sets together
• Ability to visualize fields within moments of upload
• Garner insights into data without significant time investment
• Maintain data integrity

Microsoft Excel
• Local access to homogeneous data
• Slow data transformations; data loaded onto local machine
• Tedious joining of data sets
• Visualizations must be built and configured for new data sets
• Gathering data insights may involve a notable amount of staff time
• Loss of data governance and integrity

Page 32: BDD Data Lake Demo


Next Steps

So What Now?

1. Let Ranzal help your organization understand how best to move forward with an “Analytics Roadmap”

2. Start small with your data lake. Let Ranzal implement the first solution to deliver real ROI. This is often Infrastructure Replacement, Active Archive, and/or ETL Offload

Jeremy Searls
#1: Talk about gap analysis; let us show you "how to get from A to B"
Jeremy Searls
#2: The overall vision that we've painted is a LOT of work. "We do this all the time; we can start small, with a few tried and true ways to see ROI." Lake first, then mature
Page 33: BDD Data Lake Demo


Contact Information

Edgewater Ranzal
108 Corporate Park Drive, Suite 105
White Plains, NY 10604
Tel (914) 253-6600

Email: [email protected]

45 Beech Street, Suite 109
London EC2Y 8AD
United Kingdom
Tel +44 (0) 2033 717 174

130 S. Jefferson St., Suite 101
Chicago, IL 60661
Tel (847) 269-3524

200 Harvard Mill Square, Suite 210
Wakefield, MA 01880
Tel (781) 246-3343