Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality

TRANSCRIPT

US Office: 1355 Market Street, #488, San Francisco, CA 94103

German Office: Katharinenstr. 15, 04109 Leipzig, Germany

Beyond the Data Lake
Simplifying data integration for the modern age

Matthias Korn | Head of Presales
matthias.korn@datavirtuality.de

Variety is The Challenge

Gartner 2014: “VARIETY is the biggest challenge.”

“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”

1996 - Variety was already a major challenge…

Integration using the Data Warehouse

Data is integrated by copying it into a central repository
Approach: ETL process (Extract/Transform/Load); a minimal sketch follows below
Structure is applied on the way into the repository
BI users query Data Marts
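To make the ETL pattern concrete, here is a minimal sketch in Python; the file, table and column names are hypothetical, and SQLite stands in for the warehouse:

# Minimal ETL sketch: structure is applied BEFORE loading (schema-on-write).
import csv
import sqlite3

# Extract: read raw rows from a source system export (hypothetical file).
with open("orders_export.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# Transform: enforce the target schema on the way in; rows that do not fit are dropped.
clean_rows = [
    (int(r["order_id"]), r["customer"].strip().upper(), float(r["amount"]))
    for r in raw_rows
    if r["amount"]
]

# Load: write the conformed rows into the central repository.
dwh = sqlite3.connect("warehouse.db")
dwh.execute("CREATE TABLE IF NOT EXISTS fact_orders (order_id INTEGER, customer TEXT, amount REAL)")
dwh.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", clean_rows)
dwh.commit()

Every change in source layout or target schema means editing such a pipeline by hand, which is where the inflexibility and maintenance effort criticized below come from.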

Why do so many DWH projects fail? ETL

Inflexible; costly modifications

Labour-intensive setup and maintenance

Over 50% failure rate*

Slow data-to-actionable-insights (6 to 9+ months)

2016 – Variety is Getting Dramatic

Where does the complexity come from?

Big Data
• Machine data, unstructured data, social data, streaming data, IoT, etc.

Cloud data
• APIs, cloud data platforms, etc.

Data Lake – getting some data in is pretty easy…

[Diagram: data sources feeding the Data Lake – Clickstream Data, Sensor Data, Server Logs, Databases, Web APIs; each object gets a unique identifier and metadata tags, and keeps its original data structure]

…still challenges with other data

Integration using the Data Lake

Data is integrated by copying it into a central repository

Approach: ELT process (Extract/Load/Transform); a minimal sketch follows after this slide

Data loaded in the original structure

For Data Scientists rather than for BI users

BI users query Data Marts: wait, didn't they do this before already?
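For contrast, a minimal ELT sketch in Python (hypothetical file and table names; SQLite stands in for the lake): data is loaded in its original structure and only transformed when it is read.

# Minimal ELT sketch: load raw, apply structure only at read time (schema-on-read).
import json
import sqlite3

lake = sqlite3.connect("data_lake.db")

# Load: one raw JSON document per row, original structure preserved.
lake.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
with open("clickstream.jsonl") as f:
    lake.executemany("INSERT INTO raw_events VALUES (?)", ((line,) for line in f))
lake.commit()

# Transform: structure is applied by whoever reads the data, typically a data scientist.
events = [json.loads(p) for (p,) in lake.execute("SELECT payload FROM raw_events")]
page_views = [e for e in events if e.get("type") == "page_view"]

Loading is trivial, but every consumer has to repeat the transformation step, which is why BI users still end up querying prepared Data Marts.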

Data Lake and DWH

Both rely on physical data integration
Both require significant upfront effort to create and fill with data
Both miss agility from the BI user's point of view

Reasons for physical data integration

Query all data with the same language
Model data with the same language
High performance

The Logical Data Warehouse
Introduced by Gartner in 2012
New data management architecture for analytics
Uses repositories just like the EDW
Adds distributed processes like the Data Lake
Adds virtualization of data sources for business agility
Removes the obstacle of physical data integration

Logical Data Warehouse (LDW)

What does the Logical Data Warehouse do?

The LDW knows where the data is stored instead of copying it
Combines different technologies for different use cases; a minimal sketch follows after this list:

• Big data processing
• Classical BI
• Agile business analytics
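A minimal sketch of the virtualization idea in Python (all source names and schemas are hypothetical; two in-memory SQLite databases stand in for a warehouse and a cloud source): the logical layer keeps a catalog of where each table lives and routes queries there instead of copying the data.

# Minimal virtualization sketch: a catalog replaces the physical copy.
import sqlite3

# Two stand-in sources: a warehouse and an operational/cloud system.
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE sales (region TEXT, amount REAL)")
dwh.executemany("INSERT INTO sales VALUES (?, ?)", [("EU", 120.0), ("US", 200.0)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (region TEXT, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [("EU", "ACME"), ("US", "Initech")])

# The "logical data warehouse": a catalog of sources, not a copy of the data.
catalog = {"sales": dwh, "customers": crm}

def query(logical_table, sql):
    """Route the query to whichever source currently holds the logical table."""
    return catalog[logical_table].execute(sql).fetchall()

print(query("sales", "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(query("customers", "SELECT region, COUNT(*) FROM customers GROUP BY region"))

In a real LDW the routing, dialect translation and optimization are handled by the platform; the point of the sketch is only that a catalog of sources takes the place of a physical copy.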

Advantages of the Logical Data Warehouse

Real-time data available and ready for analysis
Immediately productive
Flexible Logical Data Model
Permissions, governance
APIs, web services
Decoupling of business layer and tech layer

Technology Map

Conclusion
The Logical Data Warehouse holds enormous promise
Unified data architecture for both Big Data and classical BI use cases
Flexibility and real-time access give an advantage
Explore -> Use -> Optimize instead of Build -> Test -> Use provides quicker time to solution

We ♥ Dataconomy

US Office: 1355 Market Street, #488, San Francisco, CA 94103

German Office: Katharinenstr. 15, 04109 Leipzig, Germany

Thanks for your attention

Backup 1: Example data flow in an LDW

Distributed query (a minimal sketch follows below)
BI frontend is aware of all data sources and creates a single SQL statement
Performance optimization engine replicates data only if needed
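A minimal sketch of such a distributed query in Python (hypothetical sources and schemas; in-memory SQLite databases stand in for an ERP and a CRM system): each source answers its own sub-query and the virtual layer joins the partial results.

# Minimal distributed-query sketch: push sub-queries down, join in the virtual layer.
import sqlite3

erp = sqlite3.connect(":memory:")
erp.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
erp.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 75.0), (1, 25.0)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ACME"), (2, "Initech")])

# Push-down: each source aggregates its own data first.
totals = dict(erp.execute("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"))
names = dict(crm.execute("SELECT id, name FROM customers"))

# Federated join in the virtual layer; no upfront copy of either source.
report = {names[cid]: total for cid, total in totals.items()}
print(report)  # {'ACME': 75.0, 'Initech': 75.0}

If the same join were requested repeatedly, a performance optimization engine could replicate one of the partial results into a faster store, but only then.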

Backup 2: Competitive Landscape [chart; label: Acquired]
