Beyond the Data Lake - Matthias Korn, Technical Consultant at Data Virtuality
TRANSCRIPT
US Office: 1355 Market Street, #488, San Francisco, CA 94103
German Office: Katharinenstr. 15, 04109 Leipzig, Germany
Beyond the Data Lake
Simplifying data integration for the modern age
Matthias Korn | Head of [email protected]
Variety is The Challenge
Gartner 2014: “VARIETY is the biggest challenge.”
“When asked about the dimensions of data organizations struggle with most, 49% answered variety, while 35% answered volume and 16% velocity.”
1996 - Variety already was a major challenge…
Integration using the Data Warehouse
Data is integrated by copying it into a central repository
Approach: ETL process (Extract/Transform/Load)
Structure is applied on the way into the repository
BI users query Data Marts
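The ETL flow described on this slide can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the `RAW_ORDERS` records, the `orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse. The point it shows is that the structure is enforced in `transform`, before anything reaches the repository.

```python
import sqlite3

# Hypothetical raw records from a source system (illustrative only).
RAW_ORDERS = [
    {"id": "1", "amount": "19.90", "country": "de"},
    {"id": "2", "amount": "5.00", "country": "US"},
    {"id": "3", "amount": "n/a", "country": "us"},  # does not fit the schema
]


def extract():
    """Extract: read the rows from the source as they are."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: apply the warehouse structure before loading."""
    clean = []
    for row in rows:
        try:
            clean.append(
                (int(row["id"]), float(row["amount"]), row["country"].upper())
            )
        except ValueError:
            # Rows that cannot be cast to the target types are rejected here.
            continue
    return clean


def load(conn, rows):
    """Load: write only the already structured rows into the repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

# A BI-style query against the structured data.
totals = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
).fetchall()
print(totals)
```

Any change to the source format means changing `transform` and the table definition together, which is one way to read the "inflexible; costly modifications" complaint about ETL.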
Why do so many DWH projects fail: ETL
Inflexible; costly modifications
Labour-intensive setup and maintenance
Over 50% failure rate*
Slow data-to-actionable-insights (6 to 9+ months)
2016 – Variety is Getting Dramatic
Where does the complexity come from?
Big Data
• Machine data, unstructured data, social data, streaming data, IoT, etc.
Cloud data
• APIs, cloud data platforms, etc.
Data Lake – getting some data in is pretty easy…
Clickstream Data
Sensor Data
Server logs
Unique identifier provided
Metadata tags provided
Original data structure
Databases Web APIs
…still challenges with other data
Integration using the Data Lake
Data is integrated by copying it into a central repository
Approach: ELT process (Extract/Load/Transform)
Data loaded in the original structure
For Data Scientists rather than for BI users
BI users query Data Marts: wait, didn't they do this before already?
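For contrast with ETL, the ELT flow on this slide can be sketched as below. It is a minimal illustration under stated assumptions: the `RAW_EVENTS` payloads, the `lake_events` and `mart_page_views` tables, and the function names are all hypothetical, and an in-memory SQLite database stands in for the lake. Data is loaded in its original structure, and the transform only happens later, when a data mart is built for BI users.

```python
import json
import sqlite3

# Hypothetical clickstream events, kept exactly as the source emitted them.
RAW_EVENTS = [
    '{"user": "a", "page": "/home", "ms": 120}',
    '{"user": "b", "page": "/home"}',
    '{"user": "a", "page": "/pricing", "ms": 300}',
]


def load_raw(conn, lines):
    """Extract + Load: store each payload untouched, no schema applied."""
    conn.execute("CREATE TABLE lake_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO lake_events VALUES (?)", [(line,) for line in lines]
    )


def build_page_mart(conn):
    """Transform: derive a structured data mart from the raw payloads."""
    counts = {}
    for (payload,) in conn.execute("SELECT payload FROM lake_events"):
        event = json.loads(payload)
        counts[event["page"]] = counts.get(event["page"], 0) + 1
    conn.execute(
        "CREATE TABLE mart_page_views (page TEXT PRIMARY KEY, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mart_page_views VALUES (?, ?)", sorted(counts.items())
    )


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_EVENTS)
build_page_mart(conn)

mart = conn.execute(
    "SELECT page, views FROM mart_page_views ORDER BY page"
).fetchall()
print(mart)
```

Loading is cheap because nothing is validated, but BI users still end up querying a mart that someone had to build, which is the point the slide's closing question makes.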
Data Lake and DWH
Both are physical data integration
Both require significant upfront effort to create and fill with data
Both lack agility from the BI user's point of view
Reasons for physical data integration
Query all data with the same language
Model data with the same language
High performance
The Logical Data Warehouse
Introduced by Gartner in 2012
New data management architecture for analytics
Uses repositories just like the EDW
Adds distributed processes like the Data Lake
Adds virtualization of data sources for business agility
Removes the obstacle of physical data integration
Logical Data Warehouse (LDW)
What does the Logical Data Warehouse do?
LDW knows where the data is stored instead of copying it
Combines different technologies for different use cases
• Big data processing
• Classical BI
• Agile business analytics
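The idea that the LDW "knows where the data is stored instead of copying it" can be sketched as a small catalog of logical table names that point at live sources. This is only an illustration of the concept: `LogicalCatalog`, the source names `crm.customers` and `web.clicks`, and the sample rows are hypothetical, and a plain function stands in for a web API call. It does not reflect how any particular product implements virtualization.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class LogicalCatalog:
    """Maps logical table names to callables that read the source on demand."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        self._sources[name] = fetch

    def scan(self, name: str) -> List[Row]:
        # Data is read from the source at query time; nothing is stored here.
        return list(self._sources[name]())


def make_crm_source() -> Callable[[], Iterable[Row]]:
    """Hypothetical relational source (in-memory SQLite stands in for a CRM DB)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")]
    )

    def fetch() -> Iterable[Row]:
        for cid, name in conn.execute("SELECT id, name FROM customers ORDER BY id"):
            yield {"customer_id": cid, "name": name}

    return fetch


def fetch_web_clicks() -> List[Row]:
    """Hypothetical stand-in for a call to a web API."""
    return [
        {"customer_id": 1, "clicks": 12},
        {"customer_id": 2, "clicks": 7},
        {"customer_id": 1, "clicks": 3},
    ]


def clicks_per_customer(catalog: LogicalCatalog) -> Dict[str, int]:
    """Join two logical tables that live in different systems."""
    names = {r["customer_id"]: r["name"] for r in catalog.scan("crm.customers")}
    totals: Dict[str, int] = {}
    for row in catalog.scan("web.clicks"):
        name = names[row["customer_id"]]
        totals[name] = totals.get(name, 0) + row["clicks"]
    return totals


catalog = LogicalCatalog()
catalog.register("crm.customers", make_crm_source())
catalog.register("web.clicks", fetch_web_clicks)
result = clicks_per_customer(catalog)
print(result)
```

The consumer only sees logical names, so a source can be swapped or added by registering a different callable, without changing the query code.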
Advantages of the Logical Data Warehouse
Real-time data available and ready for analysis
Immediately productive
Flexible Logical Data Model
Permissions, governance
APIs, web services
Decoupling business layer and tech layer
Technology Map
Conclusion
Logical Data Warehouse holds enormous promise
Unified data architecture for both Big Data and classical BI use cases
Flexibility and real-time access give an advantage
Explore->Use->Optimize instead of Build->Test->Use
We provide quicker time to solution
dataconomy
Thanks for your attention
Backup 1: Example data flow in an LDW
Distributed query
BI frontend aware of all data sources - creates SQL statement
Performance optimization engine replicates data only if needed
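The slide does not say how the optimization engine decides that replication is "needed". As a rough sketch under an assumed rule, the class below keeps a local replica of a source only once it has been queried a given number of times. `ReplicatingSource`, the `replicate_after` threshold, and the sample rows are hypothetical and chosen purely for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


class ReplicatingSource:
    """Queries the remote source directly until it is used often enough."""

    def __init__(
        self,
        fetch: Callable[[], Iterable[Row]],
        replicate_after: int = 2,  # assumed heuristic, not from the talk
    ) -> None:
        self._fetch = fetch
        self._replicate_after = replicate_after
        self._replica: Optional[List[Row]] = None
        self.queries = 0
        self.remote_calls = 0

    @property
    def is_replicated(self) -> bool:
        return self._replica is not None

    def rows(self) -> List[Row]:
        self.queries += 1
        if self._replica is not None:
            # Served from the local copy; the remote system is not touched.
            return self._replica
        self.remote_calls += 1
        data = list(self._fetch())
        if self.queries >= self._replicate_after:
            # The source has become "hot": keep a replica from now on.
            self._replica = data
        return data


def fetch_sensor_rows() -> List[Row]:
    """Hypothetical remote source."""
    return [{"sensor": "s1", "value": 21.5}, {"sensor": "s2", "value": 19.0}]


source = ReplicatingSource(fetch_sensor_rows, replicate_after=2)
for _ in range(3):
    latest = source.rows()

print(source.remote_calls, source.is_replicated)
```

A real engine would presumably also weigh data size, freshness requirements, and source load, and would need a way to refresh or invalidate the replica; none of that is modeled here.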
Backup 2: Competitive Landscape
Acquired