tdwi adf session

Data Provisioning for Free-Form Analysis in a Modern and Hybrid World. Jens Kröhnert, 22.06.2015, P-TSP Microsoft, Principal Consultant, ORAYLIS

Upload: jens-kr

Post on 13-Apr-2017


TRANSCRIPT

P-TSP: Presentation of the ORAYLIS IoT Project


Where does Big Data come from?

One of the sources of new data types: sensors, sensors, sensors
GPS
Proximity sensor
Ambient light sensor
3-axis accelerometer
Magnetometer
Gyroscopic sensor
Wi-Fi
Camera(s)
UI (senses user interactions)
iBeacon

IT innovation is a major driver for business innovation
Disruptive digital transformation ahead

A dream for decades: will it ever come true?

Kevin Kelly, founder of Wired Magazine: Singularity is the point at which "all the change in the last million years will be superseded by the change in the next five minutes."

Society needs to participate in this technology to have a chance to understand, develop, and shape these innovations, and so to protect liberty and humanity.

Every industry is now a software industry, where they are building these systems of intelligence and provide SaaS services that go along with your products. (Nadella)

You're going to reason over that data, you're going to build applications, you're going to do analytics and predictions. You're going to provide SaaS services that go along with your products. Make sure everyone inside your organization has the power to access these insights and then has the power to act on those insights.

Dr. A. Kuntze GmbH at a glance

Specialist for innovative solutions in measurement and control technology for water analysis
70 years of experience in manufacturing instruments, sensors, and systems
Made in Germany
Founded in 1945
21 employees
Headquarters in Meerbusch, Germany

Azure Data Factory: ETL vs. EL & TL & TL & ...

[Diagram, on premises and in the cloud. Sources: OLTP, ERP, LOB, ...; devices, social, sensors, web. Classic path: EXTRACT original data, TRANSFORM with an ETL tool (SSIS, etc.), LOAD transformed data into the EDW (SQL Server, Teradata, etc.). Additional path: INGEST original data, including streaming data, into scale-out storage and compute (HDFS, Blob storage, etc.), then TRANSFORM AND LOAD. Consumers: BI tools, data marts, apps, dashboards.]

For the longest time, producing a DW that could be used for analytics required only a traditional ETL process. However, with the confluence of trends impacting the traditional DW, new approaches to analytics are required to support business transformations and competitive advantage.

Existing ETL processes are there; we're not proposing to replace them. We want to help handle new data, types, and quality levels and to iterate quickly. There are tools that cover each of these approaches, but no good way to span both.

Show the block blending both.
Problems with blending these approaches in a manageable way for all of the sources and processing environments required:
Managing all of the new types and shapes of data: relational, non-relational, cloud-born data.
Maximizing the benefits of integrating Hadoop and relational.
Writing and managing custom code to piece together a system supporting new and different shapes of data, speeds, and processing environments.
How do you manage and monitor it?
How do you recover from failures?
How do you orchestrate information production in a repeatable way?
Need a reliable and complete view of the analytics infrastructure.
How to understand data lineage and what will be impacted if something is changed?

6/23/2015. © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Evolving approaches to analytics: Azure Data Factory

[Diagram labels: Data Sources (import from); Data Connector: import from source to Hub; Data Hub (storage & compute); Data Connector: import/export among Hubs; Data Connector: export from Hub to data store; Pipeline; Data Lake(s); Data Marts; BI Tools; Apps; Dashboards.]


Evolving approaches to analytics: Azure Data Factory
Coordination, Monitoring & Management, Data Lineage

Azure Data Factory Concept (focus: Transform with HDInsight)

[Diagram stages: Data Sources, Ingest, Transform & Analyze, Publish. Labels: Azure Blob Storage (sensor JSON files, reference data); on-premises data mart (sensor data, master data); batch sensor data; Move; Transform, Combine, etc.; Analyze; Sensor Details; Predict Maintenance; Visualize.]

Data Set: a collection of files, a DB table, etc.
Activity: a processing step (Hadoop job, custom code, ML model, etc.)
Pipeline: a sequence of activities (logical group)
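The three terms on this slide can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the class names, the `execute` method, and the sample activities are hypothetical and are not the Azure Data Factory API. It shows a pipeline as an ordered group of activities, each reading one named data set and writing another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical in-memory model; this is NOT the Azure Data Factory API.
Records = List[dict]


@dataclass
class Activity:
    """A processing step that reads one named data set and writes another."""
    name: str
    source: str
    sink: str
    run: Callable[[Records], Records]


@dataclass
class Pipeline:
    """A logical group of activities, executed in order."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, datasets: Dict[str, Records]) -> Dict[str, Records]:
        for activity in self.activities:
            datasets[activity.sink] = activity.run(datasets[activity.source])
        return datasets


def drop_incomplete(rows: Records) -> Records:
    # Keep only readings that actually carry a value.
    return [r for r in rows if r.get("value") is not None]


def average_per_sensor(rows: Records) -> Records:
    # Aggregate readings to one average per sensor.
    totals: Dict[str, tuple] = {}
    for r in rows:
        total, count = totals.get(r["sensor"], (0.0, 0))
        totals[r["sensor"]] = (total + r["value"], count + 1)
    return [{"sensor": s, "avg": t / c} for s, (t, c) in sorted(totals.items())]


pipeline = Pipeline("sensor-batch", [
    Activity("clean", "raw", "clean", drop_incomplete),
    Activity("aggregate", "clean", "sensor_details", average_per_sensor),
])
result = pipeline.execute({"raw": [
    {"sensor": "a", "value": 1.0},
    {"sensor": "a", "value": 3.0},
    {"sensor": "b", "value": None},
]})
print(result["sensor_details"])
```

In the real service, activities would be Hadoop jobs, custom code, or ML models rather than Python functions; the sketch only mirrors the data set / activity / pipeline vocabulary.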

Technical key enabler for Big Data: distributors like Hortonworks (enterprise ready)

Technical key enabler for Big Data: HDInsight by Microsoft

[Photo: cluster of machines running Hadoop at Yahoo!]

HDInsight: on premises/boxed or as a service. How big is your cluster?

Azure HDInsight now customizable for a variety of Hadoop projects: You can now customize your Azure HDInsight clusters with projects available from the Apache Hadoop ecosystem. By using the Script Action feature, Hadoop clusters can be modified in arbitrary ways using custom scripts. To demonstrate the power of this capability, we've documented the process for installing Spark and R modules.

HDInsight adds a deeper Visual Studio experience: To help developers using Visual Studio easily incorporate the benefits of big data within their applications, we've added a deeper tooling experience for HDInsight in the most recent version of the Azure SDK. Developers can use this extension to visualize and query their Hadoop clusters, as well as manage applications that integrate with Hadoop directly in Visual Studio.

HDInsight: Hadoop as a Service
Spin up / expand cluster when needed
Spin down / shrink when not needed
Pay per use
Store data in cheap Blob Store
Mount Blob Store in your local environment
Connect through PolyBase
Orchestrate with Azure Data Factory

Data Factory Concept (focus: Ingest, batch or real-time with Stream Analytics)

[Same concept diagram as before: Data Sources, Ingest, Transform & Analyze, Publish.]

Canonical Stream Analytics Pattern

[Diagram layers: Event production, Collection, Ingestion, Stream Analysis, Storage and Batch Analysis, Presentation and action.
Event producers: applications, legacy IoT (custom protocols), devices, IP-capable devices (Windows/Linux), low-power devices (RTOS).
Gateways: field gateways, cloud gateways (web APIs).
Services: Event Hubs, Stream Analytics, Machine Learning, SQL DB, Storage Tables, Storage Blobs, Power BI.
Consumers: search and query, data analytics (Power BI), web/thick client dashboards, devices to take action, more to come.]

Tech Ready 15. © 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 6/23/2015

Reference Data

Seamless correlation of event streams with reference data
Static or slowly-changing data stored in blobs
CSV and JSON files in Azure Blobs; scanned for new snapshots on a settable cadence
JOIN (INNER or LEFT OUTER) between streams and reference data sources
Reference data appears like another input:

    SELECT myRefData.Name, myStream.Value
    FROM myStream
    JOIN myRefData
    ON myStream.myKey = myRefData.myKey
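The join semantics on this slide can be mimicked in plain Python. This is a minimal sketch, assuming the reference data has already been loaded into a dictionary keyed by `myKey`. The function name `join_with_reference` and the sample records are hypothetical, and the code does not call Stream Analytics.

```python
from typing import Dict, Iterable, Iterator


def join_with_reference(events: Iterable[dict],
                        reference: Dict[int, dict],
                        key: str = "myKey",
                        left_outer: bool = False) -> Iterator[dict]:
    """Enrich each stream event with the matching reference row.

    INNER join drops events without a match; LEFT OUTER keeps them
    and fills the reference column with None.
    """
    for event in events:
        ref = reference.get(event[key])
        if ref is None and not left_outer:
            continue
        yield {"Name": ref["Name"] if ref else None, "Value": event["Value"]}


# Hypothetical sample data: one known sensor, one unknown.
reference = {1: {"Name": "pump"}}
stream = [{"myKey": 1, "Value": 7.2}, {"myKey": 2, "Value": 6.9}]

inner = list(join_with_reference(stream, reference))
outer = list(join_with_reference(stream, reference, left_outer=True))
print(inner)
print(outer)
```

The dictionary lookup stands in for the periodically refreshed blob snapshot; refreshing it on a cadence is left out of the sketch.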

Build 2015. © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION. 6/23/2015 2:25 PM

Multiple steps, multiple outputs

    WITH Step1 AS (
        SELECT Count(*) AS CountTweets, Topic
        FROM TwitterStream PARTITION BY PartitionId
        GROUP BY TumblingWindow(second, 3), Topic, PartitionId
    ),
    Step2 AS (
        SELECT Avg(CountTweets)
        FROM Step1
        GROUP BY TumblingWindow(minute, 3)
    )
    SELECT * INTO Output1 FROM Step1
    SELECT * INTO Output2 FROM Step2
    SELECT * INTO Output3 FROM Step2

A query can have multiple steps to enable pipeline execution
A step is a sub-query defined using WITH (common table expression)
Can be used to develop complex queries more elegantly by creating an intermediary named result
Creates a unit of execution for scaling out when PARTITION BY is used
Each step's output can be sent to multiple output targets using INTO
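To make the two steps concrete, here is a rough Python sketch of the same idea: count events per topic in 3-second tumbling windows, then average those counts over 3-minute windows. It is an illustration under assumptions, not Stream Analytics itself. Timestamps are plain integer seconds, partitioning is ignored, the function names are hypothetical, and the window-boundary convention (start inclusive) may differ from the service.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[int, str]],
                    window_seconds: int) -> Counter:
    """Step 1: count events per (window start, topic).

    Tumbling windows are fixed-size and non-overlapping, so each event
    falls into exactly one window.
    """
    counts: Counter = Counter()
    for timestamp, topic in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, topic)] += 1
    return counts


def average_counts(counts: Counter, window_seconds: int) -> Dict[int, float]:
    """Step 2: average the step-1 counts inside a coarser tumbling window."""
    grouped = defaultdict(list)
    for (start, _topic), n in counts.items():
        grouped[(start // window_seconds) * window_seconds].append(n)
    return {window: sum(v) / len(v) for window, v in grouped.items()}


# Hypothetical events: (timestamp in seconds, topic).
events = [(0, "azure"), (1, "azure"), (2, "hadoop"), (3, "azure"), (181, "azure")]

step1 = tumbling_counts(events, 3)      # 3-second windows
step2 = average_counts(step1, 180)      # 3-minute windows

# The same intermediate result can feed several outputs, as with INTO.
output1 = dict(step1)
output2 = dict(step2)
output3 = dict(step2)
print(output1)
print(output2)
```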


Data Factory Concept

[Same concept diagram as before: Data Sources, Ingest, Transform & Analyze, Publish.]

Azure Data Factory Demo

Azure Data Factory How-to

Azure Data Factory Key Benefits
Connect cloud and on-premises data sources
Supports Hive, Pig & C# processing
Automatic Hadoop (HDInsight) cluster management
Retries for transient failures, configurable timeout policies & alerting
Monitor data pipelines in one place
Visually track data lineage
Full historical accounting of job execution, system health, and dependencies in a single monitoring dashboard

Coordination, Monitoring & Management, Data Lineage
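The "retries for transient failures" benefit follows a common pattern that can be sketched in a few lines. The code below is a generic illustration, not how Data Factory implements it. `TransientError`, `run_with_retries`, and `flaky_copy` are hypothetical names, and timeout handling is omitted.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retries(step: Callable[[], str],
                     max_attempts: int = 3,
                     delay_seconds: float = 0.0,
                     on_alert: Optional[Callable[[str], None]] = None) -> str:
    """Run a step, retrying on transient errors up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError as exc:
            if attempt == max_attempts:
                # Out of attempts: raise an alert and surface the failure.
                if on_alert is not None:
                    on_alert(f"giving up after {attempt} attempts: {exc}")
                raise
            time.sleep(delay_seconds)
    raise RuntimeError("max_attempts must be at least 1")


calls = {"n": 0}


def flaky_copy() -> str:
    # Fails twice, then succeeds, to simulate a transient storage outage.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("storage temporarily unavailable")
    return "copied"


outcome = run_with_retries(flaky_copy, max_attempts=3)
print(outcome, calls["n"])
```

Only errors marked as transient are retried; any other exception propagates immediately, which keeps genuine bugs from being masked by the retry loop.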

IoT Project Dr. Kuntze: Architecture and Demo

Event Hub
Output: consumer group

Stream Analytics
Input: Event Hub consumer group
Input: reference data
Complex event processing (SQL)
Output: Power BI
Output: Blob Storage

HDInsight
Nightly batch
Input: Blob Storage (JSON)
Output: Blob Storage (CSV)

Data Factory
Workflow management

Power BI
Real-time dataset
Batch dataset over Power Query (Blob Storage)
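The nightly batch step turns sensor JSON into CSV for Power BI. The deck does not say how that job is written on HDInsight, so the following is only a local Python sketch of the transformation itself. The field names `sensor`, `ts`, and `value`, the one-JSON-object-per-line layout, and the function name are all assumptions.

```python
import csv
import io
import json
from typing import Iterable, List


def json_lines_to_csv(json_lines: Iterable[str], columns: List[str]) -> str:
    """Convert JSON objects (one per line) to CSV text with a fixed column set.

    Fields not listed in `columns` are ignored; missing fields become empty.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if line:  # skip blank lines between records
            writer.writerow(json.loads(line))
    return buffer.getvalue()


# Hypothetical sensor events as they might sit in blob storage.
sample = [
    '{"sensor": "s1", "ts": "2015-06-22T10:00:00Z", "value": 7.1, "debug": true}',
    '{"sensor": "s2", "ts": "2015-06-22T10:00:05Z", "value": 6.8}',
]
csv_text = json_lines_to_csv(sample, ["sensor", "ts", "value"])
print(csv_text)
```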

Batch Load to Serving Layer Power BI

Power BI with Real-Time & Batch, mobile & Data Discovery

Show average of time taken by state

From the Realtime Enterprise to the Predictive Enterprise
Past: Traditional BI (batch)
Present: Monitoring BI (real-time)
Future: Predictive BI (machine learning)

Lambda Architecture

Rearview mirror

Next Steps Dr. Kuntze: provide services that go along with your products
Predictive maintenance and predictive alarming
Cloud-based data mining: Azure ML (Machine Learning)

Machine Learning

Azure ML and Stream Analytics are integrated

Azure ML can publish web endpoints for operationalized models

Azure Stream Analytics can bind custom function names to such web endpoints

Example: apply bound function event-by-event

sentiment mapped to endpoint/API key

    SELECT text, sentiment(text) AS score
    FROM myStream

Predictive Alarming
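Binding a function name to a scoring endpoint and applying it event by event can be pictured as follows. This is a sketch under assumptions: `fake_sentiment_endpoint` is a local stand-in for a published web endpoint, its word lists are invented, and no Azure ML or Stream Analytics call is made.

```python
from typing import Callable, Dict, Iterable, Iterator


def fake_sentiment_endpoint(text: str) -> int:
    """Local stand-in for a published model endpoint (hypothetical scoring)."""
    positive = {"good", "great"}
    negative = {"bad", "broken"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


# The "binding": a function name mapped to whatever performs the call.
functions: Dict[str, Callable[[str], int]] = {"sentiment": fake_sentiment_endpoint}


def select_with_score(stream: Iterable[dict], function_name: str) -> Iterator[dict]:
    """Apply the bound function to every event, one event at a time."""
    score = functions[function_name]
    for event in stream:
        yield {"text": event["text"], "score": score(event["text"])}


scored = list(select_with_score(
    [{"text": "great sensor"}, {"text": "pump broken"}], "sentiment"))
print(scored)
```

In the integrated setup described on the slide, the lookup table entry would point to the endpoint and API key instead of a local function.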


Machine Learning

Azure ML and Data Factory are integrated

Predictive Maintenance


Think Big
Start Small
Discover Visual
Scale Fast

Take away

Go try

Try Microsoft Azure for free: http://azure.microsoft.com/de-de/pricing/free-trial/

Hortonworks: download the HDP Sandbox (Hadoop): http://hortonworks.com/products/hortonworks-sandbox/#install

ORAYLIS blog: BI and Big Data topics: http://blog.oraylis.de/

Download Power BI Designer (free): https://www.powerbi.com/dashboards/downloads/

ORAYLIS TV: Jens Kröhnert: Power BI series: https://www.youtube.com/user/oraylisbi

ORAYLIS BI Guide (workshops, including on Big Data with Hadoop): http://www.oraylis.de/loesungen/oraylis-bi-guide