tdwi adf session
TRANSCRIPT
P-TSP Presentation: ORAYLIS IoT Project
Data provisioning for free-form analysis in a modern and hybrid world
Jens Kröhnert, 22.06.2015
P-TSP Microsoft, Principal Consultant ORAYLIS
Where does Big Data come from?
One of the sources of new data types: sensors, sensors, sensors
GPS
Proximity Sensor
Ambient Light Sensor
3-Axis Accelerometer
Magnetometer
Gyroscopic Sensor
Wifi
Camera(s)
UI (senses user interactions)
iBeacon
IT Innovation is a major driver for Business Innovation
Disruptive Digital Transformation ahead
Dream for decades: will it ever come true?
Kevin Kelly, founder of Wired Magazine: Singularity is the point at which "all the change in the last million years will be superseded by the change in the next five minutes."
Society needs to participate in this technology to have a chance to understand, develop and shape these innovations, in order to protect liberty and humanity.
Every industry is now a software industry, where they are building these systems of intelligence and providing SaaS services that go along with their products. - Nadella
You're going to reason over that data, you're going to build applications, you're going to do analytics and predictions. You're going to provide SaaS services that go along with your products. Make sure everyone inside your organization has the power to access these insights and then has the power to act on those insights.
Dr. A. Kuntze GmbH at a glance
Specialist for innovative solutions in measurement and control technology for water analysis
70 years of experience in manufacturing instruments, sensors and systems
Made in Germany
Founded in 1945
21 employees
Headquarters in Meerbusch, Germany
Azure Data Factory: ETL vs. EL & TL & TL & ...
[Diagram. Sources: OLTP, ERP, LOB, ...; devices, social, sensors, web. Traditional path: EXTRACT original data, TRANSFORM with an ETL tool (SSIS, etc.), LOAD transformed data into the EDW (SQL Server, Teradata, etc.). New path: INGEST original data and streaming data into scale-out storage and compute (HDFS, Blob storage, etc.), on-premises and in the cloud, then TRANSFORM AND LOAD. Consumers: BI tools, data marts, apps, dashboards.]
For the longest time, producing a DW that could be used for analytics required only a traditional ETL process. However, with the confluence of trends impacting the traditional DW, new approaches to analytics are required to support business transformations and competitive advantage.
Existing ETL processes are there; we're not proposing to replace them. We want to help handle new data, types, and quality levels and iterate over them quickly. There are tools out there for each of these approaches, but no good way to span both.
Problems with blending these approaches in a manageable way for all of the sources and processing environments required:
Managing all of the new types and shapes of data: relational, non-relational, cloud-born data.
Maximizing the benefits of integrating Hadoop and relational.
Writing and managing custom code to piece together a system supporting new and different shapes of data, speeds, and processing environments.
How do you manage and monitor it?
How do you recover from failures?
How do you orchestrate information production in a repeatable way?
Need a reliable and complete view of the analytics infrastructure. How do you understand data lineage and what will be impacted if something is changed?
6/23/2015. © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Evolving approaches to analytics: Azure Data Factory
[Diagram. Data sources (import from) feed data hubs (storage and compute) through data connectors (import from source to hub). Pipelines run on the hubs. Further data connectors import/export among hubs and export from a hub to a data store. Other labels: BI tools, data marts, data lake(s), dashboards, apps.]
Evolving approaches to analytics: Azure Data Factory
Coordination
Monitoring & Mgmt
Data Lineage
Azure Data Factory Concept (Focus: Transform with HDInsight)
Data Set: a collection of files, a DB table, etc.
Activity: a processing step (Hadoop job, custom code, ML model, etc.)
Pipeline: a sequence of activities (logical group)
[Diagram. Stages: Data Sources -> Ingest -> Transform & Analyze -> Publish -> Visualize. Data set labels: Azure Blob Storage, Sensor JSON Files, Reference Data, On-Premises Data Mart, Sensor Data, Master Data, Batch Sensor Data. Activity labels: Move; Transform, Combine, etc.; Analyze. Further labels: Sensor Details, Predict Maintenance.]
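The Data Set, Activity and Pipeline concepts above can be sketched in a few lines of Python. This is only an illustrative model, not the Azure Data Factory API: the class names, the `combine` and `analyze` steps, and the sample sensor records are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Activity:
    """One processing step: reads named input data sets, writes one output data set."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[..., list]


@dataclass
class Pipeline:
    """A logical group of activities, executed in the order given."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def execute(self, datasets: Dict[str, list]) -> Dict[str, list]:
        for activity in self.activities:
            args = [datasets[name] for name in activity.inputs]
            datasets[activity.output] = activity.run(*args)
        return datasets


def combine(readings, reference):
    """Transform step (hypothetical): attach the location from reference data."""
    location = {row["sensor_id"]: row["location"] for row in reference}
    return [{**r, "location": location.get(r["sensor_id"])} for r in readings]


def analyze(rows):
    """Analyze step (hypothetical): average value per location."""
    sums = {}
    for row in rows:
        total, count = sums.get(row["location"], (0.0, 0))
        sums[row["location"]] = (total + row["value"], count + 1)
    return [{"location": loc, "avg_value": total / count}
            for loc, (total, count) in sorted(sums.items())]


# Invented sample data sets, keyed by data set name.
datasets = {
    "sensor_json": [
        {"sensor_id": "s1", "value": 7.0},
        {"sensor_id": "s1", "value": 8.0},
        {"sensor_id": "s2", "value": 6.5},
    ],
    "reference": [
        {"sensor_id": "s1", "location": "plant-a"},
        {"sensor_id": "s2", "location": "plant-b"},
    ],
}
pipeline = Pipeline("sensor_details", [
    Activity("combine", ["sensor_json", "reference"], "enriched", combine),
    Activity("analyze", ["enriched"], "published", analyze),
])
result = pipeline.execute(datasets)
print(result["published"])
```

Each activity only names its input and output data sets, which is what lets an orchestrator derive dependencies and lineage from the pipeline definition.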
Technical Key Enabler for Big Data: Distributors like Hortonworks (Enterprise Ready)
Technical Key Enabler for Big Data: HDInsight by Microsoft
Cluster of machines running Hadoop at Yahoo!
HDInsight: On-Premises/Boxed or As a Service. How big is your cluster?
Azure HDInsight now customizable for a variety of Hadoop projects
Now you have the ability to customize your Azure HDInsight clusters with projects available from the Apache Hadoop ecosystem. By using the Script Action feature, Hadoop clusters can be modified in arbitrary ways using custom scripts. To demonstrate the power of this capability, we've documented the process for installing Spark and R modules.
HDInsight adds a deeper Visual Studio experience
To help developers using Visual Studio easily incorporate the benefits of big data within their applications, we've added a deeper tooling experience for HDInsight in the most recent version of the Azure SDK. Developers can use this extension to visualize and query their Hadoop clusters, as well as manage applications that integrate with Hadoop directly in Visual Studio.
HDInsight: Hadoop as a Service
Spin up / expand cluster when needed
Spin down / shrink when not needed
Pay per use
Store data in cheap Blob Store
Mount Blob Store in your local environment
Connect through PolyBase
Orchestrate with Azure Data Factory
Data Factory Concept (Ingest: Batch or Real-Time with Stream Analytics)
Canonical Stream Analytics Pattern
[Diagram. Event production (applications; legacy IoT with custom protocols; devices: IP-capable devices (Windows/Linux), low-power devices (RTOS)) -> Collection (field gateways; cloud gateways (web APIs)) -> Ingestion (Event Hubs) -> Stream analysis (Stream Analytics, Machine Learning) -> Storage and batch analysis (SQL DB, Storage Tables, Storage Blobs) -> Presentation and action (search and query; data analytics with Power BI; web/thick client dashboards; devices to take action). More to come.]
Reference Data
Seamless correlation of event streams with reference data
Static or slowly-changing data stored in blobs
CSV and JSON files in Azure Blobs; scanned for new snapshots on a settable cadence
JOIN (INNER or LEFT OUTER) between streams and reference data sources
Reference data appears like another input:
    SELECT myRefData.Name, myStream.Value
    FROM myStream
    JOIN myRefData
    ON myStream.myKey = myRefData.myKey
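To make the semantics of the JOIN between a stream and reference data concrete, here is a minimal Python sketch. It is a local stand-in, not Stream Analytics itself: the function `join_with_reference`, the sample reference rows and the sample events are assumptions made for illustration. It shows the difference between an INNER and a LEFT OUTER join when an event has no matching reference row.

```python
def join_with_reference(stream, reference, key, how="inner"):
    """Join each stream event with the reference row that shares its key.

    how="inner" drops events without a match; how="left" keeps them
    and fills the reference column with None.
    """
    lookup = {row[key]: row for row in reference}
    for event in stream:
        ref = lookup.get(event[key])
        if ref is None and how == "inner":
            continue
        yield {"Name": ref["Name"] if ref else None, "Value": event["Value"]}


# Invented sample data: reference snapshot and incoming events.
my_ref_data = [
    {"myKey": 1, "Name": "pH probe"},
    {"myKey": 2, "Name": "chlorine probe"},
]
my_stream = [
    {"myKey": 1, "Value": 7.1},
    {"myKey": 3, "Value": 0.4},  # no reference row for key 3
    {"myKey": 2, "Value": 0.8},
]

inner = list(join_with_reference(my_stream, my_ref_data, "myKey"))
left = list(join_with_reference(my_stream, my_ref_data, "myKey", how="left"))
print(inner)
print(left)
```

Because the reference data is small and changes slowly, a plain in-memory lookup keyed by the join column is enough for this sketch.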
Multiple steps, multiple outputs
    WITH Step1 AS (
        SELECT Count(*) AS CountTweets, Topic
        FROM TwitterStream PARTITION BY PartitionId
        GROUP BY TumblingWindow(second, 3), Topic, PartitionId
    ),
    Step2 AS (
        SELECT Avg(CountTweets)
        FROM Step1
        GROUP BY TumblingWindow(minute, 3)
    )
    SELECT * INTO Output1 FROM Step1
    SELECT * INTO Output2 FROM Step2
    SELECT * INTO Output3 FROM Step2
A query can have multiple steps to enable pipeline execution
A step is a sub-query defined using WITH (common table expression)
Can be used to develop complex queries more elegantly by creating an intermediary named result
Creates a unit of execution for scaling out when PARTITION BY is used
Each step's output can be sent to multiple output targets using INTO
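The two-step idea (count per short tumbling window, then average those counts over a longer tumbling window) can be mimicked locally in Python. This is a simplified sketch under stated assumptions: events are `(timestamp_in_seconds, topic)` tuples, windows are identified by their start time, and partitioning is ignored. The function names and sample events are invented for the example.

```python
from collections import defaultdict


def tumbling_count(events, window_seconds):
    """Step 1: count events per (window start, topic)."""
    counts = defaultdict(int)
    for ts, topic in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, topic)] += 1
    return [{"window_start": w, "topic": t, "count": c}
            for (w, t), c in sorted(counts.items())]


def tumbling_avg(step1_rows, window_seconds):
    """Step 2: average the step-1 counts inside larger tumbling windows."""
    buckets = defaultdict(list)
    for row in step1_rows:
        window_start = int(row["window_start"] // window_seconds) * window_seconds
        buckets[window_start].append(row["count"])
    return [{"window_start": w, "avg_count": sum(c) / len(c)}
            for w, c in sorted(buckets.items())]


# Invented sample events: (timestamp in seconds, topic).
events = [(0.5, "azure"), (1.0, "azure"), (2.9, "iot"), (3.1, "azure"), (181.0, "iot")]

step1 = tumbling_count(events, 3)    # 3-second windows
step2 = tumbling_avg(step1, 180)     # 3-minute windows over the step-1 result

# Both results can be handed to any number of outputs, like INTO in the query.
outputs = {"Output1": step1, "Output2": step2, "Output3": step2}
print(step1)
print(step2)
```

Tumbling windows do not overlap, so every event lands in exactly one window, which is why integer division on the timestamp is sufficient here.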
Data Factory Concept
Azure Data Factory Demo
Azure Data Factory How-to
Azure Data Factory Key Benefits
Connect cloud and on-premises data sources
Supports Hive, Pig & C# processing
Automatic Hadoop (HDInsight) cluster management
Retries for transient failures, configurable timeout policies & alerting
Monitor data pipelines in one place
Visually track data lineage
Full historical accounting of job execution, system health and dependencies in a single monitoring dashboard
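The benefit "retries for transient failures, configurable timeout policies" describes a general pattern that can be sketched in Python. This is not how Azure Data Factory implements it; `TransientError`, `run_with_retry`, the parameter names and the `flaky_copy` activity are hypothetical and exist only for this illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def run_with_retry(activity, max_retries=3, timeout_seconds=5.0, delay_seconds=0.0):
    """Run activity() until it succeeds, retries run out, or the time budget is spent.

    The time budget is checked between attempts; a running attempt is not interrupted.
    Returns (result, number_of_attempts).
    """
    deadline = time.monotonic() + timeout_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            return activity(), attempts
        except TransientError:
            if attempts > max_retries or time.monotonic() >= deadline:
                raise
            time.sleep(delay_seconds)


# Invented activity that fails twice before succeeding.
calls = {"n": 0}


def flaky_copy():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("storage temporarily unavailable")
    return "copied"


outcome, attempts_used = run_with_retry(flaky_copy)
print(outcome, attempts_used)
```

Only errors marked as transient are retried; any other exception propagates immediately, which is where alerting would hook in.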
Coordination
Monitoring & Mgmt
Data Lineage
IoT Project Dr. Kuntze - Architecture and Demo
Event Hub
    Output: Consumer Group
Stream Analytics
    Input: Event Hub Consumer Group
    Input: Reference Data
    Complex Event Processing (SQL)
    Output: Power BI
    Output: Blob Storage
HDInsight
    Nightly Batch
    Input: Blob Storage (JSON)
    Output: Blob Storage (CSV)
Data Factory
    Workflow Management
Power BI
    Real-Time Dataset
    Batch Dataset over Power Query Blob Storage
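The nightly batch step turns JSON sensor files into CSV for the serving layer. In the project this runs on HDInsight; the Python below is only a local stand-in for the idea, and the field names (`sensor_id`, `ts`, `value`) and sample records are assumptions, since the deck does not show the actual schema.

```python
import csv
import io
import json


def json_lines_to_csv(json_text, columns):
    """Convert newline-delimited JSON events into CSV text with a header row.

    Fields not listed in columns are ignored; missing fields become empty cells.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for line in json_text.splitlines():
        if line.strip():
            writer.writerow(json.loads(line))
    return out.getvalue()


# Invented sample input: one JSON event per line.
raw = (
    '{"sensor_id": "s1", "ts": "2015-06-22T10:00:00Z", "value": 7.1}\n'
    '{"sensor_id": "s2", "ts": "2015-06-22T10:00:05Z", "value": 0.8, "debug": true}\n'
)
csv_text = json_lines_to_csv(raw, ["sensor_id", "ts", "value"])
print(csv_text)
```

Fixing the column list up front keeps the CSV layout stable even when individual devices send extra or missing fields.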
Batch Load to Serving Layer Power BI
Power BI with Real-Time & Batch, mobile & Data Discovery
Show average of time taken by state
From the Realtime Enterprise to the Predictive Enterprise
Past: Traditional BI (Batch), the rearview mirror
Present: Monitoring BI (Realtime)
Future: Predictive BI (Machine Learning)
Lambda Architecture
Next Steps Dr. Kuntze
Provide services that go along with your products
Predictive Maintenance and Predictive Alarming
Cloud-based Data Mining: Azure ML (Machine Learning)
Machine Learning: Azure ML and Stream Analytics are integrated
Azure ML can publish web endpoints for operationalized models
Azure Stream Analytics can bind custom function names to such web endpoints
Example: apply bound function event-by-event
sentiment mapped to endpoint/API key
    SELECT text, sentiment(text) AS score
    FROM myStream
Predictive Alarming
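Applying a bound function event-by-event can be illustrated with a small Python sketch. It does not call Azure ML or Stream Analytics: `bind_function` and `toy_sentiment` are hypothetical names, and the word-counting scorer merely stands in for a published model endpoint.

```python
def bind_function(scorer):
    """Return a callable that applies the scorer to one event at a time."""
    def apply(events):
        for event in events:
            yield {"text": event["text"], "score": scorer(event["text"])}
    return apply


def toy_sentiment(text):
    """Stand-in for a model endpoint: positive words minus negative words."""
    positive, negative = {"good", "great"}, {"bad", "broken"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


# Bind the name "sentiment" to the scorer, then score a small invented stream.
sentiment = bind_function(toy_sentiment)
scored = list(sentiment([{"text": "great sensor"}, {"text": "pump is broken"}]))
print(scored)
```

In a predictive alarming scenario the score would be compared against a threshold to decide whether an alarm is raised for that event.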
Machine Learning: Azure ML and Data Factory are integrated
Predictive Maintenance
Take away
Think Big
Start Small
Discover Visual
Scale Fast
Go try
Try Microsoft Azure for free: http://azure.microsoft.com/de-de/pricing/free-trial/
Hortonworks: Download HDP Sandbox (Hadoop): http://hortonworks.com/products/hortonworks-sandbox/#install
ORAYLIS Blog: BI & Big Data topics: http://blog.oraylis.de/
Download Power BI Designer (free): https://www.powerbi.com/dashboards/downloads/
ORAYLIS TV: Jens Kröhnert: Power BI series: https://www.youtube.com/user/oraylisbi
ORAYLIS BI Guide (workshops, including Big Data with Hadoop): http://www.oraylis.de/loesungen/oraylis-bi-guide