2015 05 23 sqltalks debarchan real time analysis


Upload: madhu-kaaja

Post on 15-Jan-2016



Page 1: 2015 05 23 SQLTalks Debarchan Real Time Analysis

BUILDING A REAL TIME ANALYTIC DASHBOARD USING BIGDATA AND HADOOP USING AZURE SERVICES

Debarchan Sarkar @debarchans Support Escalation Engineer, Microsoft

Page 2: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Azure Services used

Cloud Services (Worker Role) - Run highly scalable custom code on a Platform as a Service (PaaS) environment using the technology you choose, such as C#, Java, PHP, Python, Node.js, or something else.

Machine Learning - Applied machine learning: within minutes, your model is live as a fully managed web service that can connect to any data, anywhere.

Event Hub - Highly scalable publish-subscribe event ingestor that can ingest millions of events per second, so that you can process and analyze the massive amounts of data produced by your connected devices and applications.

Stream Analytics - Provides out-of-the-box integration with Event Hubs and processes ingested events in real-time, comparing multiple real-time streams or comparing real-time streams together with historical values and models.

Page 3: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Azure Services used (continued)

Blobs – Native storage infrastructure for Azure.

Web App – Formerly known as Azure Web Sites. A Platform as a Service (PaaS) offering for publishing web apps that run on multiple frameworks and are written in different programming languages (.NET, Node.js, PHP, Python, and Java).

HDInsight – 100% open source compatible Hadoop as a service on Azure built on top of Hortonworks Data Platform (HDP).

Power BI - The place where you pull together a full solution that gives an “at a glance” view of the visualizations and insights available for one or more sets of data. It is also the place to enable operations such as refreshing the data for the solution.

Page 4: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Logical Architecture

Page 5: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Analysis on Data at motion

An Azure Worker Role, Tweet Publisher, connects to the Twitter Streaming API with its own Twitter Application ID and a list of keywords to track.

The worker role then calls an Azure Machine Learning web service to perform sentiment analysis on each tweet that it receives and assigns it a sentiment score. It also does simple topic detection and then sends the tweet to an Azure Event Hub called tweetsin.
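The publisher's per-tweet steps can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the project's actual code: `score_sentiment` is a hypothetical stand-in for the Azure Machine Learning web service call, and the keyword list and event field names are invented for illustration. The JSON string returned by `build_event` is what would be sent to the tweetsin event hub.

```python
import json

# Hypothetical tracked keywords; the talk does not list the real ones.
KEYWORDS = ["azure", "hadoop", "bigdata"]


def score_sentiment(text):
    """Stand-in for the Azure Machine Learning web service call.

    Assumption: the real service returns a numeric sentiment score.
    This placeholder only counts a few hard-coded words.
    """
    positive = {"good", "great", "love"}
    negative = {"bad", "slow", "hate"}
    words = text.lower().split()
    hits = sum(w in positive for w in words) - sum(w in negative for w in words)
    # Map the word balance into the range 0.0 (negative) .. 1.0 (positive).
    return max(0.0, min(1.0, 0.5 + 0.25 * hits))


def detect_topic(text, keywords=KEYWORDS):
    """Simple topic detection: the first tracked keyword found in the text."""
    lowered = text.lower()
    for keyword in keywords:
        if keyword in lowered:
            return keyword
    return "unknown"


def build_event(tweet_text):
    """Build the JSON payload that would be sent to the tweetsin event hub."""
    return json.dumps({
        "text": tweet_text,
        "topic": detect_topic(tweet_text),
        "sentiment": score_sentiment(tweet_text),
    })


event = json.loads(build_event("I love Hadoop on Azure"))
```

Keeping scoring and topic detection as separate functions means the placeholder scorer can be swapped for a real web service client without touching the payload format.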

Page 6: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Analysis on Data at motion (continued)

There are three Azure Stream Analytics jobs monitoring the tweetsin event hub:

The first job, TweetsArchive, archives all the tweets to Azure Blob Storage.

The second job, TweetsByTopic, counts the number of tweets, along with the average sentiment, for each topic within the last 5 seconds. The aggregated results are saved to another Azure Event Hub called TweetsOut2.

The third job, TweetsSummary, counts the number of tweets, along with the average sentiment, for all tweets within the last 5 seconds. It sends the result to an Azure Event Hub called TweetsOut1.
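The two aggregating jobs can be illustrated with a small windowed aggregation. The real jobs are Stream Analytics queries; the Python below is only a sketch of the same idea. It assumes fixed, non-overlapping 5-second windows (the slide only says "within the last 5 seconds") and assumes event fields named `timestamp`, `topic`, and `sentiment`.

```python
from collections import defaultdict

WINDOW_SECONDS = 5  # window length taken from the slide


def aggregate_by_topic(events, window_seconds=WINDOW_SECONDS):
    """Count tweets and average sentiment per (window start, topic).

    `events` is an iterable of dicts with the assumed fields
    "timestamp" (seconds), "topic" and "sentiment".
    """
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, sentiment sum]
    for event in events:
        # Start time of the fixed window this event falls into.
        window_start = int(event["timestamp"] // window_seconds) * window_seconds
        bucket = totals[(window_start, event["topic"])]
        bucket[0] += 1
        bucket[1] += event["sentiment"]
    return {
        key: {"count": count, "avg_sentiment": total / count}
        for key, (count, total) in totals.items()
    }


def summarize(events, window_seconds=WINDOW_SECONDS):
    """Same aggregation across all topics, in the spirit of TweetsSummary."""
    merged = [dict(event, topic="all") for event in events]
    return aggregate_by_topic(merged, window_seconds)


# Made-up sample events for illustration.
sample = [
    {"timestamp": 1, "topic": "azure", "sentiment": 0.5},
    {"timestamp": 3, "topic": "azure", "sentiment": 1.0},
    {"timestamp": 4, "topic": "hadoop", "sentiment": 0.25},
    {"timestamp": 6, "topic": "azure", "sentiment": 0.0},
]
by_topic = aggregate_by_topic(sample)
overall = summarize(sample)
```

Here `by_topic` corresponds to what TweetsByTopic would write to TweetsOut2, and `overall` to what TweetsSummary would write to TweetsOut1.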

Page 7: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Analysis on Data at motion (continued)

There is another Azure Worker Role, Tweet Consumer, that constantly polls the two event hubs, TweetsOut1 and TweetsOut2, for data. If there is anything in these event hubs, it picks it up and sends it on to the real-time dashboard using **SignalR.

The real-time dashboard is an Azure Web Site written as a simple HTML5 page that connects to the SignalR host, the TweetConsumer Worker Role, each time it is loaded in a browser.

**ASP.NET SignalR is a new library for ASP.NET developers that makes it incredibly simple to add real-time web functionality to your applications.
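The consumer's poll-and-forward loop can be sketched like this. It is a Python illustration under stated assumptions, not the worker role's real code: in-memory queues stand in for the TweetsOut1 and TweetsOut2 event hubs, and `broadcast` is a hypothetical callable standing in for the SignalR push to connected browsers.

```python
import queue


def drain(hub):
    """Return every message currently waiting in `hub` without blocking."""
    messages = []
    while True:
        try:
            messages.append(hub.get_nowait())
        except queue.Empty:
            return messages


def poll_once(hubs, broadcast):
    """Check each hub once and forward anything found to the dashboard.

    `hubs` maps a hub name to a queue; `broadcast` is a hypothetical
    callable standing in for the SignalR push.
    """
    forwarded = 0
    for name, hub in hubs.items():
        for message in drain(hub):
            broadcast(name, message)
            forwarded += 1
    return forwarded


# In-memory stand-ins for the two output event hubs, with made-up messages.
hubs = {"TweetsOut1": queue.Queue(), "TweetsOut2": queue.Queue()}
hubs["TweetsOut1"].put({"count": 12, "avg_sentiment": 0.6})
hubs["TweetsOut2"].put({"topic": "azure", "count": 7, "avg_sentiment": 0.7})

received = []
sent = poll_once(hubs, lambda name, message: received.append((name, message)))
```

A real worker would call `poll_once` in a loop; passing the hub name along with each message lets the dashboard tell the overall summary apart from the per-topic results.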

Page 8: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Analysis on Data at rest

On a daily basis, you can use an Azure Automation Runbook to spin up a new HDInsight cluster, execute a Hive script, then shut down the cluster.
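The create, run, and tear-down pattern can be sketched as below. This is a Python illustration of the control flow only; an actual runbook would use Azure's own tooling. The three callables, the cluster name, and the script name `daily.hql` are hypothetical placeholders.

```python
def run_daily_batch(create_cluster, run_hive_script, delete_cluster, script):
    """Create a cluster, run one Hive script, and always delete the cluster.

    All three callables are hypothetical placeholders for whatever the
    runbook uses to talk to HDInsight.
    """
    cluster = create_cluster()
    try:
        return run_hive_script(cluster, script)
    finally:
        # Runs even if the Hive script fails, so the cluster never lingers.
        delete_cluster(cluster)


# Dry run with fake callables that only record the order of the steps.
steps = []


def fake_create():
    steps.append("create")
    return "cluster-1"  # made-up cluster name


def fake_run(cluster, script):
    steps.append("hive:" + script)
    return "ok"


def fake_delete(cluster):
    steps.append("delete")


result = run_daily_batch(fake_create, fake_run, fake_delete, "daily.hql")
```

Putting the delete step in `finally` reflects the point of the pattern: the cluster exists only for the duration of the job, even when the job fails.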

There’s an Excel workbook that uses Power Query to connect to HDInsight, bring the Hive tables into Power Pivot, and then visualize them using Power View.

Finally, the Excel workbook is uploaded to Power BI Preview (on Azure) and then configured for data refresh. It can use various Power BI services like Q&A to provide insights on the data with a few clicks and key presses.

Page 9: 2015 05 23 SQLTalks Debarchan Real Time Analysis

Questions/Feedback/More info: @debarchans

https://www.facebook.com/groups/bigdatalearnings/

https://www.youtube.com/user/Debarchans

http://twitterbigdata.codeplex.com/