UTAD - Jornadas de Informática - Potential of Big Data

Post on 17-Feb-2017

Category:

Software

Potential of Big Data
Marco António Silva
Solution Architect
madasi@microsoft.com

What is Big Data?

Some issues you already had to take care of

• Analysis
• Transportation
• Access Control
• Replication
• Storage
• Data Quality

New Generation of Data

The EMC Digital Universe Study launched its seventh edition. According to the study, by 2020, the amount of data in our digital universe is expected to grow from 4.4 trillion GB to 44 trillion GB

According to IBM, “2.5 exabytes - that's 2.5 billion gigabytes (GB) - of data was generated every day in 2012. That's big by anyone's standards. About 75% of data is unstructured, coming from sources such as text, voice and video.”
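The figures above can be sanity-checked with a little arithmetic. The sketch below converts the quoted volumes using decimal prefixes and derives the compound annual growth rate implied by a tenfold increase. The 2013 baseline year is an assumption made only to illustrate the calculation; the slide states just the 2020 end point.

```python
# Decimal (SI) unit sizes expressed in gigabytes.
GB_PER_EXABYTE = 10**9     # 1 EB = 1 billion GB
GB_PER_ZETTABYTE = 10**12  # 1 ZB = 1 trillion GB

def implied_annual_growth(start, end, years):
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# IBM figure: 2.5 exabytes generated per day in 2012.
daily_gb_2012 = 2.5 * GB_PER_EXABYTE

# EMC figures: 4.4 trillion GB growing to 44 trillion GB by 2020.
start_gb = 4.4 * GB_PER_ZETTABYTE
end_gb = 44 * GB_PER_ZETTABYTE
ASSUMED_BASELINE_YEAR = 2013  # assumption, not stated on the slide
years = 2020 - ASSUMED_BASELINE_YEAR

growth = implied_annual_growth(start_gb, end_gb, years)
print(f"Daily volume in 2012: {daily_gb_2012:,.0f} GB")
print(f"Implied growth over {years} years: {growth:.1%} per year")
```

Under that assumed baseline, a tenfold increase works out to a bit under 40% growth per year.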

How big is BIG?

Connected “Things” by 2020: 26 billion (Gartner)

Market for IoT by 2020: $1.9 trillion (IDC)

Define Big Data

“Big Data as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” (Gartner)

The Five “V”s of Big Data

• Volume (Data at Rest): Giga, Tera, Exabytes of existing data to be stored and processed
• Velocity (Data in Motion): Streaming data that requires fast analysis and response
• Variety (Data in Many Forms): Relational, Structured, Unstructured, Text, Audio, Video…
• Veracity (Data in Doubt): Data inconsistency, incompleteness, ambiguity, latency, noise, errors, …
• Value (Data into Money): New business models, insights and products can be created from the data

Turning Big Data into Value

Volume

Velocity

Variety

Veracity

Data Sources
• ERP
• CRM
• Inventory
• Finance
• Social Media
• Logs
• Video + Audio
• Sensors
• …

Analyse the Data
• Predictive Analysis
• Text Analysis
• Sentiment Analysis
• Image Processing
• Computer Vision
• Voice Analysis
• …

The Tools

HDInsight on Azure

• Based on Hortonworks Data Platform
• Available in Windows and Linux flavors
• Scale elastically

Reliable, scalable, distributed computing

HDInsight <academic_mode = “on” />

MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster

map (in_key, in_value) -> list(out_key, intermediate_value)
reduce (out_key, list(intermediate_value)) -> list(out_value)

MapReduce explained

• Read lines from file
• Convert line to Key-Value Pair(s)
• Filter (by key/value)
• Combine Values with similar Keys
• Shuffle data across nodes for reduces by Key
• Sort by Key
• Aggregate (reduce)
• Filter (based on aggregated value)
• Write results to file

Map / Reduce
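The stages listed above can be sketched as a single-process Python pipeline. This is only an illustration of the data flow, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`), the sample log tokens and the `min_total` threshold are all hypothetical, and the final step returns result lines rather than writing a file.

```python
from collections import Counter, defaultdict

def map_phase(lines, keep=lambda key, value: True):
    """Read lines, emit (key, 1) pairs, filter them, and combine per key locally."""
    combined = Counter()
    for line in lines:                  # read lines from the split
        for key in line.split():        # convert line to key-value pairs
            if keep(key, 1):            # filter by key/value
                combined[key] += 1      # combine values with the same key
    return combined

def shuffle(mapper_outputs):
    """Route every intermediate value to the group for its key."""
    groups = defaultdict(list)
    for output in mapper_outputs:
        for key, value in output.items():
            groups[key].append(value)
    return groups

def reduce_phase(groups, min_total=0):
    """Sort by key, aggregate, filter on the aggregate, and format result lines."""
    results = []
    for key in sorted(groups):          # sort by key
        total = sum(groups[key])        # aggregate (reduce)
        if total >= min_total:          # filter based on the aggregated value
            results.append(f"{key}\t{total}")
    return results

# Hypothetical input: two splits, each handled by its own mapper.
splits = [["ERROR WARN INFO", "INFO INFO"], ["WARN ERROR", "DEBUG INFO"]]
mapped = [map_phase(split, keep=lambda k, v: k != "DEBUG") for split in splits]
output_lines = reduce_phase(shuffle(mapped), min_total=2)
print("\n".join(output_lines))
```

In a real cluster each split would be processed on a different node, and the shuffle would move data over the network. Here everything runs in one process so the order of the stages is easy to follow.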

MapReduce Hello World

Input
Deer Bear River
Car Car River
Deer Car Bear

Splitting
• Deer Bear River
• Car Car River
• Deer Car Bear

Mapping
• Deer, 1; Bear, 1; River, 1
• Car, 1; Car, 1; River, 1
• Deer, 1; Car, 1; Bear, 1

Shuffling
• Bear, 1; Bear, 1
• Car, 1; Car, 1; Car, 1
• Deer, 1; Deer, 1
• River, 1; River, 1

Reducing
• Bear, 2
• Car, 3
• Deer, 2
• River, 2

Final result
Bear, 2
Car, 3
Deer, 2
River, 2
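The word-count walkthrough above can be reproduced in a few lines of Python. This sketch follows the `map`/`reduce` signatures from the earlier slide and runs in a single process. The names `map_fn` and `reduce_fn` are illustrative, and sorting the pairs stands in for the shuffle step.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word, 1) for word in in_value.split()]

def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, list(intermediate_value)) -> list of out_value."""
    return [sum(intermediate_values)]

text = "Deer Bear River\nCar Car River\nDeer Car Bear"

# Splitting: one line per split.
splits = text.splitlines()

# Mapping: each split yields (word, 1) pairs; the line number is the input key.
mapped = [map_fn(number, line) for number, line in enumerate(splits)]

# Shuffling: bring all pairs with the same key together.
pairs = sorted(pair for split in mapped for pair in split)
shuffled = {
    key: [value for _, value in group]
    for key, group in groupby(pairs, key=itemgetter(0))
}

# Reducing: sum the values collected for each key.
reduced = {key: reduce_fn(key, values)[0] for key, values in shuffled.items()}

for word, count in reduced.items():
    print(f"{word}, {count}")
```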

• Pig is a high level scripting language that is used with Apache Hadoop

• Excels at describing data analysis problems as data flows

• Is complete in that you can do all the required data manipulations with Pig

Pig knows Latin
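To illustrate what describing a problem as a data flow means, the sketch below writes a word count as a chain of steps, each turning one relation into the next. The Pig Latin statements in the comments are the customary word-count script, shown only for orientation. The file name `input.txt` and all Python names are hypothetical, and nothing here calls Pig itself.

```python
from collections import defaultdict

# lines = LOAD 'input.txt' AS (line:chararray);
lines = ["Deer Bear River", "Car Car River", "Deer Car Bear"]  # stand-in for LOAD

# words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
words = [word for line in lines for word in line.split()]

# grouped = GROUP words BY word;
grouped = defaultdict(list)
for word in words:
    grouped[word].append(word)

# counts = FOREACH grouped GENERATE group, COUNT(words);
counts = {group: len(bag) for group, bag in grouped.items()}

for group, count in counts.items():
    print(group, count)
```

Each statement names an intermediate relation, so the script reads as a pipeline rather than as explicit map and reduce functions.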

Azure HDInsight

Windows Azure Blob Storage (WABS) Distributed File System

Applications (by cluster type)
• Spark: Spark, Spark Streaming, Spark MLlib
• Storm: Storm, Kafka
• HBase: HBase, Zookeeper
• …
• Hadoop: HDFS APIs, MapReduce, Sqoop, Pig, Hive (Tez), Mahout, Oozie

Yet Another Resource Negotiator (YARN)

• Acquisition: Azure Data Factory
• Stream Processing: Stream Analytics, Event Hub
• Machine Learning: Azure Machine Learning
• NoSQL: Table Storage, DocumentDB

Cortana Intelligence Suite
Transform data into intelligent action

• Personal Digital Assistant: Cortana
• Perceptual Intelligence: Face, vision; Speech, text
• Preconfigured Solutions
• Business Scenarios: Recommendations, customer churn, forecasting, etc.
• Dashboards and Visualizations: Power BI
• Machine Learning and Analytics
• Big Data Store
• Information Management

Cortana Intelligence Suite
Transform data into intelligent action

DATA
• Business apps
• Custom apps
• Sensors and devices

INTELLIGENCE
• Big Data Stores: Azure Data Lake store, Azure SQL Data Warehouse
• Information Management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
• Machine Learning and Analytics: Azure Machine Learning, Azure HDInsight (Hadoop and Spark), Azure Stream Analytics, Azure Data Lake analytics service

ACTION
• People
• Automated Systems

Cortana Intelligence scenarios
EXAMPLE SOLUTIONS

• Sales and marketing: Customer Acquisition, Cross-sell and upsell, Loyalty programs, Marketing mix optimization
• Finance and risk: Fraud detection, Credit risk management
• Customer and channel: Lifetime customer value, Personalized offers, Product recommendation
• Operations and workforce: Pay for performance, Operational efficiency, Smart buildings, Predictive maintenance, Supply chain management

Azure Stream Analytics
Process real-time data in Azure using a simple SQL language

• Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications
• Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data
• Outputs to persistent stores, dashboards or back to devices
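As an illustration of time-sensitive analysis over a stream, the sketch below counts events per device in fixed, non-overlapping (tumbling) windows, roughly what a Stream Analytics query grouping by `TumblingWindow(second, 10)` expresses. It is plain Python over an in-memory list, not the Stream Analytics engine. The event fields (`device`, `ts`) and the sample data are assumptions, and the window boundaries are simplified to [start, end).

```python
from collections import Counter

def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, device) in fixed, non-overlapping windows."""
    counts = Counter()
    for event in events:
        # Align the timestamp down to the start of the window it falls in.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["device"])] += 1
    return counts

# Hypothetical events; `ts` is seconds since the start of the stream.
events = [
    {"device": "kiosk-1", "ts": 1},
    {"device": "kiosk-1", "ts": 4},
    {"device": "atm-7", "ts": 9},
    {"device": "kiosk-1", "ts": 12},
    {"device": "atm-7", "ts": 19},
]

counts = tumbling_window_counts(events, window_seconds=10)
for (window_start, device), n in sorted(counts.items()):
    print(f"window starting at {window_start}s: {device} -> {n}")
```

A real streaming engine would emit each window's result as the window closes instead of waiting for the whole list, but the grouping logic is the same.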

Devices (diagram): Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM

Stream Analytics

Azure Data Factory
Fully managed service to support orchestration of data movement and processing

• Connect to relational or non-relational data that is on-premises or in the cloud
• Single pane of glass to monitor and manage data processing pipelines
• Publish to Power BI
• Compose and orchestrate data services at scale

Diagram labels: NoSQL DB, Blob, C#, MapReduce, Hive, Pig, Stored Procedures, VM, Trusted data, BI & analytics
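Orchestration here means running data movement and processing steps in dependency order. The sketch below shows that idea with Python's standard `graphlib` module. The activity names are hypothetical, and this is neither the Data Factory SDK nor its pipeline definition format.

```python
from graphlib import TopologicalSorter

# Each activity maps to the set of activities it depends on (hypothetical names).
pipeline = {
    "copy_on_prem_sql": set(),
    "copy_blob_logs": set(),
    "hive_transform": {"copy_on_prem_sql", "copy_blob_logs"},
    "publish_to_power_bi": {"hive_transform"},
}

def run_pipeline(dependencies, run_activity):
    """Run every activity only after all of its upstream activities have run."""
    order = []
    for activity in TopologicalSorter(dependencies).static_order():
        run_activity(activity)
        order.append(activity)
    return order

executed = run_pipeline(pipeline, run_activity=lambda name: print(f"running {name}"))
```

A managed service adds scheduling, retries and monitoring on top of this ordering, and the two copy steps, having no mutual dependency, could run in parallel.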

Azure Machine Learning
Powerful predictive analytics in Azure

ML Algorithms are best of breed and embrace OSS
• MS + R + Python + BYOA

ML Studio for productive development
• Faster experiments result in faster improvements
• Visual Workflows & ML Experiments

ML Operationalization to remove deployment friction
• Build entire ML Apps & Deploy as Cloud APIs

ML Gallery
• Provide ML applications like apps in an ‘app store’
• Publish/consume APIs in a 2 sided market

Help organizations eliminate undifferentiated heavy lifting

Azure Data Catalog
Enable enterprise-wide self-service data source registration and discovery

A metadata repository that allows users to register, enrich, understand, discover, and consume data sources

Delivers differentiated value through
‒ Data source discovery, rather than data discovery
‒ Support for data from any source: structured and unstructured, on premises and in the cloud
‒ Publishing, discovery and consumption through any tool
‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge

This, while allowing IT to maintain control and oversight

Power BI

Excel BI Investments

Power Map with custom maps allows deeper geospatial explorations and storytelling

Power Query brings modern data discovery, connectivity, shaping and publishing to Excel

Analysis Services connectivity for Power View allows users to leverage existing IT investments

Support for more sophisticated data models in Power Pivot – date and calc tables, many-to-many relationships, etc

Power Map w/ Custom Maps

Power Query

Power BI investments

Power BI dashboards and KPIs for monitoring the health of your business

New data visualizations and touch-optimized exploration in HTML5

Power BI mobile apps across devices including iPad and iPhone

Support for new data sources including SalesForce.com, Dynamics CRM online and SQL Server Analysis Services

Dashboard

Tree Map

Introducing Azure Data Lake Store
A hyper scale repository for big data analytic workloads

• Hadoop File System compatible with HDFS™
• Integrated with HDInsight, Revolution R, Hortonworks, Cloudera
• Based on YARN
• Petabyte-sized files
• No size limits to data in single account
• Massive throughput to increase performance
• AAD based access control
• Data management

Devices

Azure Data Lake Analytics Service
A new distributed analytics service

• Built on Apache YARN
• Scales dynamically with the turn of a dial
• Pay by the query
• Supports Azure AD for access control, roles, and integration with on-prem identity systems
• Built with U-SQL to unify the benefits of SQL with the power of C#
• Processes data across Azure

Example overall data flow and Architecture

Stages: Event & data producers, Ingest, Transform, DW / Long-term storage, Predictive analytics, Present & decide

Components shown in the diagram:
• Web logs
• IoT, Mobile Devices etc.
• Social Data
• Event Hubs
• Stream Analytics
• HDInsight
• Azure Data Factory
• Azure SQL DB
• Azure Data Lake
• Azure SQL DW
• Azure Machine Learning (Fraud detection etc.)
• Power BI
• Web dashboards
• Mobile devices

How can I develop Faster?

Cortana Intelligence Preconfigured Solutions
From zero to finished, analytical apps and scenarios

• Customer Churn
• Product Recommendation
• Sentiment Analysis

Pre-Configured Solutions designed to help customers jumpstart the creation of analytics solutions

Allows customers to accelerate the process of building analytical apps

Go from zero to sample app in minutes, from sample app to finished solution in a week

Cognitive Services

Cortana Intelligence Gallery

What type of Problems can I Solve with these?

The Internet of Things – Manufacturing

GLOBAL OPERATIONS

I can see my production line status and recommend adjustments to better manage operational cost.

I know when to deploy the right resources for predictive maintenance to minimize equipment failures and reduce service cost.

I gain insight into usage patterns from multiple customers and track equipment deterioration, enabling me to reengineer products for better performance.

MANUFACTURING PLANT

Aggregate product data, customer sentiment, and other third-party syndicated data to identify and correct quality issues.

Manage equipment remotely, using temperature limits and other settings to conserve energy and reduce costs.

Monitor production flow in near-real time to eliminate waste and unnecessary work in process inventory.

GLOBAL FACILITY INSIGHT

Implement condition-based maintenance alerts to eliminate machine down-time and increase throughput.

THIRD-PARTY LOGISTICS

Provide cross-channel visibility into inventories to optimize supply and reduce shared costs in the value chain.

CUSTOMER SITE

Transmits operational information to the partner (e.g. OEM) and to field service engineers for remote process automation and optimization.

Management

R&D

Field Service

The Internet of Things – Retail

Marketing

MOBILE EXPERIENCE

STORE PURCHASE HISTORY:

Dog food

M T W Th F

Weather Data

Merchandizing

IN-STORE SHOPPING

Online Behavior

Shopping Route

REFLECTION

BEST DEAL

INSPIRATION, DISCOVERY, PRE-SHOPPING

Purchase History

RIGHT OFFER, RIGHT TIME, RIGHT PLACE

IoT DATA FUELS CUSTOMER AND PRODUCT INSIGHTS

Dog<3@outlook.com

CHECKOUT

Retail

200ft

Have you seen these!

We’re ready for the rain! #ShoppingSuccess

The Internet of Things – Hospitality & Travel

Save money with more accurate arrival time predictions

Provide a seamless traveler experience from the curb to the gate, and enable context-sensitive notifications

Provide guests with a connected tablet to control room settings, request services, and provide feedback—and save their preferences

Centrally manage critical station assets—everything from communication and security networks to escalators and HVAC control systems

Send reports and sensor data to maintenance crews for faster turnaround

Configure notifications on employee devices of restaurant equipment maintenance needs

Manage inventory in near real time, and monitor food storage temperatures and expirations

NEW GATE:B7

25% off

ON TIME

How to Start?

Links and References

• Azure Portal
  • https://azure.microsoft.com
• Learn
  • http://channel9.msdn.com
  • http://build.microsoft.com
• Try
  • IoT Suite
    • https://www.azureiotsuite.com/
  • Cognitive Services
    • https://www.microsoft.com/cognitive-services
  • Cortana Analytics Suite
    • https://www.microsoft.com/en-us/server-cloud/cortana-intelligence-suite/
• …

Thank You

Marco António Silva
madasi@microsoft.com

© Microsoft Corporation. All rights reserved.
