utad - jornadas de informática - potential of big data

Potential of Big Data Marco António Silva Solution Architect [email protected]

Upload: marco-silva

Post on 17-Feb-2017

Category: Software

TRANSCRIPT

Page 1: UTAD - Jornadas de Informática - Potential of Big Data

Potential of Big Data
Marco António Silva
Solution Architect
[email protected]

Page 2: UTAD - Jornadas de Informática - Potential of Big Data

What is Big Data?

Page 3: UTAD - Jornadas de Informática - Potential of Big Data
Page 4: UTAD - Jornadas de Informática - Potential of Big Data

Some issues you already had to take care of

• Analysis

Page 5: UTAD - Jornadas de Informática - Potential of Big Data

Some issues you already had to take care of

• Analysis
• Transportation

Page 6: UTAD - Jornadas de Informática - Potential of Big Data

Some issues you already had to take care of

• Analysis
• Transportation
• Access Control

Page 7: UTAD - Jornadas de Informática - Potential of Big Data

Some issues you already had to take care of

• Analysis
• Transportation
• Access Control
• Replication

Page 8: UTAD - Jornadas de Informática - Potential of Big Data

Some issues you already had to take care of

• Analysis
• Transportation
• Access Control
• Replication
• Storage

Page 9: UTAD - Jornadas de Informática - Potential of Big Data

Some issues you already had to take care of

• Analysis
• Transportation
• Access Control
• Replication
• Storage
• Data Quality

Page 10: UTAD - Jornadas de Informática - Potential of Big Data

New Generation of Data

Page 11: UTAD - Jornadas de Informática - Potential of Big Data

How big is BIG?

The EMC Digital Universe Study launched its seventh edition. According to the study, by 2020, the amount of data in our digital universe is expected to grow from 4.4 trillion GB to 44 trillion GB.

According to IBM, "2.5 exabytes - that's 2.5 billion gigabytes (GB) - of data was generated every day in 2012. That's big by anyone's standards." "About 75% of data is unstructured, coming from sources such as text, voice and video."
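The figures quoted on this slide can be sanity-checked with a little arithmetic. The sketch below assumes decimal (SI) units, which is how "2.5 exabytes = 2.5 billion GB" works out; the function and variable names are illustrative only.

```python
# Assumption: decimal (SI) units, i.e. 1 GB = 10**9 bytes and 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18


def exabytes_to_gigabytes(exabytes: float) -> float:
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * BYTES_PER_EB / BYTES_PER_GB


# IBM figure from the slide: 2.5 exabytes generated per day in 2012.
daily_gb = exabytes_to_gigabytes(2.5)

# EMC figure from the slide: growth from 4.4 trillion GB to 44 trillion GB.
start_gb = 4_400_000_000_000
end_gb = 44_000_000_000_000
growth_factor = end_gb / start_gb

print(f"{daily_gb:,.0f} GB per day")
print(f"growth factor: {growth_factor:.0f}x")
```

In other words, the EMC projection amounts to a tenfold increase in total data volume.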

Page 12: UTAD - Jornadas de Informática - Potential of Big Data

How big is BIG?

Connected “Things” by 2020: 26 billion (Gartner)

Market for IoT by 2020: $1.9 trillion (IDC)

Page 13: UTAD - Jornadas de Informática - Potential of Big Data

Define Big Data

“Big Data as high volume, velocity and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”

Gartner

Page 14: UTAD - Jornadas de Informática - Potential of Big Data
Page 15: UTAD - Jornadas de Informática - Potential of Big Data

The Five “V”s of Big Data

• Volume (Data at Rest): gigabytes, terabytes, exabytes of existing data to be stored and processed
• Velocity (Data in Motion): streaming data that requires fast analysis and response
• Variety (Data in Many Forms): relational, structured, unstructured, text, audio, video…
• Veracity (Data in Doubt): data inconsistency, incompleteness, ambiguity, latency, noise, errors,…
• Value (Data into Money): new business models, insights and products can be created from the data

Page 16: UTAD - Jornadas de Informática - Potential of Big Data

Turning Big Data into Value

Volume, Velocity, Variety, Veracity

Data Sources
• ERP
• CRM
• Inventory
• Finance
• Social Media
• Logs
• Video+Audio
• Sensors
• …

Analyse the Data
• Predictive Analysis
• Text Analysis
• Sentiment Analysis
• Image Processing
• Computer Vision
• Voice Analysis
• …

Page 17: UTAD - Jornadas de Informática - Potential of Big Data

The Tools

Page 18: UTAD - Jornadas de Informática - Potential of Big Data

HDInsight on Azure
Reliable, scalable, distributed computing

• Based on Hortonworks Data Platform
• Available in Windows and Linux flavors
• Scale elastically

Page 19: UTAD - Jornadas de Informática - Potential of Big Data

HDInsight <academic_mode = “on” />

MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster.

map (in_key, in_value) -> list(out_key, intermediate_value)
reduce (out_key, list(intermediate_value)) -> list(out_value)
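The two signatures above can be illustrated with a single-process sketch. This is not how Hadoop or HDInsight execute jobs (there is no cluster or distribution here); it only mirrors the shape of `map` and `reduce`. The helper name `map_reduce` and the log-file sample data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def map_reduce(
    records: Iterable[Pair],
    map_fn: Callable[[Any, Any], Iterable[Pair]],
    reduce_fn: Callable[[Any, List[Any]], Iterable[Any]],
) -> Dict[Any, List[Any]]:
    """Single-process illustration of the map/reduce signatures."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    # map (in_key, in_value) -> list(out_key, intermediate_value)
    for in_key, in_value in records:
        for out_key, intermediate_value in map_fn(in_key, in_value):
            groups[out_key].append(intermediate_value)
    # reduce (out_key, list(intermediate_value)) -> list(out_value)
    return {key: list(reduce_fn(key, groups[key])) for key in sorted(groups)}


# Hypothetical input: (file name, line) pairs.
records = [
    ("a.log", "error disk full"),
    ("a.log", "ok"),
    ("b.log", "warning"),
]


def line_length(file_name: str, line: str) -> Iterable[Pair]:
    # Emit one (file name, line length) pair per input line.
    yield file_name, len(line)


def longest(file_name: str, lengths: List[int]) -> Iterable[int]:
    # Collapse all lengths for one file into a single maximum.
    yield max(lengths)


longest_line_per_file = map_reduce(records, line_length, longest)
print(longest_line_per_file)
```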

Page 20: UTAD - Jornadas de Informática - Potential of Big Data

MapReduce explained

Steps, from the Map phase through to the Reduce phase:

• Read lines from file
• Convert line to key-value pair(s)
• Filter (by key/value)
• Combine values with similar keys
• Shuffle data across nodes for reduces by key
• Sort by key
• Aggregate (reduce)
• Filter (based on aggregated value)
• Write results to file

Page 21: UTAD - Jornadas de Informática - Potential of Big Data

MapReduce Hello World

Input:
Deer Bear River
Car Car River
Deer Car Bear

Splitting:
• Deer Bear River
• Car Car River
• Deer Car Bear

Mapping:
• Deer, 1; Bear, 1; River, 1
• Car, 1; Car, 1; River, 1
• Deer, 1; Car, 1; Bear, 1

Shuffling:
• Bear, 1; Bear, 1
• Car, 1; Car, 1; Car, 1
• Deer, 1; Deer, 1
• River, 1; River, 1

Reducing:
• Bear, 2
• Car, 3
• Deer, 2
• River, 2

Final result:
Bear, 2
Car, 3
Deer, 2
River, 2
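The word-count walkthrough on this slide can be reproduced stage by stage in plain Python. This is a minimal single-machine sketch, not a Hadoop job; the function names are illustrative and simply follow the stage labels on the slide (splitting, mapping, shuffling, reducing).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

INPUT_TEXT = "Deer Bear River\nCar Car River\nDeer Car Bear"


def split_input(text: str) -> List[str]:
    """Splitting: one split per input line."""
    return text.splitlines()


def map_line(line: str) -> List[Tuple[str, int]]:
    """Mapping: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffling: gather all values that share a key, ordered by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return {word: groups[word] for word in sorted(groups)}


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducing: sum the values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}


splits = split_input(INPUT_TEXT)
mapped = [map_line(line) for line in splits]
shuffled = shuffle(mapped)
word_counts = reduce_counts(shuffled)

for word, count in word_counts.items():
    print(f"{word}, {count}")
```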

Page 22: UTAD - Jornadas de Informática - Potential of Big Data

Pig knows Latin

• Pig is a high level scripting language that is used with Apache Hadoop
• Excels at describing data analysis problems as data flows
• Is complete in that you can do all the required data manipulations with Pig
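To make "data analysis problems as data flows" concrete, here is a Python analogue of a typical load, filter, group, aggregate flow. It is not Pig Latin and does not run on Hadoop; the comments name the Pig operators each step loosely corresponds to, and the page-visit rows are hypothetical sample data.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows: (user, page, seconds_on_page).
visits = [
    ("ana", "home", 12),
    ("rui", "home", 3),
    ("ana", "pricing", 40),
    ("rui", "pricing", 25),
    ("eva", "home", 8),
]

# LOAD: in Pig this would read from storage; here the data is already in memory.
loaded = visits

# FILTER: keep visits that lasted at least 5 seconds.
filtered = [row for row in loaded if row[2] >= 5]

# GROUP: collect rows by page (groupby needs its input sorted by the same key).
by_page = sorted(filtered, key=itemgetter(1))
grouped = {page: list(rows) for page, rows in groupby(by_page, key=itemgetter(1))}

# FOREACH ... GENERATE: compute one aggregate per group.
avg_seconds = {
    page: sum(row[2] for row in rows) / len(rows) for page, rows in grouped.items()
}

print(avg_seconds)
```

Each step consumes the output of the previous one, which is the data-flow style the slide describes.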

Page 23: UTAD - Jornadas de Informática - Potential of Big Data

Azure HDInsight

Applications (by cluster type)
• Spark: Spark, Spark Streaming, Spark MLlib
• Storm: Storm, Kafka
• HBase: HBase, Zookeeper
• ….
• Hadoop: HDFS APIs, MapReduce, Sqoop, Pig, Hive (Tez), Mahout, Oozie

Yet Another Resource Negotiator (YARN)

Windows Azure Blob Storage (WABS) Distributed File System

Related Azure services
• Acquisition: Azure Data Factory
• Stream Processing: Stream Analytics, Event Hub
• Machine Learning: Azure Machine Learning
• NoSQL: Table Storage, DocumentDB

Page 24: UTAD - Jornadas de Informática - Potential of Big Data
Page 25: UTAD - Jornadas de Informática - Potential of Big Data

Cortana Intelligence Suite
Transform data into intelligent action

Personal Digital Assistant – Cortana

Perceptual Intelligence

Preconfigured Solutions

Dashboards and Visualizations

Machine Learning and Analytics

Big Data Store

Information Management

Page 26: UTAD - Jornadas de Informática - Potential of Big Data

Cortana Intelligence Suite
Transform data into intelligent action

DATA → INTELLIGENCE → ACTION

DATA
• Business apps
• Custom apps
• Sensors and devices

INTELLIGENCE
• Information Management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
• Big Data Stores: Azure Data Lake store, Azure SQL Data Warehouse
• Machine Learning and Analytics: Azure Machine Learning, Azure HDInsight (Hadoop and Spark), Azure Stream Analytics, Azure Data Lake analytics service
• Dashboards and Visualizations: Power BI
• Personal Digital Assistant: Cortana
• Perceptual Intelligence: face, vision, speech, text
• Business Scenarios: recommendations, customer churn, forecasting, etc.

ACTION
• People
• Automated Systems

Page 27: UTAD - Jornadas de Informática - Potential of Big Data

Cortana Intelligence scenarios
EXAMPLE SOLUTIONS

Sales and marketing
• Customer Acquisition
• Cross-sell and upsell
• Loyalty programs
• Marketing mix optimization

Finance and risk
• Fraud detection
• Credit risk management

Customer and channel
• Lifetime customer value
• Personalized offers
• Product recommendation

Operations and workforce
• Pay for performance
• Operational efficiency
• Smart buildings
• Predictive maintenance
• Supply chain management

Page 28: UTAD - Jornadas de Informática - Potential of Big Data

Azure Stream Analytics
Process real-time data in Azure using a simple SQL language

Consumes millions of real-time events from Event Hub collected from devices, sensors, infrastructure, and applications

Performs time-sensitive analysis using SQL-like language against multiple real-time streams and reference data

Outputs to persistent stores, dashboards or back to devices
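As a rough idea of what "time-sensitive analysis" over an event stream looks like, the sketch below counts events per device in fixed, non-overlapping time windows. It is a plain-Python analogue, not Stream Analytics query syntax, and the choice of a tumbling window is an assumption (the slide does not name a windowing scheme). The function name, device IDs and timestamps are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, device id)


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, device) in non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, device_id in events:
        # Every timestamp falls into exactly one window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, device_id)] += 1
    return dict(counts)


# Hypothetical events from two devices.
events = [
    (0.5, "kiosk-1"),
    (3.0, "kiosk-1"),
    (9.9, "atm-7"),
    (10.0, "kiosk-1"),
    (12.4, "atm-7"),
    (19.0, "atm-7"),
]

counts = tumbling_window_counts(events, window_seconds=10)
for (window_start, device_id), n in sorted(counts.items()):
    print(f"window starting at {window_start}s, {device_id}: {n} event(s)")
```

In the managed service the equivalent logic would be written declaratively in its SQL-like language and the results sent to a store, dashboard or device.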

[Diagram: devices feeding Stream Analytics. Labels: Point of Service Devices, Self Checkout Stations, Kiosks, Smart Phones, Slates/Tablets, PCs/Laptops, Servers, Digital Signs, Diagnostic Equipment, Remote Medical Monitors, Logic Controllers, Specialized Devices, Thin Clients, Handhelds, Security, POS Terminals, Automation Devices, Vending Machines, Kinect, ATM]

Page 29: UTAD - Jornadas de Informática - Potential of Big Data

Azure Data Factory
Fully managed service to support orchestration of data movement and processing

Connect to relational or non-relational data that is on-premises or in the cloud

Single pane of glass to monitor and manage data processing pipelines.

Publish to Power BI

Compose and orchestrate data services at scale

[Diagram labels: NoSQL DB, Blob, C#, MapReduce, Hive, Pig, Stored Procedures, VM, Azure Machine Learning, Trusted data, BI & analytics]

Page 30: UTAD - Jornadas de Informática - Potential of Big Data

Azure Machine Learning
Powerful predictive analytics in Azure

Help organizations eliminate undifferentiated heavy lifting

ML Algorithms are best of breed and embrace OSS
• MS + R + Python + BYOA

ML Studio for productive development
• Faster experiments result in faster improvements
• Visual Workflows & ML Experiments

ML Operationalization to remove deployment friction
• Build entire ML Apps & Deploy as Cloud APIs

ML Gallery
• Provide ML applications like apps in an ‘app store’
• Publish/consume APIs in a 2 sided market

Page 31: UTAD - Jornadas de Informática - Potential of Big Data

Azure Data Catalog
Enable enterprise-wide self-service data source registration and discovery

A metadata repository that allows users to register, enrich, understand, discover, and consume data sources

Delivers differentiated value through
‒ Data source discovery, rather than data discovery
‒ Support for data from any source: structured and unstructured, on premises and in the cloud
‒ Publishing, discovery and consumption through any tool
‒ Annotation crowdsourcing: empowering any user to capture and share their knowledge

This, while allowing IT to maintain control and oversight
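The register, enrich and discover cycle described on this slide can be pictured as a small in-memory metadata repository. The sketch below is purely illustrative and has nothing to do with the actual Azure Data Catalog API; the class names, methods and sample sources are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DataSource:
    """Metadata about a data source; the data itself stays where it is."""
    name: str
    location: str
    tags: Set[str] = field(default_factory=set)
    notes: List[str] = field(default_factory=list)


class Catalog:
    """Toy metadata repository: register, annotate (enrich), discover."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, name: str, location: str) -> DataSource:
        source = DataSource(name=name, location=location)
        self._sources[name] = source
        return source

    def annotate(self, name: str, tag: Optional[str] = None,
                 note: Optional[str] = None) -> None:
        # Any user can add tags or notes, which is the crowdsourcing idea.
        source = self._sources[name]
        if tag:
            source.tags.add(tag.lower())
        if note:
            source.notes.append(note)

    def discover(self, term: str) -> List[str]:
        # Search names, tags and notes; return matching source names.
        term = term.lower()
        return sorted(
            s.name for s in self._sources.values()
            if term in s.name.lower()
            or term in s.tags
            or any(term in n.lower() for n in s.notes)
        )


catalog = Catalog()
catalog.register("sales_db", "sqlserver://onprem/sales")
catalog.register("clickstream", "blob://cloud/logs/clicks")
catalog.annotate("sales_db", tag="finance", note="Refreshed nightly")
catalog.annotate("clickstream", tag="web", note="Raw, unstructured logs")

print(catalog.discover("finance"))
print(catalog.discover("logs"))
```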

Page 32: UTAD - Jornadas de Informática - Potential of Big Data

Power BI

Page 33: UTAD - Jornadas de Informática - Potential of Big Data

Excel BI Investments

Power Map with custom maps allows deeper geospatial explorations and storytelling

Power Query brings modern data discovery, connectivity, shaping and publishing to Excel

Analysis Services connectivity for Power View allows users to leverage existing IT investments

Support for more sophisticated data models in Power Pivot – date and calc tables, many-to-many relationships, etc

[Screenshots: Power Map w/ Custom Maps; Power Query]

Page 34: UTAD - Jornadas de Informática - Potential of Big Data

Power BI investments

Power BI dashboards and KPIs for monitoring the health of your business

New data visualizations and touch-optimized exploration in HTML5

Power BI mobile apps across devices including iPad and iPhone

Support for new data sources including SalesForce.com, Dynamics CRM online and SQL Server Analysis Services

[Screenshots: Dashboard; Tree Map]

Page 35: UTAD - Jornadas de Informática - Potential of Big Data

Introducing Azure Data Lake Store
A hyper scale repository for big data analytic workloads

• Hadoop File System compatible with HDFS™
• Integrated with HDInsight, Revolution R, Hortonworks, Cloudera
• Based on YARN
• Petabyte-sized files
• No size limits to data in single account
• Massive throughput to increase performance
• AAD based access control
• Data management

Page 36: UTAD - Jornadas de Informática - Potential of Big Data

Azure Data Lake Analytics Service
A new distributed analytics service

• Built on Apache YARN
• Scales dynamically with the turn of a dial
• Pay by the query
• Supports Azure AD for access control, roles, and integration with on-prem identity systems
• Built with U-SQL to unify the benefits of SQL with the power of C#
• Processes data across Azure

Page 37: UTAD - Jornadas de Informática - Potential of Big Data

Example overall data flow and Architecture

Stage labels: Event & data producers, Ingest, Transform, DW / Long-term storage, Predictive analytics, Present & decide

Event & data producers
• Web logs
• IoT, Mobile Devices etc.
• Social Data

Services shown
• Event Hubs
• Stream Analytics
• HDInsight
• Azure Data Factory
• Azure SQL DB
• Azure Data Lake
• Azure SQL DW
• Azure Machine Learning (Fraud detection etc.)

Present & decide
• Power BI
• Web dashboards
• Mobile devices

Page 38: UTAD - Jornadas de Informática - Potential of Big Data

How can I develop Faster?

Page 39: UTAD - Jornadas de Informática - Potential of Big Data

Cortana Intelligence Preconfigured Solutions

Customer Churn

Product Recommendation

Sentiment Analysis

From zero to finished, analytical apps and scenarios

Pre-Configured Solutions designed to help customers jumpstart the creation of analytics solutions

Allows customers to accelerate the process of building analytical apps

Go from zero to sample app in minutes, from sample app to finished solution in a week

Page 40: UTAD - Jornadas de Informática - Potential of Big Data

Cognitive Services

Page 41: UTAD - Jornadas de Informática - Potential of Big Data

Cortana Intelligence Gallery

Page 42: UTAD - Jornadas de Informática - Potential of Big Data

What type of Problems can I Solve with these?

Page 43: UTAD - Jornadas de Informática - Potential of Big Data

The Internet of Things – Manufacturing

GLOBAL OPERATIONS

I can see my production line status and recommend adjustments to better manage operational cost.

I know when to deploy the right resources for predictive maintenance to minimize equipment failures and reduce service cost.

I gain insight into usage patterns from multiple customers and track equipment deterioration, enabling me to reengineer products for better performance.

MANUFACTURING PLANT

Aggregate product data, customer sentiment, and other third-party syndicated data to identify and correct quality issues.

Manage equipment remotely, using temperature limits and other settings to conserve energy and reduce costs.

Monitor production flow in near-real time to eliminate waste and unnecessary work in process inventory.

GLOBAL FACILITY INSIGHT

Implement condition-based maintenance alerts to eliminate machine down-time and increase throughput.

THIRD-PARTY LOGISTICS

Provide cross-channel visibility into inventories to optimize supply and reduce shared costs in the value chain.

CUSTOMER SITE

Transmits operational information to the partner (e.g. OEM) and to field service engineers for remote process automation and optimization.

Management

R&D

Field Service
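The "condition-based maintenance alerts" mentioned on this slide boil down to watching sensor readings against a limit. The sketch below shows one simple rule, alerting when a machine exceeds a temperature limit for several consecutive readings. The rule, the limit, the names and the sample readings are all assumptions for illustration; the slide does not specify how alerts are computed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Reading:
    machine_id: str
    temperature_c: float


def maintenance_alerts(
    readings: Iterable[Reading], limit_c: float, consecutive: int = 3
) -> List[str]:
    """Return machine ids whose temperature stayed above limit_c for
    `consecutive` readings in a row (one alert per uninterrupted streak)."""
    streak: Dict[str, int] = {}
    alerts: List[str] = []
    for reading in readings:
        if reading.temperature_c > limit_c:
            streak[reading.machine_id] = streak.get(reading.machine_id, 0) + 1
            if streak[reading.machine_id] == consecutive:
                alerts.append(reading.machine_id)
        else:
            # A reading back under the limit resets the streak.
            streak[reading.machine_id] = 0
    return alerts


# Hypothetical interleaved readings from two machines.
readings = [
    Reading("press-1", 81.0),
    Reading("lathe-2", 82.0),
    Reading("press-1", 83.0),
    Reading("lathe-2", 70.0),
    Reading("press-1", 85.0),
    Reading("lathe-2", 84.0),
]

alerts = maintenance_alerts(readings, limit_c=80.0, consecutive=3)
print(alerts)
```

Requiring several consecutive readings is one way to avoid alerting on a single noisy spike; a real deployment would tune both the limit and the streak length per machine type.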

Page 44: UTAD - Jornadas de Informática - Potential of Big Data

The Internet of Things – Retail

IoT DATA FUELS CUSTOMER AND PRODUCT INSIGHTS

RIGHT OFFER, RIGHT TIME, RIGHT PLACE

[Diagram labels: Marketing; Merchandizing; Retail; Mobile experience; In-store shopping; Store purchase history (Dog food); Purchase History; Weather Data; Online Behavior; Shopping Route; Reflection; Best deal; Inspiration, discovery, pre-shopping. Sample messages shown: "Have you seen these!", "We’re ready for the rain! #ShoppingSuccess"]

Page 45: UTAD - Jornadas de Informática - Potential of Big Data

The Internet of Things – Hospitality & Travel

Save money with more accurate arrival time predictions

Provide a seamless traveler experience from the curb to the gate, and enable context-sensitive notifications

Provide guests with a connected tablet to control room settings, request services, and provide feedback—and save their preferences

Centrally manage critical station assets—everything from communication and security networks to escalators and HVAC control systems

Send reports and sensor data to maintenance crews for faster turnaround

Configure notifications on employee devices of restaurant equipment maintenance needs

Manage inventory in near real time, and monitor food storage temperatures and expirations

[Figure labels: "NEW GATE: B7", "25% off", "ON TIME"]

Page 46: UTAD - Jornadas de Informática - Potential of Big Data

How to Start?

Page 47: UTAD - Jornadas de Informática - Potential of Big Data

Links and References

• Azure Portal
  • https://azure.microsoft.com
• Learn
  • http://channel9.msdn.com
  • http://build.microsoft.com
• Try
  • IoT Suite: https://www.azureiotsuite.com/
  • Cognitive Services: https://www.microsoft.com/cognitive-services
  • Cortana Analytics Suite: https://www.microsoft.com/en-us/server-cloud/cortana-intelligence-suite/
• …

Page 49: UTAD - Jornadas de Informática - Potential of Big Data

© Microsoft Corporation. All rights reserved.