big data - what's the big deal

Post on 07-Nov-2014

Category:

Data & Analytics

DESCRIPTION

This video is a recording of a tech talk in which we explain the basics of Big Data. Big Data has been a buzzword in the IT industry, and this is a level-100 talk covering its history, basics, current market needs and the ins and outs of Big Data.

TRANSCRIPT

BIG DATA – WHAT’S THE BIG DEAL

Debarchan Sarkar, Microsoft Corporation

The call will start soon; please stay on mute. Thanks for your time and patience.

WHO AM I

Debarchan, from Calcutta, with an Indian heart and a global mind. .NET programmer who fell in love with Open Source, specifically Apache Hadoop. Author. Community enthusiast. Cricket and music lover. One who gets really scared and bored if tomorrow is exactly like today. The known is a drop, the unknown is an ocean.

WHAT IS BIG DATA?

Data Complexity: Variety and Velocity

Terabytes

Gigabytes

Megabytes

Petabytes

Big Data

Log files

Spatial & GPS coordinates

Data market feeds

eGov feeds

Weather

Text/image

Click stream

Wikis/blogs

Sensors/RFID/devices

Social sentiment

Audio/video

Web 2.0

Web Logs

Digital Marketing

Search Marketing

Recommendations

Advertising

Mobile

Collaboration

eCommerce

ERP/CRM

Payables

Payroll

Inventory

Contacts

Deal Tracking

Sales Pipeline

How do I optimize my fleet based on weather and traffic patterns?

SOCIAL & WEB ANALYTICS

LIVE DATA FEEDS

ADVANCED ANALYTICS

What’s the social sentiment for my brand or products?

How do I better predict future outcomes?

A NEW SET OF QUESTIONS

COMMON BIG DATA CUSTOMER SCENARIOS

GAIN COMPETITIVE ADVANTAGE BY MOVING FIRST AND FAST IN YOUR INDUSTRY

Web app optimization

Smart meter monitoring

Equipment monitoring

Advertising analysis

Life sciences research

Fraud detection

Healthcare outcomes

Weather forecasting

Natural resource exploration

Social network analysis

Churn analysis

Traffic flow optimization

IT infrastructure optimization

Legal discovery

THE BIG DATA LIFECYCLE

Manage, Enrich, Insight

Relational, Non-Relational, Streaming

MANAGE ANY DATA, ANY SIZE, ANYWHERE

Unified Monitoring, Management & Security

Data Movement

Extremely large volume of unstructured web logs

Ad hoc analysis of logs to prototype patterns

Hadoop data cluster feeds large 24 TB cube

Business users analyze cube data

6 PB Hadoop Cluster

24 TB SQL Server AS Cube

Microsoft BI Tools

E.g. STRUCTURED & UNSTRUCTURED DATA
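The slide's flow, raw web logs reduced to a compact cube that business users can query, can be illustrated with a minimal sketch. The log format (`<date> <url> <status>`), the function names and the choice of dimensions are all assumptions made for this illustration; the actual pipeline on the slide runs on Hadoop and SQL Server Analysis Services, not on this code.

```python
from collections import Counter


def parse_line(line):
    """Split one hypothetical log line of the form '<date> <url> <status>'."""
    date, url, status = line.split()
    return date, url, int(status)


def rollup(lines):
    """Count hits per (date, url) cell, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        try:
            date, url, _status = parse_line(line)
        except ValueError:
            # Unstructured logs contain noise; ignore malformed lines.
            continue
        counts[(date, url)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2014-11-07 /home 200",
        "2014-11-07 /home 404",
        "2014-11-07 /cart 200",
        "garbage",
    ]
    for cell, hits in sorted(rollup(sample).items()):
        print(cell, hits)
```

The point of the sketch is the size reduction: many raw lines collapse into one count per (date, url) cell, which is why a multi-petabyte log store can feed a much smaller cube.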

ENRICH BY CONNECTING TO THE WORLD’S DATA

Discover

Combine

Refine

POWER OF COMBINING THE WORLD’S DATA

Personal Data

Organizational Data

Community Data

World Data

Value

E.g. VALUE OF EXTERNAL DATA

“When it comes to business intelligence, Microsoft SQL Server 2012 demonstrates that the platform has continued to advance and keep up with the innovations that are happening in big data.”

David Mariani, Vice President of Engineering

Connects to more than 1 billion signals

Across 15 leading social networks, including Facebook

Generates a ‘Klout’ score for individual people, brands & partners

Enables analysis, targeting and social graphs

INSIGHTS ON ANY DATA, ALL USERS, WHEREVER THEY ARE

Relational, Non-Relational, Streaming

BI Professionals, Business Analysts, Data Scientists

INSIGHTS FOR ALL USERS THROUGH FAMILIAR TOOLS

Advanced Analytics from Microsoft and 3rd parties

Self Service Analysis with PowerPivot & Power View

Interactivity & exploration with Hadoop data in Excel

PB TB GB

BI Professionals, Business Analysts, Data Scientists

• Application written in Java for Big Data processing
• Uses the “Map-Reduce” processing paradigm
• Characteristics: how is it different from traditional SQL Server?
  1. Optimized for distributed storage and computing of data
  2. Highly scalable (scale-out model)
  3. Commodity HW-based
  4. Open Source

⇒ Very low cost for acquisition and storage
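The “Map-Reduce” paradigm named above can be sketched in a few lines. This is a single-process illustration in Python, not Hadoop's Java API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are made up for this example, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over commodity machines, which is what makes the scale-out model in the list above possible.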

Hadoop is for Big Data.

Hadoop Data Analytics

Dataflow

The Hadoop Ecosystem

Distributed Storage (HDFS)

Query (Hive)

Distributed Processing (MapReduce)

Scripting (Pig)

NoSQL Database (HBase)

Metadata (HCatalog)

Data Integration (ODBC / Sqoop / REST)

Business Intelligence (Excel, Power View, …)

Machine Learning (Mahout)

Graph (Pegasus)

Stats processing (RHadoop)

Pipeline / workflow (Oozie)

Log file aggregation (Flume)

Active Directory (Security)

System Center

Welcome to the Zoo!

HDInsight

Apache™ Hadoop™ on Windows

Azure Blob Storage

LibHDFS

FTPS

Active Directory

Need to Know*

StreamInsight

JDBC Connector

Good to Know*

HCatalog

Oozie

Ambari

Hadoop: The Definitive Guide, 3rd Ed., Tom White, O’Reilly

Give us your feedback

• Support Team’s blog: http://blogs.msdn.com/b/bigdatasupport/
• Facebook Page: https://www.facebook.com/MicrosoftBigData
• Facebook Group: https://www.facebook.com/groups/bigdatalearnings/
• Twitter: @debarchans

Read more:
• http://en.wikipedia.org/wiki/Hadoop
• http://en.wikipedia.org/wiki/Big_data

Next Session:
• Apache Hadoop – A deep dive

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.