big data - what's the big deal
DESCRIPTION
This video is a recording of a tech talk in which we explain the basics of Big Data. Big Data has been a buzzword in the IT industry, and this level 100 talk covers its history, basics, current market needs, and the ins and outs of Big Data.
TRANSCRIPT
BIG DATA – WHAT’S THE BIG DEAL
Debarchan Sarkar
Microsoft Corporation
The call will start soon; please stay on mute. Thanks for your time and patience.
WHO AM I
Debarchan, from Calcutta, with an Indian heart and a global mind. .NET programmer who fell in love with Open Source, specifically Apache Hadoop. Author. Community enthusiast. Cricket and music lover. One who gets really scared and bored if tomorrow is exactly like today. The known is a drop, the unknown is an ocean.
WHAT IS BIG DATA?
Data Complexity: Variety and Velocity
Terabytes
Gigabytes
Megabytes
Petabytes
Big Data
Log files
Spatial & GPS coordinates
Data market feeds
eGov feeds
Weather
Text/image
Click stream
Wikis/blogs
Sensors/RFID/devices
Social sentiment
Audio/video
Web 2.0
Web Logs
Digital Marketing
Search Marketing
Recommendations
Advertising
Mobile
Collaboration
eCommerce
ERP/CRM
Payables
Payroll
Inventory
Contacts
Deal Tracking
Sales Pipeline
How do I optimize my fleet based on weather and traffic patterns?
SOCIAL & WEB ANALYTICS
LIVE DATA FEEDS
ADVANCED ANALYTICS
What’s the social sentiment for my brand or products?
How do I better predict future outcomes?
A NEW SET OF QUESTIONS
COMMON BIG DATA CUSTOMER SCENARIOS
GAIN COMPETITIVE ADVANTAGE BY MOVING FIRST AND FAST IN YOUR INDUSTRY
Web app optimization
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Healthcare outcomes
Weather forecasting
Natural resource exploration
Social network analysis
Churn analysis
Traffic flow optimization
IT infrastructure optimization
Legal discovery
THE BIG DATA LIFECYCLE
Manage, Enrich, Insight
Relational, Non-Relational, Streaming
MANAGE ANY DATA, ANY SIZE, ANYWHERE
Unified Monitoring, Management & Security
Data Movement
Extremely large volume of unstructured web logs
Ad hoc analysis of logs to prototype patterns
Hadoop data cluster feeds large 24 TB cube
Business users analyze cube data
6 PB Hadoop Cluster
24 TB SQL Server AS Cube
Microsoft BI Tools
E.g. STRUCTURED & UNSTRUCTURED DATA
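The web-log scenario above (raw logs summarized into a structure that business users can analyze) can be illustrated with a minimal sketch. It assumes a Common Log Format style layout, which the talk does not specify; `LOG_PATTERN` and `summarize` are hypothetical names, and the sample lines are invented for illustration. On a real Hadoop cluster this kind of aggregation would run in a distributed fashion rather than in a single process.

```python
import re
from collections import Counter

# Assumed log layout (Common Log Format style); real web logs may differ.
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def summarize(lines):
    """Count hits per (day, path, status) and track unparseable lines."""
    counts = Counter()
    skipped = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            # Unstructured data is messy: keep a tally instead of failing.
            skipped += 1
            continue
        key = (match.group("day"), match.group("path"), match.group("status"))
        counts[key] += 1
    return counts, skipped


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        '10.0.0.1 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        '10.0.0.2 - - [10/Oct/2012:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 2326',
        'this line is not a valid log entry',
    ]
    counts, skipped = summarize(sample)
    for key, hits in counts.items():
        print(key, hits)
    print("skipped:", skipped)
```

The resulting (day, path, status) counts are the kind of structured summary that could then be loaded into a cube or a spreadsheet for analysis.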
ENRICH BY CONNECTING TO THE WORLD’S DATA
Discover
Combine
Refine
POWER OF COMBINING THE WORLD’S DATA
Personal Data
Organizational Data
Community Data
World Data
Value
E.g. VALUE OF EXTERNAL DATA
“When it comes to business intelligence, Microsoft SQL Server 2012 demonstrates that the platform has continued to advance and keep up with the innovations that are happening in big data.”
David Mariani, Vice President of Engineering
Connects to more than 1 billion signals
Across 15 leading social networks, including Facebook
Generates a ‘Klout’ score for individual people, brands & partners
Enables analysis, targeting and social graphs
INSIGHTS ON ANY DATA, ALL USERS, WHEREVER THEY ARE
Relational, Non-Relational, Streaming
BI Professionals, Business Analysts, Data Scientists
INSIGHTS FOR ALL USERS THROUGH FAMILIAR TOOLS
Advanced Analytics from Microsoft and 3rd parties
Self Service Analysis with PowerPivot & Power View
Interactivity & exploration with Hadoop data in Excel
PB TB GB
BI Professionals, Business Analysts, Data Scientists
• Framework written in Java for Big Data processing
• Uses the “Map-Reduce” processing paradigm
• Characteristics: how is it different from traditional SQL Server?
1. Optimized for distributed storage and computing of data
2. Highly scalable (scale-out model)
3. Commodity hardware-based
4. Open source
=> Very low cost for acquisition and storage
Hadoop is for Big Data.
Hadoop Data Analytics
Dataflow
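The “Map-Reduce” paradigm named on the Hadoop slide can be illustrated with a minimal, single-process sketch. This is not Hadoop's Java API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and only mimic the three conceptual steps. In a real cluster the map and reduce steps would run in parallel across many machines, with the framework handling the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Invented sample input, for illustration only.
    sample = ["big data big deal", "data is big"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, the work can be split across commodity machines, which is what makes the scale-out model on the slide possible.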
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (Map Reduce)
Scripting (Pig)
NoSQL Database (HBase)
Metadata (HCatalog)
Data Integration (ODBC / SQOOP / REST)
Business Intelligence (Excel, Power View…)
Machine Learning (Mahout)
Graph (Pegasus)
Stats processing (RHadoop)
Pipeline / workflow (Oozie)
Log file aggregation (Flume)
Active Directory (Security)
System Center
The Hadoop Ecosystem
Welcome to the Zoo!
HDInsight
Apache™ Hadoop™ on Windows
Azure Blob Storage
LibHDFS
FTPS
Active Directory
Need to Know*
StreamInsight
JDBC Connector
Good to Know*
HCatalog, Oozie, Ambari
Hadoop: The Definitive Guide, 3rd Ed., Tom White, O’Reilly Books
Send us feedback
• Support Team’s blog: http://blogs.msdn.com/b/bigdatasupport/
• Facebook Page: https://www.facebook.com/MicrosoftBigData
• Facebook Group: https://www.facebook.com/groups/bigdatalearnings/
• Twitter: @debarchans
Read more:
• http://en.wikipedia.org/wiki/Hadoop
• http://en.wikipedia.org/wiki/Big_data
Next Session:
• Apache Hadoop – A deep dive
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.