Microsoft Big Data Essentials Module 1 - Introduction to Big Data
![Page 1: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/1.jpg)
Microsoft Big Data Essentials
Module 1 - Introduction to Big Data
Saptak Sen, Microsoft
Bill Ramos, Advaiya
![Page 2: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/2.jpg)
Agenda
• Why Big Data?
• Big Data Lambda Architecture
• Getting started with Windows Azure HDInsight Service
![Page 3: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/3.jpg)
The Business Imperative
• Human Fault Tolerance
• Minimize CapEx
• Low Learning Curve
• Hyper Scale on Demand
![Page 4: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/4.jpg)
CAP Theorem
• Consistency (C)
• Partition Tolerance (P)
• Availability (A)
![Page 5: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/5.jpg)
Big Data Lambda Architecture
![Page 6: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/6.jpg)
Big Data Lambda Architecture
• Batch layer
  • Stores master dataset
  • Compute arbitrary views
• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer
• Serving layer
  • Random access to batch views
  • Updated by batch layer
[Diagram: Serving Layer, Speed Layer, Batch Layer]
![Page 7: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/7.jpg)
The Batch Layer
• Stores master dataset (in append mode)
• Unrestrained computation
• Horizontally scalable
• High latency
[Diagram: Incoming data streams → Master dataset → Batch views]
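A minimal Python sketch of the batch-layer idea above: records are only ever appended to the master dataset, and the batch view is recomputed from scratch over all of it. The page-view events, the `append_events` and `compute_batch_view` helpers, and the in-memory list are illustrative assumptions; the deck itself points to HDInsight and Blob storage for this role.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of immutable events.
master_dataset = []


def append_events(events):
    """Append new records; existing records are never updated or deleted."""
    master_dataset.extend(events)


def compute_batch_view(dataset):
    """Recompute the whole view over the entire dataset (simple but high latency)."""
    return Counter(event["page"] for event in dataset)


append_events([{"page": "/home"}, {"page": "/docs"}])
append_events([{"page": "/home"}])

batch_view = compute_batch_view(master_dataset)
print(dict(batch_view))
```

Recomputing from the full dataset is what makes the layer tolerant of human error: a buggy view can be discarded and rebuilt, since the raw records are untouched.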
![Page 8: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/8.jpg)
The Speed Layer
• Stream processing of data
• Stores a limited window of data
• Dynamic computation
[Diagram: Real-time increments. Incoming data streams → Process stream → Increment views → Real-time views]
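The speed-layer bullets can be sketched in Python as follows. This is only an illustration under stated assumptions: the `SpeedLayer` class, the count-based window, and the page-view events are hypothetical, and a real deployment would more likely expire data by time or once the batch layer has caught up.

```python
from collections import Counter, deque


class SpeedLayer:
    """Keeps a limited window of recent events and updates its view incrementally."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.realtime_view = Counter()

    def process(self, event):
        # Incremental update: adjust the view for the new event only.
        self.window.append(event)
        self.realtime_view[event["page"]] += 1
        # Evict the oldest event once the window is full.
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.realtime_view[expired["page"]] -= 1
            if self.realtime_view[expired["page"]] == 0:
                del self.realtime_view[expired["page"]]


speed = SpeedLayer(window_size=2)
for page in ["/home", "/docs", "/home"]:
    speed.process({"page": page})

print(dict(speed.realtime_view))
```

Unlike the batch layer, nothing here is recomputed from scratch; each event touches only the counters it affects, which is what keeps latency low.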
![Page 9: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/9.jpg)
The Serving Layer
• Queries the batch and real-time views
• Merges the results
[Diagram: Real-time views + Batch views → Querying and merging → Output]
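A short Python sketch of the query-and-merge step described on this slide. The two views, their numbers, and the `query` helper are made-up illustrations, and summing is assumed to be the right merge only because the example views hold counts; other view types would need a different merge function.

```python
from collections import Counter

# Hypothetical views: the batch view covers older data,
# the real-time view covers events the batch layer has not yet processed.
batch_view = {"/home": 120, "/docs": 45}
realtime_view = {"/home": 3, "/blog": 1}


def query(page):
    """Answer a single query by reading both views and merging the results."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Merging the full views at once gives the same per-page answers.
merged = Counter(batch_view) + Counter(realtime_view)

print(query("/home"), dict(merged))
```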
![Page 10: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/10.jpg)
Microsoft Lambda Architecture Support
• Batch Layer
  • Windows Azure HDInsight
  • Azure Blob storage
  • MapReduce, Hive, Pig, Oozie, SSIS
• Speed Layer
  • Federations in Windows Azure SQL Database
  • Azure tables
  • Memcached/MongoDB
  • SQL Server database engine
  • SQL Server VM: Columnstore indexes, Analysis Services, StreamInsight
• Serving Layer
  • Azure Storage Explorer
  • Microsoft Excel
  • Power Query
  • PowerPivot
  • Power View
  • Power Map
  • Reporting Services
  • LINQ to Hive
  • Analysis Services
![Page 11: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/11.jpg)
Yahoo!
Serving Layer, Speed Layer, Batch Layer
• Apache Hadoop
• SQL Server Analysis Services (SSAS)
• Microsoft Excel and PowerPivot
• Other BI Tools and Custom Applications
[Diagram labels: Hadoop Data; Third Party Database; SQL Server Analysis Services (SSAS Cube); Custom Applications; SQL Server Connector (Hadoop Hive ODBC); Staging Database; Microsoft Excel & PowerPivot for Excel]
![Page 12: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/12.jpg)
Ferranti Computer Systems
Serving Layer, Speed Layer, Batch Layer
• Windows Azure HDInsight
• Microsoft Dynamics AX
• SQL Server Analysis Services
• SQL Server Reporting Services
• SQL Server (In-Memory OLTP)
[Diagram labels: Data Feed from Smart Meters; Reactive Extensions (Rx); SQL Server Database (In-Memory OLTP); Windows Azure HDInsight; SQL Server Analysis Services; SQL Server Reporting Services; Microsoft Dynamics AX]
![Page 13: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/13.jpg)
Windows Azure Storage
![Page 14: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/14.jpg)
Demo 1: Setting up the Windows Azure storage account
Serving Layer, Speed Layer, Batch Layer
• Windows Azure Blob storage
• Azure Storage Explorer
![Page 15: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/15.jpg)
Blob Storage Concepts
• Store large amounts of unstructured text or binary data with the fastest read performance
• Highly scalable, durable, and available file system
• Blobs can be exposed publicly over HTTP
• Securely lock down permissions to blobs
[Diagram: Account (Contoso) → Container (Images, Video) → Blob (PIC01.JPG, PIC02.JPG, VID1.AVI) → Pages/Blocks]
http://<account>.blob.core.windows.net/<container>/<blobname>
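The URL pattern on this slide can be illustrated with a small Python helper. This is a sketch, not part of any Azure SDK: `blob_url` is a hypothetical function, and the lowercase `contoso` account and `images` container are assumed values based on the slide's diagram (storage account and container names are lowercase).

```python
from urllib.parse import quote


def blob_url(account, container, blob_name, scheme="http"):
    """Build a blob URL following <account>.blob.core.windows.net/<container>/<blobname>."""
    # Percent-encode the blob name; "/" is left as-is so virtual folders survive.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


print(blob_url("contoso", "images", "PIC01.JPG"))
print(blob_url("contoso", "video", "VID1.AVI", scheme="https"))
```

A URL built this way only resolves anonymously if the container or blob has been made public; otherwise the request must be authorized.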
![Page 16: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/16.jpg)
Getting started with HDInsight Service
![Page 17: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/17.jpg)
Demo 2: Setting up the Windows Azure HDInsight cluster
Serving Layer, Speed Layer, Batch Layer
• Windows Azure HDInsight
• Windows Azure Blob storage
• HDInsight Console: https://<ClusterName>.azurehdinsight.net/
![Page 18: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/18.jpg)
Demo 3: Loading data into Windows Azure storage for use with HDInsight
Serving Layer, Speed Layer, Batch Layer
• CSV files from local disk
• Windows Azure Blob storage
• Windows Azure HDInsight
• HDInsight Console: https://<ClusterName>.azurehdinsight.net/
![Page 19: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/19.jpg)
Easy Access to Data, Big & Small
![Page 20: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/20.jpg)
Easy Access to Data, Big & Small
• Simplify access to public & corporate data
• Easily preview, shape, & format your data
• Combine and refine data across multiple sources
• Gain insight across relational, unstructured, & semi-structured data
• Common management of structured & unstructured data
• Query across relational DB & Hadoop with single T-SQL Query
Products: Power Query; Windows Azure Marketplace; Windows Azure HDInsight Service; Parallel Data Warehouse with Polybase
![Page 21: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/21.jpg)
Learn more
• Getting Started with HDInsight: http://blogs.msdn.com/b/windowsazure/archive/2013/03/19/getting-started-with-hdinsight.aspx
• Azure HDInsight and Azure Storage: http://blogs.msdn.com/b/windowsazure/archive/2013/03/21/azure-hdinsight-and-azure-storage.aspx
![Page 22: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/22.jpg)
Questions?
![Page 23: Microsoft Big Data Essentials Module 1 - Introduction to Big Data](https://reader035.vdocuments.net/reader035/viewer/2022062222/56816143550346895dd0b8d4/html5/thumbnails/23.jpg)