Where Is My Data - ILTAM Session
DESCRIPTION
How do we manage and save our data in the cloud age? This session compares the alternatives that Azure and AWS provide and the use cases they address. Slides and demos can also be found on my blog: http://blogs.microsoft.co.il/iblogger/2014/07/03/slides-and-demos-from-iltam-session02072014/
TRANSCRIPT
![Page 1: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/1.jpg)
Tamir Dresher
Senior Software Architect
July 2, 2014
Where is my Data? (In the Cloud)
![Page 2: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/2.jpg)
About Me
• Software architect, consultant and instructor
• Software Engineering Lecturer @ Ruppin Academic Center
• Technology addict
• 10 years of experience
• .NET and Native Windows Programming
@tamir_dresher
[email protected]
http://www.TamirDresher.com
![Page 3: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/3.jpg)
Agenda
• Storage
• Blob
• Relational DB
• NoSql DB
• MapReduce
![Page 4: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/4.jpg)
Storage
Where is my data Storage
![Page 9: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/9.jpg)
Numbers – 1 Second is
• 1,132 Instagram photos uploaded
• 1,365 Tumblr posts
• 7,241 Tweets sent
• 44,512 Google searches
• 84,921 YouTube videos viewed
• http://www.internetlivestats.com/one-second/
• http://onesecond.designly.com/
![Page 10: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/10.jpg)
Storage Prices
![Page 11: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/11.jpg)
Types of information
• Product catalogs
• Employee data
• User profiles
• Images
• Session state
• Shopping cart
• Game scores and state
• Social feeds
• Query output results
• Airline seating charts
• Inventory management system
• Game leaderboards
• Performance counters
• Weather
• Stock quotes
![Page 12: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/12.jpg)
Gartner Magic Quadrant (IaaS, PaaS)
![Page 13: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/13.jpg)
Microsoft Azure Storage
Windows Azure Growing Global Presence
Data center regions:
• North America: N. Central U.S., S. Central U.S., East U.S., West U.S.
• Europe: N. Europe, W. Europe
• Asia Pacific: S.E. Asia, E. Asia, Japan East, Japan West
• Also shown: Brazil South, China North, China East
Storage SLA – 99.99% (at most 52.56 minutes of downtime per year)
http://azure.microsoft.com/en-us/support/legal/sla
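The 52.56-minute figure follows directly from the SLA percentage. A minimal sketch of the arithmetic, assuming a 365-day year; the function name `allowedDowntimeMinutes` is illustrative and not part of any Azure API:

```javascript
// Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
const MINUTES_PER_YEAR = 365 * 24 * 60;

// Returns the downtime (in minutes per year) that an availability SLA
// still permits. slaPercent is a percentage such as 99.99.
function allowedDowntimeMinutes(slaPercent) {
  const unavailableFraction = (100 - slaPercent) / 100;
  return MINUTES_PER_YEAR * unavailableFraction;
}

console.log(allowedDowntimeMinutes(99.99).toFixed(2)); // "52.56"
console.log(allowedDowntimeMinutes(99.9).toFixed(2));  // "525.60"
```

Each extra "nine" divides the permitted downtime by ten, which is why 99.9% allows almost nine hours a year while 99.99% allows under one.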
![Page 14: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/14.jpg)
AZURE BLOBS
![Page 15: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/15.jpg)
What is a BLOB
• BLOB – Binary Large OBject
• Storage for any type of entity, such as binary files and text documents
• Distributed File Service (DFS)
  – Scalability and high availability
• A BLOB file is distributed across multiple servers and replicated at least 3 times
![Page 16: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/16.jpg)
Azure Blob Storage Concepts
Account → Container → Blob → Pages/Blocks
http://<account>.blob.core.windows.net/<container>/<blobname>
Example hierarchy:
• Account: contoso
  – Container: images
    • Blobs: PIC01.JPG, PIC02.JPG (each blob is made of blocks or pages)
  – Container: videos
    • Blob: VID1.AVI
![Page 17: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/17.jpg)
Amazon Simple Storage Service (S3) Concepts
Account → Bucket → Object
http://<bucket>.s3.amazonaws.com/<object>
Example hierarchy:
• Account: contoso
  – Bucket: images
    • Objects: PIC01.JPG, PIC02.JPG
  – Bucket: videos
    • Object: VID1.AVI
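Both services address stored data through a predictable URL. The sketch below only fills in the two URL templates shown on the concept slides. The helper names are hypothetical, and it ignores what real SDKs also handle (request signing, HTTPS, regional endpoints):

```javascript
// Encodes each path segment separately so "/" inside a name is kept.
function encodePath(path) {
  return path.split("/").map(encodeURIComponent).join("/");
}

// Azure: the account is part of the host name, then container and blob.
function azureBlobUrl(account, container, blobName) {
  return "http://" + account + ".blob.core.windows.net/" +
    container + "/" + encodePath(blobName);
}

// S3: the bucket is part of the host name, then the object key.
function s3ObjectUrl(bucket, objectKey) {
  return "http://" + bucket + ".s3.amazonaws.com/" + encodePath(objectKey);
}

console.log(azureBlobUrl("contoso", "images", "PIC01.JPG"));
// http://contoso.blob.core.windows.net/images/PIC01.JPG
console.log(s3ObjectUrl("images", "PIC01.JPG"));
// http://images.s3.amazonaws.com/PIC01.JPG
```

The visible difference is where the top-level name lives: Azure URLs carry the account in the host name, while S3 URLs carry the bucket there.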
![Page 18: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/18.jpg)
Blob Operations
REST operations against Windows Azure Storage:
• PutBlob
• GetBlob
• DeleteBlob
• CopyBlob
• SnapshotBlob
• LeaseBlob
![Page 19: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/19.jpg)
DEMO: Creating a Blob
![Page 20: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/20.jpg)
BLOBS - Azure
• Block blobs – up to 200 GB in size
• Page blobs – up to 1 TB in size
• Total account capacity – 500 TB
![Page 21: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/21.jpg)
BLOBS - AWS
• Object size – up to 5 TB
• An AWS account can own up to 100 buckets at a time, with unlimited objects
• 99.999999999% durability, 99.99% availability
• Reduced Redundancy Storage (RRS) – 99.99% durability and 99.99% availability
• Amazon Glacier – low-cost storage option for data archival
![Page 22: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/22.jpg)
Pricing - AWS
• Pay for what you use
• Components:
  – Storage capacity used (per GB per month)
  – Data transfer out (per GB per month)
  – Requests (per n thousand requests per month)
• http://aws.amazon.com/s3/pricing/
![Page 23: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/23.jpg)
Pricing - Azure
• Pay for what you use, or commit to a 6- or 12-month plan
• Components:
  – Storage capacity used (per GB per month)
  – Replication option (LRS, GRS, RA-GRS)
  – Number of requests (per n thousand requests per month)
  – Data egress (per GB per month)
• http://azure.microsoft.com/en-us/pricing/details/storage/
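Both price lists are built from the same components, so a monthly bill can be estimated with one small formula. In the sketch below every rate is a made-up placeholder, not a real AWS or Azure price, and the function name is hypothetical. Look up current values on the pricing pages linked on the slides.

```javascript
// PLACEHOLDER rates for illustration only (assumptions, not real prices).
const exampleRates = {
  storagePerGbMonth: 0.05, // USD per GB stored per month
  egressPerGb: 0.10,       // USD per GB transferred out
  per10kRequests: 0.01     // USD per 10,000 requests
};

// Adds up the three usage-based components of a blob storage bill.
function estimateMonthlyCost(usage, rates) {
  const storage = usage.storedGb * rates.storagePerGbMonth;
  const egress = usage.egressGb * rates.egressPerGb;
  const requests = (usage.requests / 10000) * rates.per10kRequests;
  return { storage, egress, requests, total: storage + egress + requests };
}

const bill = estimateMonthlyCost(
  { storedGb: 100, egressGb: 20, requests: 10000000 },
  exampleRates
);
console.log(bill);
```

A replication option such as LRS or GRS would typically show up as a different storage rate rather than as a separate line item, so it can be modeled by swapping the `rates` object.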
![Page 24: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/24.jpg)
RELATIONAL DB
![Page 25: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/25.jpg)
Relational Database Service (RDS)
• MySQL, Oracle, or Microsoft SQL Server in the cloud
• No administrative overhead
• Dedicated hardware
• High availability
• Pay-as-you-grow pricing
• Familiar development model*
* Despite missing features and some limitations - http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html
![Page 26: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/26.jpg)
SQL Azure
• SQL Server in the cloud
• No administrative overhead
• Shared or reserved (dedicated) hardware
• High availability
• Pay-as-you-grow pricing
• Familiar development model*
* Despite missing features and some limitations - http://msdn.microsoft.com/en-us/library/ff394115.aspx
![Page 27: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/27.jpg)
DEMO: Creating and Using SQL Azure
![Page 28: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/28.jpg)
Pricing - SQL Azure
![Page 29: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/29.jpg)
Pricing - RDS
• Pay for what you use
• Components:
  – Storage capacity used (per GB-month and per million I/O requests)
  – Deployment type – Single-AZ/Multi-AZ (AZ = Availability Zone)
  – DB instance hours (per hour)
  – Additional backup storage (per GB-month)
  – Data transfer in/out (per GB per month)
• http://aws.amazon.com/rds/pricing/
![Page 30: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/30.jpg)
Case Study - https://haveibeenpwned.com/
![Page 31: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/31.jpg)
Case Study - https://haveibeenpwned.com/
• http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
• How do I make querying 154 million email addresses as fast as possible?
• If I want 100 GB of SQL Server and I want to hit it 10 million times, it'll cost me $176 a month (now it's ~$20)
![Page 32: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/32.jpg)
NoSql - Azure Tables, DynamoDB
![Page 33: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/33.jpg)
NoSql
• Relational technology has long been the dominant approach for data.
• Large amounts of data
  – Scaling across many servers is challenging.
• Different kinds of data are awkward to store in a relational DB
  – JSON documents
  – Graphs
• ACID – Atomicity, Consistency, Isolation, Durability.
• CAP – Consistency, Availability, Partition tolerance.
• BASE – Basic Availability, Soft state, Eventual consistency.
![Page 34: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/34.jpg)
NoSql
![Page 35: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/35.jpg)
Table Storage Concepts
Account → Table → Entity
Example hierarchy:
• Account: contoso
  – Table: customers
    • Entity: Name = …, Email = …
    • Entity: Name = …, EMailAdd = …
  – Table: photos
    • Entity: Photo ID = …, Date = …
    • Entity: Photo ID = …, Date = …
![Page 36: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/36.jpg)
Table Storage
• Not an RDBMS
  – No relationships between entities
  – NoSql
• An entity can have up to 255 properties and be up to 1 MB in size
• Mandatory properties for every entity
  – PartitionKey & RowKey (the only indexed properties)
    • Together they uniquely identify an entity
    • The same RowKey can be used in different PartitionKeys
    • They define the sort order
  – Timestamp – optimistic concurrency
• Strongly consistent
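The role of the two keys is easier to see in a toy model. The sketch below is an in-memory stand-in, not the Azure SDK. `TinyTable` and its methods are hypothetical names, and plain string comparison is assumed for ordering. It shows that the (PartitionKey, RowKey) pair is the identity of an entity, that a RowKey may repeat across partitions, and that the pair drives sort order.

```javascript
// In-memory model of a table keyed by (PartitionKey, RowKey).
class TinyTable {
  constructor() {
    this.entities = new Map(); // serialized key pair -> entity
  }

  static id(partitionKey, rowKey) {
    return JSON.stringify([partitionKey, rowKey]);
  }

  // Rejects a second entity with the same key pair.
  insert(entity) {
    const id = TinyTable.id(entity.PartitionKey, entity.RowKey);
    if (this.entities.has(id)) {
      throw new Error("Entity already exists: " + id);
    }
    this.entities.set(id, entity);
  }

  retrieve(partitionKey, rowKey) {
    return this.entities.get(TinyTable.id(partitionKey, rowKey));
  }

  // Returns entities ordered by PartitionKey, then by RowKey.
  scan() {
    return Array.from(this.entities.values()).sort((a, b) =>
      compare(a.PartitionKey, b.PartitionKey) || compare(a.RowKey, b.RowKey));
  }
}

function compare(x, y) {
  return x < y ? -1 : x > y ? 1 : 0;
}

const people = new TinyTable();
people.insert({ PartitionKey: "Smith", RowKey: "Jeff", Email: "jeff@example.com" });
people.insert({ PartitionKey: "Harp", RowKey: "Walter", PhoneNumber: "425-555-0101" });
people.insert({ PartitionKey: "Harp", RowKey: "Jeff" }); // same RowKey, other partition
console.log(people.scan().map(e => e.PartitionKey + "/" + e.RowKey));
// [ 'Harp/Jeff', 'Harp/Walter', 'Smith/Jeff' ]
```

Note that the three entities carry different properties, which the model accepts without any schema declaration.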
![Page 37: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/37.jpg)
No Fixed Schema
| FIRST | LAST | BIRTHDATE |
| --- | --- | --- |
| Wade | Wegner | 2/2/1981 |
| Nathan | Totten | 3/15/1965 |
| Nick | Harris | May 1, 1976 |

One of the entities also carries an extra property, FAV SPORT = Canoeing.
![Page 38: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/38.jpg)
Table Object Model
• ITableEntity interface
  – PartitionKey, RowKey, Timestamp, and ETag properties
  – Implemented by TableEntity and DynamicTableEntity

// This class defines one additional property of integer type.
// Since it derives from TableEntity, it will be automatically
// serialized and deserialized.
public class SampleEntity : TableEntity
{
    public int SampleProperty { get; set; }
}
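The ETag property is what a conditional update checks, and that check is the basis of optimistic concurrency. The sketch below imitates the idea with an in-memory store. `VersionedStore`, its methods, and the `v1`, `v2` tag format are all hypothetical. The real service issues its own opaque ETag values.

```javascript
// Every write produces a new etag; a write is accepted only if the
// caller presents the etag currently stored for that key.
class VersionedStore {
  constructor() {
    this.rows = new Map();
    this.nextVersion = 1;
  }

  read(key) {
    const row = this.rows.get(key);
    return row ? { value: row.value, etag: row.etag } : undefined;
  }

  // Creates the row if it is missing; otherwise requires a matching etag.
  write(key, value, etag) {
    const row = this.rows.get(key);
    if (row && row.etag !== etag) {
      throw new Error("Precondition failed: entity was modified");
    }
    const newEtag = "v" + this.nextVersion++;
    this.rows.set(key, { value, etag: newEtag });
    return newEtag;
  }
}

const store = new VersionedStore();
store.write("Smith/Jeff", { Email: "old@example.com" }, undefined);

// Two clients read the same version of the entity.
const readerA = store.read("Smith/Jeff");
const readerB = store.read("Smith/Jeff");

store.write("Smith/Jeff", { Email: "a@example.com" }, readerA.etag); // accepted
try {
  store.write("Smith/Jeff", { Email: "b@example.com" }, readerB.etag); // stale etag
} catch (e) {
  console.log(e.message); // Precondition failed: entity was modified
}
```

No locks are held between the read and the write. The second writer simply finds out that its copy is stale and has to re-read before retrying.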
![Page 39: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/39.jpg)
Sample – Inserting an Entity into a Table
// You will need the following using statements
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");
peopleTable.CreateIfNotExists();

// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.Email = "[email protected]";
customer1.PhoneNumber = "425-555-0101";

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.Insert(customer1);

// Submit the operation to the table service.
peopleTable.Execute(insertCustomer1);
![Page 40: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/40.jpg)
Retrieve
// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");

// Retrieve the entity with partition key of "Smith" and row key of "Jeff"
TableOperation retrieveJeffSmith = TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");

// Retrieve entity
CustomerEntity specificEntity = (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;
![Page 41: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/41.jpg)
Table Storage – Important Points
• Azure Tables can store TBs of data
• Table operations are fast
• Tables are distributed
  – PartitionKey defines the partition
  – A table might be stored in different partitions on different storage devices.
![Page 42: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/42.jpg)
Pricing
![Page 43: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/43.jpg)
Case Study - https://haveibeenpwned.com/
![Page 44: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/44.jpg)
Case Study - https://haveibeenpwned.com/
• How do I make querying 154 million email addresses as fast as possible?
• [email protected] – the domain is the partition key and the alias is the row key
• If I want 100 GB of storage and I want to hit it 10 million times, it'll cost me $8 a month
• SQL Server will cost $176 a month – 22 times more expensive
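The key design from the case study can be sketched in a few lines: the part after the `@` becomes the partition key and the part before it becomes the row key. The function name and the lower-casing step are assumptions made for this illustration. They are not taken from the case study.

```javascript
// Splits an email address into table keys:
// domain -> PartitionKey, alias -> RowKey.
function emailToKeys(email) {
  const normalized = email.trim().toLowerCase(); // normalization is assumed
  const at = normalized.lastIndexOf("@");
  if (at <= 0 || at === normalized.length - 1) {
    throw new Error("Not a valid email address: " + email);
  }
  return {
    PartitionKey: normalized.slice(at + 1),
    RowKey: normalized.slice(0, at)
  };
}

console.log(emailToKeys("Someone@Example.com"));
// { PartitionKey: 'example.com', RowKey: 'someone' }
```

Because both keys are known from the address alone, a lookup becomes a single point query on the indexed key pair rather than a scan.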
![Page 45: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/45.jpg)
DynamoDB
• An item can be up to 64 KB in size
• Items are stored on SSDs and are replicated across multiple Availability Zones in a Region
• An item's primary key can be either a single-attribute hash key or a composite hash-range key
• Supports secondary indexes
![Page 46: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/46.jpg)
DynamoDB
• Eventually consistent reads (by default) and strongly consistent reads (optional)
• Provisioned throughput – the request throughput you want your table to be able to achieve
  – 10 units of write capacity (enough capacity to do up to 36,000 writes per hour)*
  – 50 units of read capacity (enough capacity to do up to 180,000 strongly consistent reads, or 360,000 eventually consistent reads, per hour)
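The hourly figures on this slide come from per-second capacity units. A minimal sketch of the arithmetic, assuming one unit covers one write or one strongly consistent read per second and that an eventually consistent read costs half a unit; item-size limits per unit are ignored and the function names are illustrative:

```javascript
const SECONDS_PER_HOUR = 60 * 60;

// One write capacity unit is treated as one write per second.
function writesPerHour(writeUnits) {
  return writeUnits * SECONDS_PER_HOUR;
}

// One read capacity unit is treated as one strongly consistent read
// per second, or two eventually consistent reads per second.
function readsPerHour(readUnits, stronglyConsistent) {
  const readsPerUnitPerSecond = stronglyConsistent ? 1 : 2;
  return readUnits * readsPerUnitPerSecond * SECONDS_PER_HOUR;
}

console.log(writesPerHour(10));       // 36000
console.log(readsPerHour(50, true));  // 180000
console.log(readsPerHour(50, false)); // 360000
```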
![Page 47: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/47.jpg)
Pricing
• Pay for what you use
• Components:
  – Provisioned throughput capacity (per hour)
  – Indexed data storage (per GB per month)
  – Data transfer out (per GB per month)
• http://aws.amazon.com/dynamodb/pricing/
![Page 49: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/49.jpg)
MapReduce on the Cloud
![Page 50: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/50.jpg)
Hadoop in the cloud
• Hadoop on Azure Cloud
• Some facts:
  – In 2013, global mobile data traffic reached 1.5 exabytes per month
  – Cisco predicts 1.1 zettabytes (1 zettabyte = 1,000 exabytes) of internet traffic in 2016
![Page 51: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/51.jpg)
MapReduce – The BigData Power
• Map – takes input and outputs key/value pairs
(Key1, Value1)
(Key2, Value2)
:
(Keyn, Valuen)
![Page 52: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/52.jpg)
MapReduce – The BigData Power
• Reduce – takes the group of values for each key and produces a new group of values
Key1: [value1-1, value1-2, …] → [new_value1-1, new_value1-2, …]
Key2: [value2-1, value2-2, …] → [new_value2-1, new_value2-2, …]
:
Keyn: [valueN-1, valueN-2, …] → [new_valueN-1, new_valueN-2, …]
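The two phases are connected by a grouping step (often called the shuffle) that collects every value emitted for the same key. The sketch below is a single-process toy, not Hadoop. `runMapReduce` and the sales data are invented for the illustration, and the reducer here returns one value per key rather than a group.

```javascript
// Runs mapFn over every record, groups the emitted values by key,
// then runs reduceFn once per key.
function runMapReduce(records, mapFn, reduceFn) {
  const groups = new Map();
  for (const record of records) {
    for (const [key, value] of mapFn(record)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const results = {};
  for (const [key, values] of groups) {
    results[key] = reduceFn(key, values);
  }
  return results;
}

// Example: total sales per city.
const sales = ["haifa 10", "tel-aviv 25", "haifa 5"];

const totals = runMapReduce(
  sales,
  line => {
    const [city, amount] = line.split(" ");
    return [[city, Number(amount)]]; // one (key, value) pair per line
  },
  (city, amounts) => amounts.reduce((sum, n) => sum + n, 0)
);

console.log(totals); // { haifa: 15, 'tel-aviv': 25 }
```

In a real cluster the map calls run on many machines and the grouping moves data across the network, but the contract between the three steps is the same.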
![Page 53: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/53.jpg)
MapReduce - How Does It Work?
FIRST, STORE THE DATA
[Diagram: files are stored across several servers]
![Page 54: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/54.jpg)
So How Does It Work?
SECOND, TAKE THE PROCESSING TO THE DATA

// Map Reduce function in JavaScript
var map = function (key, value, context) {
    // Split the line on non-letters and emit (word, 1) for every word.
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    // Sum the counts emitted for this word.
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next());
    }
    context.write(key, sum);
};

[Diagram: the runtime sends the code to each of the servers that hold the data]
![Page 55: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/55.jpg)
Elastic Map Reduce (EMR)
• Amazon Hadoop on the cloud
• Hortonworks and Microsoft Hadoop to Windows
• Cluster of EC2 instances
• Pricing:
  – Hourly rate for every instance hour (by instance type)
  – Additional EMR price per EC2 instance
  – http://aws.amazon.com/elasticmapreduce/pricing/
![Page 56: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/56.jpg)
HDInsight
• MS Hadoop on (not only) Azure Cloud
• Hortonworks and Microsoft Hadoop to Windows
• Native integration with .NET
![Page 57: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/57.jpg)
Finding common friends
• Facebook shows you how many common friends you have with someone
• There were 1,310,000,000 active Facebook users, with 130 friends on average (01.01.2014)
• Calculating the mutual friends
![Page 58: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/58.jpg)
Finding common friends
• We can represent a friend relationship as:
  Someone → [list of his/her friends]
• Note that a friend relationship is symmetrical – if A is a friend of B, then B is a friend of A
![Page 59: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/59.jpg)
Example of Friends file
• U1 -> U2 U3 U4
• U2 -> U1 U3 U4 U5
• U3 -> U1 U2 U4 U5
• U4 -> U1 U2 U3 U5
• U5 -> U2 U3 U4
![Page 61: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/61.jpg)
Designing our MapReduce job - Mapper
• Each line from the file is passed as an input line to the Mapper
• The Mapper will output key-value pairs
• Key: (user, friend)
  – Sorted, so friend might come before user
• Value: list of friends
• Having the key sorted will help us with the reducer: the same pairs will be provided together
![Page 64: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/64.jpg)
Mapper Example – final result
Given the line: U1 -> U2 U3 U4
Mapper output:
(U1 U2) -> U2 U3 U4
(U1 U3) -> U2 U3 U4
(U1 U4) -> U2 U3 U4

Given the line: U2 -> U1 U3 U4 U5
Mapper output:
(U1 U2) -> U1 U3 U4 U5
(U2 U3) -> U1 U3 U4 U5
(U2 U4) -> U1 U3 U4 U5
(U2 U5) -> U1 U3 U4 U5

Given the line: U3 -> U1 U2 U4 U5
Mapper output:
(U1 U3) -> U1 U2 U4 U5
(U2 U3) -> U1 U2 U4 U5
(U3 U4) -> U1 U2 U4 U5
(U3 U5) -> U1 U2 U4 U5

Given the line: U4 -> U1 U2 U3 U5
Mapper output:
(U1 U4) -> U1 U2 U3 U5
(U2 U4) -> U1 U2 U3 U5
(U3 U4) -> U1 U2 U3 U5
(U4 U5) -> U1 U2 U3 U5

Given the line: U5 -> U2 U3 U4
Mapper output:
(U2 U5) -> U2 U3 U4
(U3 U5) -> U2 U3 U4
(U4 U5) -> U2 U3 U4
![Page 65: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/65.jpg)
Designing our MapReduce job - Reducer
• The input for the reducer will be structured as:
  (friend1, friend2) (friend1's friends) (friend2's friends)
• The reducer will find the intersection between the lists
• Output:
  (friend1, friend2) (intersection of friend1's and friend2's friends)
![Page 66: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/66.jpg)
Reducer Example
Given the lines:
(U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4)
(U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4)
(U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4)
(U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5)
(U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5)
(U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4)
(U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5)
(U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4)
(U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4)

Reducer output:
(U1 U2) -> (U3 U4)
(U1 U3) -> (U2 U4)
(U1 U4) -> (U2 U3)
(U2 U3) -> (U1 U4 U5)
(U2 U4) -> (U1 U3 U5)
(U2 U5) -> (U3 U4)
(U3 U4) -> (U1 U2 U5)
(U3 U5) -> (U2 U4)
(U4 U5) -> (U2 U3)
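The whole job can be checked on the five-user sample without a cluster. The sketch below is a single-process JavaScript rendering of the mapper and reducer design described on the slides. The function names are illustrative, the input lines are assumed to contain only the user followed by the friends (no arrow), and the grouping that Hadoop performs between the phases is done here with a `Map`.

```javascript
const friendsFile = [
  "U1 U2 U3 U4",
  "U2 U1 U3 U4 U5",
  "U3 U1 U2 U4 U5",
  "U4 U1 U2 U3 U5",
  "U5 U2 U3 U4"
];

// Mapper: for every friend, emit (sorted user-friend pair) -> friend list.
function mapLine(line) {
  const [user, ...friends] = line.split(" ").filter(s => s !== "");
  return friends.map(friend => ({
    key: [user, friend].sort().join(" "),
    value: friends
  }));
}

// Reducer: intersect the two friend lists that arrived under one key.
function reducePair(lists) {
  const [first, second] = lists;
  return first.filter(f => second.includes(f));
}

function commonFriends(lines) {
  // Group mapper output by key (the shuffle step).
  const grouped = new Map();
  for (const line of lines) {
    for (const { key, value } of mapLine(line)) {
      if (!grouped.has(key)) grouped.set(key, []);
      grouped.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, lists] of grouped) {
    // A symmetric friends file yields exactly two lists per pair.
    if (lists.length === 2) result[key] = reducePair(lists).join(" ");
  }
  return result;
}

const common = commonFriends(friendsFile);
console.log(common["U1 U2"]); // U3 U4
console.log(common["U2 U3"]); // U1 U4 U5
```

Sorting the pair is what makes the two records for the same friendship meet in one reducer call, regardless of which user's line produced them.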
![Page 67: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/67.jpg)
Creating C# MapReduce
![Page 68: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/68.jpg)
Creating C# MapReduce - Mapper

using System;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

public class CommonFriendsMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        // The first token is the user; the remaining tokens are the friends.
        var strings = inputLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (strings.Any())
        {
            var currentUser = strings[0];
            var friends = strings.Skip(1);
            foreach (var friend in friends)
            {
                // Sort the pair so (A, B) and (B, A) produce the same key.
                var keyArr = new[] { currentUser, friend };
                Array.Sort(keyArr);
                var key = String.Join(" ", keyArr);
                context.EmitKeyValue(key, string.Join(" ", friends));
            }
        }
    }
}
![Page 69: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/69.jpg)
Creating C# MapReduce - Reducer

using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

public class CommonFriendsReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> strings, ReducerCombinerContext context)
    {
        // Each value is one friend list; a pair key receives two of them.
        var friendsLists = strings
            .Select(friendList => friendList.Split(' '))
            .ToList();
        var intersection = friendsLists[0].Intersect(friendsLists[1]);
        context.EmitKeyValue(key, string.Join(" ", intersection));
    }
}
![Page 70: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/70.jpg)
Creating C# MapReduce – Hadoop Job

HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
myConfig.InputPath = "wasb:///example/data/friends/friends";
myConfig.OutputFolder = "wasb:///example/data/friends/output";
Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
Environment.SetEnvironmentVariable("Java_HOME", @"c:\hadoop\jvm");
var hadoop = Hadoop.Connect(clusterUri, clusterUserName, hadoopUserName, clusterPassword, azureStorageAccount, azureStorageKey, azureStorageContainer, createContinerIfNotExist);
var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);
int exitCode = jobResult.Info.ExitCode; // (0 – success, otherwise – failure)
![Page 71: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/71.jpg)
Pricing
10-node cluster that will exist for 24 hours:
• Secure gateway node – free
• Head node – 15.36 USD per 24-hour day
• 1 data node – 7.68 USD per 24-hour day
• 10 data nodes – 76.80 USD per 24-hour day
• Total: 92.16 USD
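The total is the head node plus ten data nodes, with the gateway node free. A small sketch of that sum, using the per-day prices quoted on this slide (they date from 2014 and are not current) and assuming cost scales linearly with node count and days; the function name is illustrative:

```javascript
// Per-day prices from the slide, kept in US cents to avoid
// floating-point rounding surprises.
const HEAD_NODE_CENTS_PER_DAY = 1536; // 15.36 USD
const DATA_NODE_CENTS_PER_DAY = 768;  // 7.68 USD

// Returns the cluster cost in USD as a string with two decimals.
function clusterCostUsd(dataNodes, days) {
  const centsPerDay = HEAD_NODE_CENTS_PER_DAY + dataNodes * DATA_NODE_CENTS_PER_DAY;
  return ((centsPerDay * days) / 100).toFixed(2);
}

console.log(clusterCostUsd(10, 1)); // "92.16"
console.log(clusterCostUsd(1, 1));  // "23.04"
```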
![Page 72: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/72.jpg)
WRAP UP
![Page 73: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/73.jpg)
Comparing the alternatives
| Storage Type | When Should You Use | Implications |
| --- | --- | --- |
| BLOB | Unstructured data; files | Application logic responsibility; consider using HDInsight (Hadoop) |
| Relational DB | Structured relational data; ACID transactions | SQL DML+DDL; could affect scalability; BI abilities; reporting |
| Azure Tables, DynamoDB | Structured data; loose schema; geo replication (high DR); auto sharding | OData, REST; application logic responsibility (multiple schemas) |
![Page 74: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/74.jpg)
What have we seen
• Blobs
• Relational DB
• NoSql
• MapReduce in the Cloud
![Page 75: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/75.jpg)
What’s Next
• NoSql – MongoDB, Cassandra, CouchDB, RavenDB
• Hadoop ecosystem – Hive, Pig, Sqoop, Mahout
• Cache options – Amazon ElastiCache, Azure Cache, In-Role Cache, Redis
• http://blogs.msdn.com/b/windowsazure/
• http://blogs.msdn.com/b/windowsazurestorage/
• http://blogs.msdn.com/b/bigdatasupport/
![Page 76: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/76.jpg)
Presenter contact details
c: +972-52-4772946
t: @tamir_dresher
e: [email protected]
TamirDresher.com
w: www.codevalue.net