Where Is My Data - ILTAM Session
![Page 1: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/1.jpg)
Tamir Dresher
Senior Software Architect
July 2, 2014
Where is my Data? (In the Cloud)
![Page 2: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/2.jpg)
About Me
• Software architect, consultant and instructor
• Software Engineering Lecturer @ Ruppin Academic Center
• Technology addict
• 10 years of experience
• .NET and Native Windows Programming
@[email protected]
http://www.TamirDresher.com
![Page 3: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/3.jpg)
Agenda
• Storage
• Blob
• Relational DB
• NoSql DB
• MapReduce
![Page 4: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/4.jpg)
Storage
Where is my data Storage
![Page 5: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/5.jpg)
![Page 6: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/6.jpg)
![Page 7: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/7.jpg)
![Page 8: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/8.jpg)
![Page 9: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/9.jpg)
Numbers – 1 Second is
• 1,132 Instagram photos uploaded
• 1,365 Tumblr posts
• 7,241 Tweets sent
• 44,512 Google searches
• 84,921 YouTube videos viewed
Sources:
• http://www.internetlivestats.com/one-second/
• http://onesecond.designly.com/
![Page 10: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/10.jpg)
Storage Prices
![Page 11: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/11.jpg)
Types of information
• Product catalogs
• Employee data
• User profiles
• Images
• Session state
• Shopping cart
• Game scores and state
• Social feeds
• Query output results
• Airline seating charts
• Inventory management system
• Game leaderboards
• Performance counters
• Weather
• Stock quotes
![Page 12: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/12.jpg)
Gartner Magic Quadrant
IaaS, PaaS
![Page 13: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/13.jpg)
Windows Azure Growing Global Presence
Microsoft Azure Storage
Data centers
• North America: N. Central – U.S. Region, S. Central – U.S. Region, East – U.S. Region, West – U.S. Region
• Europe: N. Europe Region, W. Europe Region
• Asia Pacific: S.E. Asia Region, E. Asia Region, Japan West Region, Japan East Region, China North, China East
• Brazil South Region
Storage SLA – 99.99% (up to 52.56 minutes of downtime per year)
http://azure.microsoft.com/en-us/support/legal/sla
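The 52.56 minutes quoted next to the 99.99% SLA is simply the unavailable fraction of a non-leap year. A minimal sketch of that arithmetic follows; the helper name `allowedDowntimeMinutes` is made up for illustration and is not part of any Azure API.

```javascript
// Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
const MINUTES_PER_YEAR = 365 * 24 * 60;

// Hypothetical helper: the maximum downtime (minutes per year) that an
// availability percentage such as 99.99 still allows.
function allowedDowntimeMinutes(availabilityPercent) {
  const unavailableFraction = (100 - availabilityPercent) / 100;
  return MINUTES_PER_YEAR * unavailableFraction;
}

console.log(allowedDowntimeMinutes(99.99).toFixed(2)); // "52.56"
```

The result is rounded with `toFixed(2)` because the subtraction is done in binary floating point.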
![Page 14: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/14.jpg)
AZURE BLOBS
![Page 15: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/15.jpg)
What is a BLOB
• BLOB – Binary Large OBject
• Storage for any type of entity, such as binary files and text documents
• Distributed File Service (DFS)
  – Scalability and high availability
  – A BLOB file is distributed across multiple servers and replicated at least 3 times
![Page 16: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/16.jpg)
Azure Blob Storage Concepts
Account > Container > Blob > Pages/Blocks
http://<account>.blob.core.windows.net/<container>/<blobname>
Example: account "contoso"
• Container "images": PIC01.JPG, PIC02.JPG (each blob is made up of blocks or pages)
• Container "videos": VID1.AVI
![Page 17: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/17.jpg)
Amazon Simple Storage Service (S3) Concepts
Account > Bucket > Object
http://<bucket>.s3.amazonaws.com/<object>
Example: account "contoso"
• Bucket "images": PIC01.JPG, PIC02.JPG
• Bucket "videos": VID1.AVI
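The two addressing schemes differ mainly in where the account appears: Azure puts the account in the host name and the container in the path, while S3 puts the bucket in the host name. A small sketch of both patterns, using the example names from the concept slides; the function names are hypothetical, and the names passed in are assumed to be URL-safe already.

```javascript
// Hypothetical helper following the Azure pattern
// http://<account>.blob.core.windows.net/<container>/<blobname>
function azureBlobUrl(account, container, blobName) {
  return `http://${account}.blob.core.windows.net/${container}/${blobName}`;
}

// Hypothetical helper following the S3 pattern
// http://<bucket>.s3.amazonaws.com/<object>
function s3ObjectUrl(bucket, objectKey) {
  return `http://${bucket}.s3.amazonaws.com/${objectKey}`;
}

console.log(azureBlobUrl("contoso", "images", "PIC01.JPG"));
// http://contoso.blob.core.windows.net/images/PIC01.JPG
console.log(s3ObjectUrl("images", "PIC01.JPG"));
// http://images.s3.amazonaws.com/PIC01.JPG
```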
![Page 18: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/18.jpg)
Blob Operations
Exposed by Windows Azure Storage over REST:
• PutBlob
• GetBlob
• DeleteBlob
• CopyBlob
• SnapshotBlob
• LeaseBlob
![Page 19: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/19.jpg)
DEMO
Creating a Blob
![Page 20: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/20.jpg)
BLOBS - Azure
• Block blobs – up to 200 GB in size
• Page blobs – up to 1 TB in size
• Total account capacity – 500 TB
![Page 21: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/21.jpg)
BLOBS - AWS
• Object size – up to 5 TB
• An AWS account can own up to 100 buckets at a time, with unlimited objects
• 99.999999999% durability, 99.99% availability
• Reduced Redundancy Storage (RRS) – 99.99% durability and 99.99% availability
• Amazon Glacier – low-cost storage service for data archival
![Page 22: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/22.jpg)
Pricing - AWS
• Pay for what you use
• Components:
  – Storage capacity used (per GB per month)
  – Data transfer out (per GB per month)
  – Requests (per n thousand requests per month)
• http://aws.amazon.com/s3/pricing/
![Page 23: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/23.jpg)
Pricing - Azure
• Pay for what you use, or commit to a 6- or 12-month plan
• Components:
  – Storage capacity used (per GB per month)
  – Replication option (LRS, GRS, RA-GRS)
  – Number of requests (per n thousand requests per month)
  – Data egress (per GB per month)
• http://azure.microsoft.com/en-us/pricing/details/storage/
![Page 24: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/24.jpg)
RELATIONAL DB
![Page 25: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/25.jpg)
Relational Database Service (RDS)
• MySQL, Oracle, or Microsoft SQL Server in the cloud
• No administrative overhead
• Dedicated hardware
• High availability
• Pay-as-you-grow pricing
• Familiar development model*
* Despite missing features and some limitations - http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html
![Page 26: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/26.jpg)
SQL Azure
• SQL Server in the cloud
• No administrative overhead
• Shared or reserved (dedicated) hardware
• High availability
• Pay-as-you-grow pricing
• Familiar development model*
* Despite missing features and some limitations - http://msdn.microsoft.com/en-us/library/ff394115.aspx
![Page 27: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/27.jpg)
DEMO
Creating and Using SQL Azure
![Page 28: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/28.jpg)
Pricing - SQL Azure
![Page 29: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/29.jpg)
Pricing - RDS
• Pay for what you use
• Components:
  – Storage capacity used (per GB-month and per million I/O requests)
  – Deployment type – Single-AZ/Multi-AZ (AZ – Availability Zone)
  – DB instance hours (per hour)
  – Additional backup storage (per GB-month)
  – Data transfer in / out (per GB per month)
• http://aws.amazon.com/rds/pricing/
![Page 30: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/30.jpg)
Case Study - https://haveibeenpwned.com/
![Page 31: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/31.jpg)
Case Study - https://haveibeenpwned.com/
• http://www.troyhunt.com/2013/12/working-with-154-million-records-on.html
• How do I make querying 154 million email addresses as fast as possible?
• "If I want 100GB of SQL Server and I want to hit it 10 million times, it’ll cost me $176 a month" (now it's ~$20)
![Page 32: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/32.jpg)
NoSql - Azure Tables, DynamoDB
![Page 33: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/33.jpg)
NoSql
• Relational technology has long been the dominant approach for data.
• Large amounts of data
  – Scaling across many servers is challenging.
• Different kinds of data on a relational DB
  – JSON documents
  – Graphs
• ACID – Atomicity, Consistency, Isolation, Durability.
• CAP – Consistency, Availability, Partition tolerance.
• BASE – Basically Available, Soft state, Eventual consistency.
![Page 34: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/34.jpg)
NoSql
![Page 35: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/35.jpg)
Table Storage Concepts
Account > Table > Entity
Example: account "contoso"
• Table "customers": entities such as (Name = …, Email = …) and (Name = …, EMailAdd = …)
• Table "photos": entities such as (Photo ID = …, Date = …)
![Page 36: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/36.jpg)
Table Storage
• Not an RDBMS
  – No relationships between entities
  – NoSql
• An entity can have up to 255 properties and be up to 1 MB in size
• Mandatory properties for every entity
  – PartitionKey & RowKey (the only indexed properties)
    • Together they uniquely identify an entity
    • The same RowKey can be used in different PartitionKeys
    • They define the sort order
  – Timestamp – optimistic concurrency
• Strongly consistent
![Page 37: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/37.jpg)
No Fixed Schema
FIRST LAST BIRTHDATE
Wade Wegner 2/2/1981
Nathan Totten 3/15/1965
Nick Harris May 1, 1976
FAV SPORT
Canoeing
![Page 38: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/38.jpg)
Table Object Model
• ITableEntity interface
  – PartitionKey, RowKey, Timestamp, and ETag properties
  – Implemented by TableEntity and DynamicTableEntity

// This class defines one additional property of integer type.
// Since it derives from TableEntity, it will be automatically
// serialized and deserialized.
public class SampleEntity : TableEntity
{
    public int SampleProperty { get; set; }
}
![Page 39: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/39.jpg)
Sample – Inserting an Entity into a Table

// You will need the following using statements
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");
peopleTable.CreateIfNotExists();

// Create a new customer entity.
CustomerEntity customer1 = new CustomerEntity("Harp", "Walter");
customer1.Email = "[email protected]";
customer1.PhoneNumber = "425-555-0101";

// Create an operation to add the new customer to the people table.
TableOperation insertCustomer1 = TableOperation.Insert(customer1);

// Submit the operation to the table service.
peopleTable.Execute(insertCustomer1);
![Page 40: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/40.jpg)
Retrieve

// Create the table client.
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable peopleTable = tableClient.GetTableReference("people");

// Retrieve the entity with partition key of "Smith" and row key of "Jeff"
TableOperation retrieveJeffSmith = TableOperation.Retrieve<CustomerEntity>("Smith", "Jeff");

// Retrieve entity
CustomerEntity specificEntity = (CustomerEntity)peopleTable.Execute(retrieveJeffSmith).Result;
![Page 41: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/41.jpg)
Table Storage – Important Points
• Azure Tables can store TBs of data
• Table operations are fast
• Tables are distributed
  – PartitionKey defines the partition
  – A table might be stored in different partitions on different storage devices.
![Page 42: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/42.jpg)
Pricing
![Page 43: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/43.jpg)
Case Study - https://haveibeenpwned.com/
![Page 44: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/44.jpg)
Case Study - https://haveibeenpwned.com/
• How do I make querying 154 million email addresses as fast as possible?
• [email protected] – the domain is the partition key and the alias is the row key
• "If I want 100GB of storage and I want to hit it 10 million times, it’ll cost me $8 a month"
• SQL Server will cost $176 a month – 22 times more expensive
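The key design in this case study is what makes a lookup a single point query: the address itself determines both keys. A minimal sketch of that split is shown below. The function name `toTableKeys`, the lower-casing, and the sample address are assumptions made for illustration; the case study only states that the domain is the partition key and the alias is the row key.

```javascript
// Hypothetical helper: derive Azure Table keys from an email address.
// Domain -> PartitionKey, alias -> RowKey (as described in the case study).
// Lower-casing both parts is an assumption, so that lookups are case-insensitive.
function toTableKeys(email) {
  const at = email.lastIndexOf("@");
  if (at <= 0 || at === email.length - 1) {
    throw new Error("not an email address: " + email);
  }
  return {
    partitionKey: email.slice(at + 1).toLowerCase(), // domain
    rowKey: email.slice(0, at).toLowerCase(),        // alias
  };
}

const keys = toTableKeys("Alice@Example.com");
console.log(keys.partitionKey, keys.rowKey); // example.com alice
```

The two values are exactly what a point lookup such as `TableOperation.Retrieve` takes, so no scan or secondary index is involved.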
![Page 45: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/45.jpg)
DynamoDB
• An item can be up to 64 KB in size
• Items are stored on SSDs and are replicated across multiple Availability Zones in a Region
• An item's primary key can be either a single-attribute hash key or a composite hash-range key
• Supports secondary indexes
![Page 46: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/46.jpg)
DynamoDB
• Eventually consistent reads (by default), and strongly consistent reads (optional)
• Provisioned throughput – the request throughput you want your table to be able to achieve
  – 10 units of write capacity (enough capacity to do up to 36,000 writes per hour)
  – 50 units of read capacity (enough capacity to do up to 180,000 strongly consistent reads, or 360,000 eventually consistent reads, per hour)
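The hourly figures on this slide follow from capacity units being per-second rates. A rough sketch of the conversion, assuming every item is small enough to cost exactly one capacity unit per operation and that an eventually consistent read costs half a strongly consistent one; the function names are hypothetical.

```javascript
const SECONDS_PER_HOUR = 3600;

// One write capacity unit = one write per second (for small items).
function writesPerHour(writeUnits) {
  return writeUnits * SECONDS_PER_HOUR;
}

// One read capacity unit = one strongly consistent read per second,
// or two eventually consistent reads per second (for small items).
function readsPerHour(readUnits, stronglyConsistent) {
  const readsPerUnitPerSecond = stronglyConsistent ? 1 : 2;
  return readUnits * readsPerUnitPerSecond * SECONDS_PER_HOUR;
}

console.log(writesPerHour(10));       // 36000
console.log(readsPerHour(50, true));  // 180000
console.log(readsPerHour(50, false)); // 360000
```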
![Page 47: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/47.jpg)
Pricing
• Pay for what you use
• Components:
  – Provisioned throughput capacity (per hour)
  – Indexed data storage (per GB per month)
  – Data transfer out (per GB per month)
• http://aws.amazon.com/dynamodb/pricing/
![Page 48: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/48.jpg)
![Page 49: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/49.jpg)
MapReduce on the Cloud
![Page 50: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/50.jpg)
Hadoop in the cloud
• Hadoop on Azure Cloud
• Some facts:
  – In 2013, global mobile data traffic reached 1.5 exabytes per month
  – Cisco predicts 1.1 zettabytes of internet traffic in 2016 (1 zettabyte = 1,000 exabytes)
![Page 51: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/51.jpg)
MapReduce – The BigData Power
• Map – takes input and outputs key/value pairs
(Key1, Value1)
(Key2, Value2)
:
(Keyn, Valuen)
![Page 52: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/52.jpg)
MapReduce – The BigData Power
• Reduce – takes the group of values for each key and produces a new group of values
Key1: [value1-1, value1-2…] -> [new_value1-1, new_value1-2…]
Key2: [value2-1, value2-2…] -> [new_value2-1, new_value2-2…]
:
Keyn: [valueN-1, valueN-2…] -> [new_valueN-1, new_valueN-2…]
![Page 53: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/53.jpg)
MapReduce - How Does It Work?
FIRST, STORE THE DATA
[Diagram: files stored across multiple servers]
![Page 54: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/54.jpg)
So How Does It Work?
SECOND, TAKE THE PROCESSING TO THE DATA
[Diagram: the runtime sends the code to each server]

// Map Reduce function in JavaScript
var map = function (key, value, context) {
    var words = value.split(/[^a-zA-Z]/);
    for (var i = 0; i < words.length; i++) {
        if (words[i] !== "") {
            context.write(words[i].toLowerCase(), 1);
        }
    }
};

var reduce = function (key, values, context) {
    var sum = 0;
    while (values.hasNext()) {
        sum += parseInt(values.next(), 10);
    }
    context.write(key, sum);
};
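To see what the runtime does with the two word-count functions, they can be exercised locally. The sketch below repeats `map` and `reduce` from the slide (with explicit braces around the `if` body) and adds `runJob`, a hypothetical in-memory stand-in for the Hadoop runtime: it calls `map` once per line, groups the emitted values by key, and hands each group to `reduce` through an iterator with `hasNext`/`next`. A real cluster does the same steps, but spread across servers.

```javascript
// Word-count map and reduce, as on the slide.
const map = function (key, value, context) {
  const words = value.split(/[^a-zA-Z]/);
  for (let i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

const reduce = function (key, values, context) {
  let sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical single-process replacement for the runtime (illustration only).
function runJob(lines) {
  // Map phase plus shuffle: collect every emitted value under its key.
  const groups = new Map();
  lines.forEach((line, lineNumber) => {
    map(lineNumber, line, {
      write(k, v) {
        if (!groups.has(k)) groups.set(k, []);
        groups.get(k).push(v);
      },
    });
  });

  // Reduce phase: one reduce call per key.
  const result = {};
  for (const [k, vals] of groups) {
    let i = 0;
    const iterator = { hasNext: () => i < vals.length, next: () => vals[i++] };
    reduce(k, iterator, { write(key, value) { result[key] = value; } });
  }
  return result;
}

const counts = runJob(["the cloud stores the data", "Where is the data"]);
console.log(counts.the, counts.data, counts.where); // 3 2 1
```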
![Page 55: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/55.jpg)
Elastic Map Reduce (EMR)
• Amazon Hadoop on the Cloud
• Hortonworks and Microsoft brought Hadoop to Windows
• Cluster of EC2 instances
• Pricing:
  – Hourly rate for every instance hour (by instance type)
  – Additional EMR price per EC2 instance
  – http://aws.amazon.com/elasticmapreduce/pricing/
![Page 56: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/56.jpg)
HDInsight
• MS Hadoop on (not only) Azure Cloud
• Hortonworks and Microsoft brought Hadoop to Windows
• Native integration with .NET
![Page 57: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/57.jpg)
Finding common friends
• Facebook shows you how many common friends you have with someone
• There were 1,310,000,000 active Facebook users, with 130 friends on average (01.01.2014)
• Calculating the mutual friends
![Page 58: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/58.jpg)
Finding common friends
• We can represent a Friend relationship as:
  Someone -> [List of his/her friends]
• Note that a Friend relationship is symmetrical – if A is a friend of B, then B is a friend of A
![Page 59: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/59.jpg)
Example of Friends file
• U1 -> U2 U3 U4
• U2 -> U1 U3 U4 U5
• U3 -> U1 U2 U4 U5
• U4 -> U1 U2 U3 U5
• U5 -> U2 U3 U4
![Page 60: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/60.jpg)
![Page 61: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/61.jpg)
Designing our MapReduce job - Mapper
• Each line from the file will be an input line to the Mapper
• The Mapper will output key-value pairs
• Key: (user, friend)
  – Sorted, so friend might come before user
• Value: list of friends
• Having the key sorted will help us with the reducer: the same pairs will be provided together
![Page 62: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/62.jpg)
![Page 63: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/63.jpg)
![Page 64: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/64.jpg)
Mapper Example – final result

Given the line: U1 -> U2 U3 U4
Mapper output:
(U1 U2) -> U2 U3 U4
(U1 U3) -> U2 U3 U4
(U1 U4) -> U2 U3 U4

Given the line: U2 -> U1 U3 U4 U5
Mapper output:
(U1 U2) -> U1 U3 U4 U5
(U2 U3) -> U1 U3 U4 U5
(U2 U4) -> U1 U3 U4 U5
(U2 U5) -> U1 U3 U4 U5

Given the line: U3 -> U1 U2 U4 U5
Mapper output:
(U1 U3) -> U1 U2 U4 U5
(U2 U3) -> U1 U2 U4 U5
(U3 U4) -> U1 U2 U4 U5
(U3 U5) -> U1 U2 U4 U5

Given the line: U4 -> U1 U2 U3 U5
Mapper output:
(U1 U4) -> U1 U2 U3 U5
(U2 U4) -> U1 U2 U3 U5
(U3 U4) -> U1 U2 U3 U5
(U4 U5) -> U1 U2 U3 U5

Given the line: U5 -> U2 U3 U4
Mapper output:
(U2 U5) -> U2 U3 U4
(U3 U5) -> U2 U3 U4
(U4 U5) -> U2 U3 U4
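The mapper output listed for each line can be reproduced with a few lines of code. This is a JavaScript sketch of the mapping step only, not the deck's C# implementation; `mapFriendsLine` is a hypothetical name, and the `->` separator is assumed to be optional in the input.

```javascript
// Hypothetical mapper: for one "user -> friends" line, emit one record per
// friend. The key is the (user, friend) pair in sorted order; the value is the
// user's whole friend list.
function mapFriendsLine(line) {
  const [user, ...friends] = line
    .split(/\s+/)
    .filter((token) => token !== "" && token !== "->");
  return friends.map((friend) => ({
    key: [user, friend].sort().join(" "),
    value: friends.join(" "),
  }));
}

for (const record of mapFriendsLine("U2 -> U1 U3 U4 U5")) {
  console.log(`(${record.key}) -> ${record.value}`);
}
// (U1 U2) -> U1 U3 U4 U5
// (U2 U3) -> U1 U3 U4 U5
// (U2 U4) -> U1 U3 U4 U5
// (U2 U5) -> U1 U3 U4 U5
```

Sorting the pair is what lets the U1 line and the U2 line both produce the same key, `U1 U2`, so their two friend lists arrive at the same reducer call.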
![Page 65: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/65.jpg)
Designing our MapReduce job - Reducer
• The input for the reducer will be structured as:
  (friend1, friend2) (friend1 friends) (friend2 friends)
• The reducer will find the intersection between the lists
• Output:
  (friend1, friend2) (intersection of friend1 and friend2 friends)
![Page 66: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/66.jpg)
Reducer Example

Given the line:                               Reducer output:
(U1 U2) -> (U1 U3 U4 U5) (U2 U3 U4)           (U1 U2) -> (U3 U4)
(U1 U3) -> (U1 U2 U4 U5) (U2 U3 U4)           (U1 U3) -> (U2 U4)
(U1 U4) -> (U1 U2 U3 U5) (U2 U3 U4)           (U1 U4) -> (U2 U3)
(U2 U3) -> (U1 U2 U4 U5) (U1 U3 U4 U5)        (U2 U3) -> (U1 U4 U5)
(U2 U4) -> (U1 U2 U3 U5) (U1 U3 U4 U5)        (U2 U4) -> (U1 U3 U5)
(U2 U5) -> (U1 U3 U4 U5) (U2 U3 U4)           (U2 U5) -> (U3 U4)
(U3 U4) -> (U1 U2 U3 U5) (U1 U2 U4 U5)        (U3 U4) -> (U1 U2 U5)
(U3 U5) -> (U1 U2 U4 U5) (U2 U3 U4)           (U3 U5) -> (U2 U4)
(U4 U5) -> (U1 U2 U3 U5) (U2 U3 U4)           (U4 U5) -> (U2 U3)
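Each row of the reducer example is a plain set intersection of the two friend lists that share a key. A small JavaScript sketch of that step, checked against the first row; `reduceCommonFriends` is a hypothetical name, and the function assumes exactly two lists per key, which holds here because friendship is symmetrical.

```javascript
// Hypothetical reducer: intersect the two friend lists received for one pair.
function reduceCommonFriends(key, friendLists) {
  if (friendLists.length !== 2) {
    throw new Error(`expected two friend lists for "${key}"`);
  }
  const [first, second] = friendLists.map((list) => list.split(" "));
  const inSecond = new Set(second);
  const common = first.filter((friend) => inSecond.has(friend));
  return { key, value: common.join(" ") };
}

const row = reduceCommonFriends("U1 U2", ["U1 U3 U4 U5", "U2 U3 U4"]);
console.log(`(${row.key}) -> (${row.value})`); // (U1 U2) -> (U3 U4)
```

Using a `Set` for the second list keeps the intersection linear in the list sizes, which matters more at 130 friends per user than in this five-user example.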
![Page 67: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/67.jpg)
Creating C# MapReduce
![Page 68: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/68.jpg)
Creating C# MapReduce - Mapper

using System;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

public class CommonFriendsMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        // First token is the user, the remaining tokens are the user's friends.
        var strings = inputLine.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (strings.Any())
        {
            var currentUser = strings[0];
            var friends = strings.Skip(1);
            foreach (var friend in friends)
            {
                // Sort the pair so (U1, U2) and (U2, U1) produce the same key.
                var keyArr = new[] { currentUser, friend };
                Array.Sort(keyArr);
                var key = String.Join(" ", keyArr);
                context.EmitKeyValue(key, string.Join(" ", friends));
            }
        }
    }
}
![Page 69: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/69.jpg)
Creating C# MapReduce - Reducer

using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

public class CommonFriendsReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> strings, ReducerCombinerContext context)
    {
        // Each key is expected to arrive with exactly two friend lists,
        // one from each user in the pair.
        var friendsLists = strings
            .Select(friendList => friendList.Split(' '))
            .ToList();
        var intersection = friendsLists[0].Intersect(friendsLists[1]);
        context.EmitKeyValue(key, string.Join(" ", intersection));
    }
}
![Page 70: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/70.jpg)
Creating C# MapReduce – Hadoop Job

HadoopJobConfiguration myConfig = new HadoopJobConfiguration();
myConfig.InputPath = "wasb:///example/data/friends/friends";
myConfig.OutputFolder = "wasb:///example/data/friends/output";

Environment.SetEnvironmentVariable("HADOOP_HOME", @"c:\hadoop");
Environment.SetEnvironmentVariable("JAVA_HOME", @"c:\hadoop\jvm");

var hadoop = Hadoop.Connect(clusterUri,
                            clusterUserName,
                            hadoopUserName,
                            clusterPassword,
                            azureStorageAccount,
                            azureStorageKey,
                            azureStorageContainer,
                            createContinerIfNotExist);

var jobResult = hadoop.MapReduceJob.Execute<CommonFriendsMapper, CommonFriendsReducer>(myConfig);
int exitCode = jobResult.Info.ExitCode; // 0 means success, anything else means failure
![Page 71: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/71.jpg)
Pricing
10 node cluster that will exist for 24 hours:
• Secure Gateway Node - free
• Head node - 15.36 USD per 24-hour day
• 1 data node - 7.68 USD per 24-hour day
• 10 data nodes - 76.80 USD per 24-hour day
• Total: 92.16 USD
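The total is the head node plus ten data nodes for one day. A short sketch of the sum using the per-day rates quoted on this slide (the rates are the deck's 2014 figures, not current prices); amounts are kept in whole cents to avoid floating-point rounding, and the function name is hypothetical.

```javascript
// Per-day rates from the slide, in cents. The Secure Gateway Node is free.
const HEAD_NODE_CENTS_PER_DAY = 1536; // 15.36 USD
const DATA_NODE_CENTS_PER_DAY = 768;  // 7.68 USD

// Hypothetical helper: cost in cents of one head node plus `dataNodes`
// data nodes running for `days` full days.
function clusterCostCents(dataNodes, days) {
  return (HEAD_NODE_CENTS_PER_DAY + dataNodes * DATA_NODE_CENTS_PER_DAY) * days;
}

console.log((clusterCostCents(10, 1) / 100).toFixed(2)); // "92.16"
```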
![Page 72: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/72.jpg)
WRAP UP
![Page 73: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/73.jpg)
Comparing the alternatives

| Storage Type | When Should You Use | Implications |
|---|---|---|
| BLOB | Unstructured data; files | Application logic responsibility; consider using HDInsight (Hadoop) |
| Relational DB | Structured relational data; ACID transactions | SQL DML+DDL; could affect scalability; BI abilities; reporting |
| Azure Tables, DynamoDB | Structured data; loose schema; geo replication (high DR); auto sharding | OData, REST; application logic responsibility (multiple schemas) |
![Page 74: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/74.jpg)
What have we seen
• Blobs
• Relational DB
• NoSql
• MapReduce in the Cloud
![Page 75: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/75.jpg)
What’s Next
• NoSql – MongoDB, Cassandra, CouchDB, RavenDB
• Hadoop ecosystem – Hive, Pig, Sqoop, Mahout
• Cache options – Amazon ElastiCache, Azure Cache, In-Role Cache, Redis
• http://blogs.msdn.com/b/windowsazure/
• http://blogs.msdn.com/b/windowsazurestorage/
• http://blogs.msdn.com/b/bigdatasupport/
![Page 76: Where Is My Data - ILTAM Session](https://reader034.vdocuments.net/reader034/viewer/2022051818/54b6ff264a795998388b4577/html5/thumbnails/76.jpg)
Presenter contact details
c: +972-52-4772946
t: @tamir_dresher
e: [email protected]
TamirDresher.com
w: www.codevalue.net