Big time: Introducing Hadoop on Azure
Big Data
The problem is simple
• While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up.
• One typical drive from 1990 could store 1,370 MB of data and had a transfer speed of 4.4 MB/s, so you could read all the data from a full drive in around five minutes.
• Over 20 years later, one-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
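The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 TB taken as 1,000,000 MB), and the 100-drive parallel case is a hypothetical illustration that assumes the data is spread evenly over drives read at the same time.

```python
def read_time_seconds(capacity_mb: float, transfer_mb_per_s: float, drives: int = 1) -> float:
    """Seconds to read `capacity_mb`, assuming the data is spread evenly over
    `drives` disks that are all read at once, each at `transfer_mb_per_s`."""
    return capacity_mb / (transfer_mb_per_s * drives)


# 1990 drive from the slide: 1,370 MB at 4.4 MB/s
t_1990 = read_time_seconds(1_370, 4.4)
# Modern drive from the slide: 1 TB (taken as 1,000,000 MB) at ~100 MB/s
t_1tb = read_time_seconds(1_000_000, 100)
# Hypothetical: the same 1 TB split across 100 drives read in parallel
t_parallel = read_time_seconds(1_000_000, 100, drives=100)

print(f"1990 drive:         {t_1990 / 60:.1f} minutes")      # 5.2 minutes
print(f"1 TB drive:         {t_1tb / 3600:.2f} hours")       # 2.78 hours
print(f"1 TB on 100 drives: {t_parallel / 60:.1f} minutes")  # 1.7 minutes
```

The gap between roughly five minutes and nearly three hours is what makes spreading data across many disks, and reading them in parallel, attractive.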
Go parallel
Cloud computing changes the way applications grow
Image credit: http://journals.worldnomads.com/davidsgibson/photo/22804/664941/USA/Elephant-shaped-cloud!
BIG-TIME: Introducing Hadoop on Azure
Yaniv Rodenski, Senior Consultant, Sela Group
http://blogs.microsoft.co.il/blogs/roadan
Twitter: @YRodenski
David Ginzburg, Big Data infrastructure consultant
Twitter: @David_Ginzburg
AGENDA
Apache™ Hadoop™
Hadoop Distributed File System (HDFS)
(Diagram: HDFS Client)
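As a rough illustration of what HDFS does with a file, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. It assumes the Hadoop 1.x defaults of a 64 MB block size (`dfs.block.size`) and a replication factor of 3 (`dfs.replication`), and uses simple round-robin placement; real HDFS uses a rack-aware placement policy, and the file size and node names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 64  # Hadoop 1.x default dfs.block.size
REPLICATION = 3     # default dfs.replication


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the size of each block; the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin.
    Simplification: real HDFS placement is rack-aware."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200)        # a hypothetical 200 MB file
nodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical DataNode names
print(blocks)                          # [64, 64, 64, 8]
for block_id, replicas in place_replicas(len(blocks), nodes).items():
    print(block_id, replicas)
```

Because every block lives on several nodes, losing one DataNode does not lose data, and map tasks can be scheduled on whichever node already holds a copy of their block.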
MapReduce via WordCount
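Input:
Hello World
Hello Azure
Goodbye Cruel World
Map output: (Hello, 1) (World, 1) (Hello, 1) (Azure, 1) (Goodbye, 1) (Cruel, 1) (World, 1)
Shuffle (grouped by key): Hello [1, 1], World [1, 1], Azure [1], Goodbye [1], Cruel [1]
Reduce output: (Hello, 2) (World, 2) (Azure, 1) (Goodbye, 1) (Cruel, 1)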
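The WordCount flow on this slide can be simulated in a single process. This is a sketch of the map, shuffle, and reduce logic only, not Hadoop's Java API; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return word, sum(counts)


lines = ["Hello World", "Hello Azure", "Goodbye Cruel World"]
mapped = [pair for line in lines for pair in map_phase(line)]
# Hadoop hands keys to the reducer in sorted order; sorted() mimics that here.
result = dict(reduce_phase(w, c) for w, c in sorted(shuffle(mapped).items()))
print(result)  # {'Azure': 1, 'Cruel': 1, 'Goodbye': 1, 'Hello': 2, 'World': 2}
```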
A new way to MapReduce
DEMO
Hadoop MapReduce Processing
(Diagram: three Input Splits processed in parallel and their results combined in a Merge step; a Job Client submits the job.)
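Each input split feeds one map task. The sketch below mirrors, in simplified form, how Hadoop's `FileInputFormat` sizes splits: split size = max(minSize, min(maxSize, blockSize)), plus a 10% "slop" that lets the final split run slightly over rather than creating a tiny extra split. It assumes a single uncompressed file, default minimum and maximum split sizes, and ignores block locations; the file sizes are hypothetical.

```python
from typing import List, Tuple

SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """split size = max(minSize, min(maxSize, blockSize))"""
    return max(min_size, min(max_size, block_size))


def get_splits(file_length: int, block_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs; each split becomes one map task."""
    split_size = compute_split_size(block_size)
    splits: List[Tuple[int, int]] = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    if remaining > 0:
        splits.append((file_length - remaining, remaining))
    return splits


MB = 1024 * 1024
for size_mb in (200, 70):
    splits = get_splits(size_mb * MB, 64 * MB)
    print(size_mb, "MB ->", [(off // MB, length // MB) for off, length in splits])
```

A 200 MB file yields four splits (three of 64 MB and one of 8 MB), while a 70 MB file stays a single split because 70/64 is within the slop.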
MapReduce TMI
(Diagram. Map side: Input Split → Buffer → Partition, Sort, and spill to disk. Reduce side: Fetch the Map Outputs of four map tasks → Merge result → Sort → Output.)
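The buffer, spill, fetch, and merge steps can be sketched in a few lines. This is a greatly simplified single-process model: the buffer size is counted in records rather than bytes (Hadoop 1.x sizes it with `io.sort.mb`), each spill is kept as its own sorted run instead of being merged into one map output file, and `first_letter_partition` is a hypothetical stand-in for a real partitioner.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]
Run = Dict[int, List[Pair]]  # partition number -> sorted (key, value) pairs


def first_letter_partition(key: str, num_partitions: int) -> int:
    """Hypothetical partitioner used only for this example."""
    return ord(key[0]) % num_partitions


def map_side_spills(pairs: List[Pair], num_partitions: int, buffer_records: int,
                    partition: Callable[[str, int], int]) -> List[Run]:
    """Buffer map output; whenever the buffer is full, sort it by
    (partition, key) and spill it as one sorted run per partition."""
    spills: List[Run] = []
    buffer: List[Tuple[int, str, int]] = []

    def spill() -> None:
        buffer.sort()  # orders by partition, then key, then value
        run: Run = {p: [] for p in range(num_partitions)}
        for p, key, value in buffer:
            run[p].append((key, value))
        spills.append(run)
        buffer.clear()

    for key, value in pairs:
        buffer.append((partition(key, num_partitions), key, value))
        if len(buffer) >= buffer_records:
            spill()
    if buffer:
        spill()
    return spills


def reduce_side_merge(spills: List[Run], reducer_id: int) -> List[Pair]:
    """Fetch this reducer's partition from every run and merge the sorted runs."""
    return list(heapq.merge(*(run[reducer_id] for run in spills)))


words = ["Hello", "World", "Hello", "Azure", "Goodbye", "Cruel", "World"]
spills = map_side_spills([(w, 1) for w in words], num_partitions=2,
                         buffer_records=3, partition=first_letter_partition)
for r in range(2):
    print(r, reduce_side_merge(spills, r))
```

Because every run is already sorted, the reducer only needs a streaming k-way merge (`heapq.merge`) rather than a full re-sort.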
Partitioners
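Hadoop's default partitioner, `HashPartitioner`, sends a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the mask clears the sign bit so a negative hash never yields a negative partition. The sketch reproduces that formula in Python under the assumption that keys hash like `java.lang.String.hashCode()`. Hadoop's `Text` type uses its own byte-based hash, so real partition numbers for `Text` keys can differ.

```python
def java_string_hash(s: str) -> int:
    """Emulate java.lang.String.hashCode() (32-bit signed overflow)
    for strings made of Basic Multilingual Plane characters."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Same formula as HashPartitioner:
    (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


for word in ["Hello", "World", "Azure"]:
    print(word, hash_partition(word, 4))
```

All values for one key land on the same reducer, but nothing guarantees the partitions are balanced; skewed keys produce skewed reducers.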
Combiners
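A combiner runs reducer-like logic on each map task's output before the shuffle. For WordCount, summing is associative and commutative, so the reduce function can double as the combiner. The sketch below compares how many records reach the reducer with and without one; the two input splits are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def mapper(line: str) -> List[Pair]:
    return [(word, 1) for word in line.split()]


def sum_counts(pairs: Iterable[Pair]) -> List[Pair]:
    """WordCount reduce logic; also usable as the combiner."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def run(splits: List[List[str]], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    """Return (final counts, number of records shuffled to the reducer)."""
    shuffled: List[Pair] = []
    for split in splits:  # one map task per split
        map_output = [pair for line in split for pair in mapper(line)]
        if use_combiner:
            map_output = sum_counts(map_output)  # local, per-map-task aggregation
        shuffled.extend(map_output)
    return dict(sum_counts(shuffled)), len(shuffled)


splits = [["Hello World", "Hello Azure", "Hello World"], ["Goodbye Cruel World"]]
print(run(splits, use_combiner=False))
print(run(splits, use_combiner=True))
```

Both runs produce the same counts, but the combiner cuts the shuffled records from 9 to 6. Hadoop may run a combiner zero, one, or several times, so it must not change the final result.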
The TeraSort use case
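The central idea in TeraSort is total-order partitioning: sample the input, choose split points, and route keys so that every key sent to reducer i sorts before every key sent to reducer i+1. Concatenating the sorted reducer outputs then gives one globally sorted result. The sketch shows that idea with random sampling and binary search; it is not Hadoop's TeraSort code (which builds a trie over the sampled split points), and the key format, data size, and sample size are arbitrary assumptions.

```python
import bisect
import random
from typing import List


def choose_split_points(sample: List[str], num_reducers: int) -> List[str]:
    """Pick num_reducers - 1 cut points from a sorted sample so each
    reducer receives roughly the same share of keys."""
    ordered = sorted(sample)
    step = len(ordered) / num_reducers
    return [ordered[int(step * i)] for i in range(1, num_reducers)]


def total_order_partition(key: str, split_points: List[str]) -> int:
    """Reducer i receives keys in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, key)


random.seed(42)
keys = ["".join(random.choices("abcdefghijklmnopqrstuvwxyz", k=5)) for _ in range(10_000)]
points = choose_split_points(random.sample(keys, 500), num_reducers=4)

partitions: List[List[str]] = [[] for _ in range(4)]
for k in keys:
    partitions[total_order_partition(k, points)].append(k)

# Each reducer sorts only its own partition; concatenating the outputs
# in reducer order yields a globally sorted result.
output = [k for part in partitions for k in sorted(part)]
print([len(p) for p in partitions])
```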
Beginners' Pitfalls
Distinct Values Problem Statement
Source: http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/
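One common way to count distinct values with MapReduce is to chain two jobs: the first uses the shuffle to make each (group, value) pair unique, and the second counts the unique pairs per group. The sketch assumes records shaped as (F, G) pairs where the goal is the number of distinct F values for each G; the field layout, data, and the tiny in-memory `run_job` driver are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List


def run_job(records: Iterable[Any],
            mapper: Callable[[Any], Iterator[Any]],
            reducer: Callable[[Any, List[Any]], Iterator[Any]]) -> List[Any]:
    """Tiny in-memory MapReduce driver: map, group values by key, reduce in key order."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: List[Any] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Job 1: make each (G, F) pair unique.
def map_pairs(record):
    f, g = record
    yield (g, f), None  # the value is irrelevant; only the key matters


def emit_unique(key, _values):
    yield key  # one output per distinct (G, F) pair


# Job 2: count the unique pairs per G.
def map_group(pair):
    g, _f = pair
    yield g, 1


def sum_ones(g, ones):
    yield g, sum(ones)


records = [("x", "A"), ("y", "A"), ("x", "A"), ("w", "A"),
           ("z", "B"), ("z", "B"), ("x", "B")]
unique_pairs = run_job(records, map_pairs, emit_unique)
distinct_counts = dict(run_job(unique_pairs, map_group, sum_ones))
print(distinct_counts)  # {'A': 3, 'B': 2}
```

Splitting the work this way avoids holding a set of every distinct value in one reducer's memory, at the cost of an extra pass over the data.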
Administrating Hadoop in the real world
DEMO
Why did Microsoft choose Hadoop?
Hadoop on Azure
Using hadooponazure.com
DEMO
Windows Azure Compute
(Diagram: an Azure Role containing a supporting service, the application, and its configuration.)
Hadoop on Azure Roles
(Diagram: an Azure Role containing a monitoring service (RdAdmin), the Hadoop services, and their configuration.)
Hadoop MapReduce Processing
(Diagram: a Head Node running the Name Node, Worker Nodes each running a Data Node, and the Fabric Controller; in the final build the cluster grows from seven to eight Worker Nodes.)
The Head Node Template
The Worker Node Template
Node VM Templates
| | Head node | Worker node |
|---|---|---|
| VM template | Extra Large | Medium |
| Cores | 8 | 2 |
| Memory | 14 GB | 3.5 GB |
| Local disk | 2 TB | 489 GB |
Cloud Storage
High Availability on Azure
(Diagram: the Fabric Controller, two Head Nodes each running a Name Node, and Azure Storage.)
Elastic MapReduce
(Diagram, in builds: a Storage Client with Amazon S3 and Azure Storage; a cluster of one Head Node (JobTracker) and three Worker Nodes (TaskTrackers), then a second identical cluster; the final build shows only the storage services, annotated with cost symbols.)
Using Elastic MapReduce
DEMO
Azure Blob Considerations
Storage Size Limitations
IsotopeJS
Using the JavaScript interactive console
DEMO
Using Hive
DEMO
Summary
Q & A
Resources
• My blog: http://bit.ly/roadan
• Apache™ Hadoop™: http://hadoop.apache.org
• Hadoop on Azure: http://www.hadooponazure.com
• Tom White, Hadoop: The Definitive Guide: http://shop.oreilly.com/product/9780596521981.do
• Windows Azure Developer Center: http://www.windowsazure.com/en-us/develop/overview
Thanks!
Yaniv Rodenski
Twitter: @YRodenski