2014 feb 24_big_datacongress_hadoopsession1_hadoop101
DESCRIPTION
A hands-on introduction to Hadoop using the Hortonworks Sandbox
TRANSCRIPT
![Page 1: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/1.jpg)
HADOOP 101: AN INTRODUCTION TO HADOOP WITH THE HORTONWORKS SANDBOX
Adam Muise – Solution Architect, Hortonworks
![Page 2: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/2.jpg)
Who are we?
![Page 3: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/3.jpg)
Who is Hortonworks?
![Page 4: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/4.jpg)
We do Hadoop
The leaders of Hadoop’s development
Community driven, Enterprise Focused
Drive Innovation in the platform – We lead the roadmap
100% Open Source – Democratized Access to Data
![Page 5: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/5.jpg)
We do Hadoop successfully.
Support
Professional Services
Training
![Page 6: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/6.jpg)
Enter the Hadoop.
http://www.fabulouslybroke.com/2011/05/ninja-elephants-and-other-awesome-stories/
![Page 7: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/7.jpg)
Hadoop was created because traditional technologies never cut it
for the Internet properties like Google, Yahoo, Facebook, Twitter,
and LinkedIn
![Page 8: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/8.jpg)
Traditional architecture didn’t scale enough…
[Diagram: three separate silos, each with four App servers, three DB servers, and a SAN]
![Page 9: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/9.jpg)
Databases can become bloated and useless
![Page 10: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/10.jpg)
Traditional architectures cost too much at that volume…
$/TB
$pecial Hardware
$upercomputing
![Page 11: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/11.jpg)
So what is the answer?
![Page 12: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/12.jpg)
If you could design a system that would handle this, what would it
look like?
![Page 13: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/13.jpg)
It would probably need a highly resilient, self-healing, cost-efficient,
distributed file system…
[Diagram: grid of nine Storage nodes]
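To make "resilient" and "self-healing" concrete, here is a minimal Python sketch of the idea: every block is kept on several nodes, and when a node dies the missing copies are recreated on the survivors. This is a toy model, not HDFS code. The names `place_blocks` and `heal`, the block names, and the replication factor of 3 are all assumptions chosen for illustration.

```python
import random

REPLICATION = 3  # assumed replication factor for this toy model


def place_blocks(blocks, nodes, replication=REPLICATION, seed=0):
    """Assign each block to `replication` distinct nodes."""
    rng = random.Random(seed)
    return {b: set(rng.sample(nodes, replication)) for b in blocks}


def heal(placement, live_nodes, replication=REPLICATION, seed=1):
    """Drop replicas held by dead nodes, then re-copy onto live nodes."""
    rng = random.Random(seed)
    healed = {}
    for block, holders in placement.items():
        alive = holders & set(live_nodes)
        if not alive:
            raise RuntimeError(f"block {block} lost all replicas")
        # Candidate targets: live nodes that do not already hold the block.
        spare = [n for n in live_nodes if n not in alive]
        missing = replication - len(alive)
        alive |= set(rng.sample(spare, min(missing, len(spare))))
        healed[block] = alive
    return healed


# Nine storage nodes, matching the nine Storage boxes on the slide.
nodes = [f"node{i}" for i in range(1, 10)]
placement = place_blocks(["blk_a", "blk_b", "blk_c"], nodes)

# Simulate one node failing, then let the system repair itself.
live = [n for n in nodes if n != "node1"]
placement = heal(placement, live)
```

After `heal` runs, every block is back to three replicas and none of them lives on the failed node, without anyone restoring from a backup.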
![Page 14: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/14.jpg)
It would probably need a completely parallel processing framework that
took tasks to the data…
[Diagram: grid of nodes, each pairing Storage with Processing]
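A small Python sketch of "taking tasks to the data", in the MapReduce style: each node counts words in the lines it already holds, and only the small partial results are merged. It runs sequentially in one process purely for illustration. The `node_data` layout and the function names `map_task` and `reduce_task` are hypothetical, not part of any Hadoop API.

```python
from collections import Counter
from functools import reduce

# Hypothetical layout: each node holds its own slice of the data.
node_data = {
    "node1": ["hadoop stores data", "hadoop processes data"],
    "node2": ["data lives on many nodes"],
    "node3": ["tasks move to the data"],
}


def map_task(lines):
    """Runs where the lines live; returns a small partial word count."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_task(left, right):
    """Merges two partial word counts into one."""
    return left + right


# One map task per node, then a single merge of the partial results.
partials = [map_task(lines) for lines in node_data.values()]
totals = reduce(reduce_task, partials, Counter())
print(totals["data"])
```

The point of the design is that the raw lines never leave their node. Only the compact per-node counts travel across the network.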
![Page 15: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/15.jpg)
It would probably run on commodity hardware, virtualized machines, and
common OS platforms
[Diagram: grid of nodes, each pairing Storage with Processing]
![Page 16: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/16.jpg)
It would probably be open source so innovation could happen as quickly
as possible
![Page 17: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/17.jpg)
It would need a critical mass of users
![Page 18: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/18.jpg)
Apache Hadoop
Flume Ambari
HBase Falcon
MapReduce HDFS
Sqoop HCatalog
Pig
Hive
Storm YARN
Knox
Tez
![Page 19: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/19.jpg)
Hortonworks Data Platform
Flume Ambari
HBase Falcon
MapReduce HDFS
Sqoop HCatalog
Pig
Hive
Storm YARN
Knox
Tez
![Page 20: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/20.jpg)
We are going to learn how to work with Hadoop in less than an hour.
![Page 21: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/21.jpg)
To do this, we need to install Hadoop, right?
![Page 22: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/22.jpg)
Nope.
![Page 23: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/23.jpg)
Enter the
Sandbox.
![Page 24: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/24.jpg)
The Sandbox is ‘Hadoop in a Can’. It contains one copy of each of the Master and Worker node processes used in a cluster, only in a single
virtual node.
[Diagram: the cluster's Storage and Processing nodes next to a single Linux VM holding one Processing and one Storage component]
![Page 25: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/25.jpg)
Getting started with Sandbox VM:
- Pick your flavor of VM at… http://www.hortonworks.com/sandbox
- Start the sandbox VM
- Find the IP displayed
- Go to… http://172.16.130.131
- Register
- Click on ‘Start Tutorials’
- On the left hand nav, click on ‘HCatalog, Basic Pig & Hive Commands’
![Page 26: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/26.jpg)
In this tutorial we will:
- Land files in HDFS
- Assign metadata with HCatalog
- Use SQL with Hive
- Learn to process data with Pig
![Page 27: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/27.jpg)
Try the other tutorials.
![Page 28: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/28.jpg)
Hadoop is the new Modern Data Architecture for the Enterprise
![Page 29: 2014 feb 24_big_datacongress_hadoopsession1_hadoop101](https://reader034.vdocuments.net/reader034/viewer/2022042714/54c6ba854a795932518b459a/html5/thumbnails/29.jpg)
© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
There is NO second place
Hortonworks …the Bull Elephant of Hadoop Innovation