A Hadoop Primer
DESCRIPTION
A simple introduction to Hadoop, given as a talk to the Maine Java Users' Group on February 15, 2011.
TRANSCRIPT
10.20.2005
A Hadoop Primer
Feb 2011
2
http://redmonk.com/public/hadoop.pdf
3
The Background
4
October, 2003
5
December, 2004
6
Map::Reduce
7
Job::Map Reduce::Output
8
Counting Shakespeare
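The word-count idea behind this slide can be sketched in a few lines. This is a minimal in-memory illustration of the map, shuffle, and reduce steps, not the Hadoop API and not the code shown in the talk; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample line are assumptions made for the example.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, the step a framework
    performs between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: collapse the list of 1s for a word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read the full texts.
    sample = ["To be, or not to be, that is the question"]
    print(word_count(sample))
```

On a cluster the map and reduce functions would run on many machines at once, with the framework handling the grouping step; here everything runs in one process to keep the shape of the computation visible.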
9
The Birth of Hadoop
10
11
12
Project Architecture
Source: Running Hadoop On Ubuntu Linux, Michael G. Noll, 8.8.07
13
Project Traction
14
Employment Potential
15
Hadoop Users
16
Why Hadoop?
17
More Machines = More Faster
18
The reason everyone knows
19
BIG DATA
20
“The big issue is not that everyone will suddenly operate at petabyte scale; a lot of folks do not have that much data.
The more important topics are the specifics of the storage and processing infrastructure and what approaches best suit each problem.”
- Bradford Cross, Flightcaster/Woven
21
The reason not everyone knows
22
Unstructured Data
23
What Hadoop Is
24
“build Amazon's product search indices”
“build the recommender system for behavioral targeting”
“ETL style processing and statistics generation”
“information extraction & search”
“searching and analysis of millions of rental bookings”
“we use Hadoop to summarize of user's tracking data”
“we use Hadoop to store ad serving logs”
“the freedom to query the data in an ad-hoc manner”
“generating web graphs on 100 nodes”
“we use Hadoop for batch-processing large RDF datasets”
“facial similarity and recognition across large datasets”
“We are using Hadoop and Nutch to crawl Blog posts”
“Used for ETL & data analysis on terascale datasets”
Source: http://wiki.apache.org/hadoop/PoweredBy
25
What Hadoop Isn't
26
A relational database killer
No Yes
27
Beyond Hadoop
28
The Hadoop Ecosystem
29
What We Use Hadoop For
30
Crawling Largeish Unstructured Datasets
31
Like 1.3M StackOverflow Questions
32
Or 1.7M HackerNews Entries
33
Or Years of Apache Log Files
34
How to Get Started
35
We use Cloudera
36
Mostly because it's easy
37
This easy
38
Or if you prefer
39
Or maybe this
40
QUESTIONS
41
Student? Talk to us