Million Monkeys User Group
Million Monkeys presentation given to Silicon Mountain Technology Group on 11-12-2012.
Million Monkeys
Jesse Anderson | Curriculum Developer and Instructor
November 2012
About Me
• Cloudera - Educational Services Team
• Twitter - @jessetanderson
• Blog and more info: http://www.jesse-anderson.com
• Screencasts on Pragmatic Programmers: buy them via http://www.jesse-anderson.com
• President - Northern Nevada Software Developers Group
About Cloudera
• Cloudera is “The commercial Hadoop company”
• Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo
• Provides consulting and training services for Hadoop users
• Staff includes committers to virtually all Hadoop projects
Introduction
• Infinite Monkey Theorem
• Hadoop
• Million Monkeys Algorithm
• Business Case
Infinite Monkey Theorem
“A million monkeys on a million typewriters will eventually recreate Shakespeare”
Exponential Growth (aka Big Data)
Assuming each keystroke is one of 26 equally likely letters, the odds of typing a specific group of characters are 1 in 26 raised to the power of the number of contiguous characters, $n$:
1 in $26^{n}$
Contiguous characters ($n$) | Combinations ($26^{n}$)
8 | 208,827,064,576
9 | 5,429,503,678,976
10 | 141,167,095,653,376
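Each additional character multiplies the number of possible sequences by 26, which is where the table's numbers come from. A minimal Python sketch that recomputes them, assuming a 26-letter alphabet with no spaces, punctuation, or case; `combinations` and `probability` are hypothetical helper names:

```python
def combinations(n: int, alphabet_size: int = 26) -> int:
    """Number of distinct sequences of n characters (alphabet_size ** n)."""
    return alphabet_size ** n


def probability(n: int, alphabet_size: int = 26) -> float:
    """Chance that a single random n-character attempt matches one given target."""
    return 1 / combinations(n, alphabet_size)


if __name__ == "__main__":
    # Print the rows of the table along with the per-attempt match probability.
    for n in (8, 9, 10):
        print(f"{n:>2} {combinations(n):,} (p = {probability(n):.3e})")
```

Passing a larger `alphabet_size` (for example, 27 to include a space) makes the growth even steeper.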
Hadoop
• Apache Project
• Reliable, Scalable, Distributed Computing
• Software Framework
• MapReduce
• Distributed File System (HDFS)
• Other projects
Map
Create or process the input data
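In the Million Monkeys setting, a map task can both create and process its input. The sketch below is a hypothetical, in-memory Python imitation of a mapper, not Hadoop's Mapper API and not the author's implementation: each call plays one "monkey" that types random lowercase strings and emits `(string, 1)` for every string found in a target set. `monkey_map` and its parameters are illustrative names.

```python
import random
import string
from typing import AbstractSet, Iterator, Tuple


def monkey_map(seed: int, attempts: int, length: int,
               target_words: AbstractSet[str]) -> Iterator[Tuple[str, int]]:
    """Hypothetical map function for one simulated monkey.

    Creates input data (random lowercase strings of a fixed length) and
    processes it by emitting (string, 1) only for strings in target_words.
    """
    rng = random.Random(seed)  # per-mapper seed keeps runs repeatable
    for _ in range(attempts):
        candidate = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        if candidate in target_words:
            yield candidate, 1


if __name__ == "__main__":
    matches = list(monkey_map(seed=7, attempts=100_000, length=3,
                              target_words={"the", "and"}))
    print(len(matches), "matches")
```

Giving each mapper its own seed keeps the simulated monkeys independent while still making any single run reproducible.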
Reduce
Process data from Map into something usable
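A matching reduce step, again a hypothetical Python sketch rather than Hadoop's Reducer API: the framework hands the reducer one key together with every value emitted for that key, and the reducer collapses them into something usable, here a total count. `count_reduce` is an illustrative name.

```python
from typing import Iterable, Tuple


def count_reduce(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Hypothetical reduce function: sum every value emitted for one key."""
    return key, sum(values)


# Simulate the framework: one reducer call per key, keys in sorted order.
grouped = {"the": [1, 1, 1], "and": [1]}
results = dict(count_reduce(k, vs) for k, vs in sorted(grouped.items()))
print(results)  # {'and': 1, 'the': 3}
```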
Data Flow
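Between the map and reduce phases, MapReduce shuffles the map output: a partitioner routes every pair with the same key to the same reducer, and each reducer sees its keys in sorted order. A minimal in-memory Python sketch of that flow, assuming string keys; `partition` and `shuffle` are hypothetical names, and the CRC32-modulo choice only mirrors the spirit of Hadoop's default hash partitioning.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key. crc32 is used instead of hash() because
    Python randomizes str hashes per process."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_output: Iterable[Tuple[str, int]],
            num_reducers: int) -> List[Dict[str, List[int]]]:
    """Route each (key, value) pair to one reducer's partition and group
    values by key, so every occurrence of a key lands in one partition."""
    if num_reducers < 1:
        raise ValueError("num_reducers must be >= 1")
    partitions: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_output:
        partitions[partition(key, num_reducers)][key].append(value)
    # Present keys to each reducer in sorted order.
    return [dict(sorted(p.items())) for p in partitions]


if __name__ == "__main__":
    pairs = [("the", 1), ("and", 1), ("the", 1), ("of", 1)]
    for i, part in enumerate(shuffle(pairs, num_reducers=2)):
        # Reduce step: sum the grouped values for each key.
        print(i, {k: sum(v) for k, v in part.items()})
```

Because the partition depends only on the key, adding reducers spreads keys across more machines without changing the map or reduce code.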
Million Monkeys Algorithm
Business Case
Hadoop Scalability
[Figure: percent of linear scalability (0 to 100%) versus number of nodes (1 to 20) for RDBMS and Hadoop. RDBMS = Relational Database]
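The chart's metric can be made concrete. Assuming "percent of linear scalability" means measured speedup divided by the ideal speedup (n nodes finishing n times faster), a small Python helper computes it; the function name and the sample timings are hypothetical and are not the data behind the chart.

```python
def percent_of_linear(time_one_node: float, time_n_nodes: float, nodes: int) -> float:
    """Measured speedup as a percentage of ideal linear speedup.

    Assumed definition: 100 * (T_1 / T_n) / n, where T_1 is the runtime on
    one node and T_n is the runtime of the same job on n nodes.
    """
    if nodes < 1 or time_n_nodes <= 0:
        raise ValueError("nodes must be >= 1 and time_n_nodes must be positive")
    speedup = time_one_node / time_n_nodes
    return 100.0 * speedup / nodes


if __name__ == "__main__":
    # Hypothetical timings: 400 minutes on 1 node, 25 minutes on 20 nodes.
    print(percent_of_linear(400, 25, 20))  # 80.0
```

With these illustrative timings, 20 nodes deliver a 16x speedup, or 80% of linear; a system whose percentage falls as nodes are added is paying overhead that grows with cluster size.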
Scaling does not require massive re-engineering and complete rewrites of code.
Business Value of Scalability
Adding more computers to the cluster gives a predictable increase in computational power and storage.
Save $$$. Save time.
Going Viral (and taking over the world)
26,000 unique visits from 119 countries in one day
Covered internationally by the BBC, the Wall Street Journal, Wired and Slashdot
Next Steps
• Books
  • Hadoop: The Definitive Guide - Tom White
  • Hadoop Operations - Eric Sammer
• Cloudera Training
  • Developer, Admin, Hive and Pig, HBase, Essentials
• CDH
  • Cloudera's Distribution Including Apache Hadoop
  • Open Source
  • VM Image
Conclusion
• MapReduce breaks up problems efficiently
• No code changes to scale
• Incredible scalability
• Enables previously impossible tasks