Transcript
Page 1: Strata + Hadoop World 2012 Keynote: Beyond Batch - Doug Cutting

DO NOT USE PUBLICLY PRIOR TO 10/23/12

Beyond Batch

Doug Cutting, October 2012

Page 2

Hadoop Started As Batch

MapReduce
• Simple, powerful
• Kills a lot of birds
• Efficient, scalable
• Compute at storage
• Shared platform
• Used by Pig, Hive, etc.
• Incredibly useful!
• But not sufficient

Page 3

Big Data Is Not (Just) Batch

Its true themes are:
• Scalability
• Affordability
• Commodity hardware
• Open-source software
• Distributed & reliable
• Schema on read
• Data beats algorithms

Page 4

HBase: First Non-Batch Component

Online key/value store
• Complement to batch
• Online put/get
• Batch load & analyze
• Best of both
• Popular combination
• A step towards the future…

Page 5

Holy Grail Of Big Data

• Open source, commodity HW, etc.
• Linear scaling
• To scale, just buy more hardware
• On many axes
• Storage capacity
• Throughput & latency
  • of batch & query
• Transactions, Joins, Indexes
  • and batch!

Page 6

Google Gives Us A Map

Google publication        Apache project
2004  GFS & MapReduce     2006  Hadoop       batch programs
2005  Sawzall             2008  Pig & Hive   batch queries
2006  BigTable            2008  HBase        online key/value
...   ...                 ...   ...          ...
2012  Spanner             ?     ?            holy grail?

Google publication        Open source project
2004  GFS & MapReduce     2006  Hadoop       batch programs
2005  Sawzall             2008  Pig & Hive   batch queries
2006  BigTable            2008  HBase        online key/value
...   ...                 ...   ...          ...
2012  Spanner             ?     ?            transactions, etc.

5 years – 26 authors!

Page 7

Impala Is Latest Step

Google publication        Open source project
2004  GFS & MapReduce     2006  Hadoop       batch programs
2005  Sawzall             2008  Pig & Hive   batch queries
2006  BigTable            2008  HBase        online key/value
2010  Dremel/F1           2012  Impala       online queries
2012  Spanner             ?     ?            transactions, etc.

Page 8

@cutting #bigquestions

